1
00:00:02,266 --> 00:00:06,406
In the previous video, I showed you a
process where we selected half the number

2
00:00:06,406 --> 00:00:11,426
of experiments, and I showed you
that I used a rule where C = A*B.

3
00:00:11,546 --> 00:00:14,856
Now we shouldn't automatically use these rules.

4
00:00:15,496 --> 00:00:17,416
Let me show you where I obtained it from.

5
00:00:18,336 --> 00:00:23,616
What I'm showing you here is a table that
describes how you can do less work in a way

6
00:00:23,616 --> 00:00:26,156
that you can still recover
most of the information.

7
00:00:26,156 --> 00:00:31,246
This table doesn't really have a name that
I'm aware of, so I call it a 'trade-off table'

8
00:00:31,766 --> 00:00:35,526
because it allows me to plan and
budget for a set of experiments

9
00:00:35,676 --> 00:00:39,526
where I can trade off the costs with
the information I hope to obtain.

10
00:00:40,156 --> 00:00:44,156
You're going to see this table several times
again throughout the rest of the course

11
00:00:44,156 --> 00:00:48,156
but for now: remember how in the
previous class we did four experiments

12
00:00:48,156 --> 00:00:50,066
and we had three factors in our system?

13
00:00:50,536 --> 00:00:55,116
On this table, we can locate ourselves
over here in the top left corner,

14
00:00:55,716 --> 00:00:58,316
with four runs and three factors.

15
00:00:59,326 --> 00:01:05,406
In this entry in the table, we are told
to generate C as the product of AB.

16
00:01:06,836 --> 00:01:12,666
Now remember in the wastewater example,
where we had factors C, T and S?

17
00:01:12,666 --> 00:01:17,416
That would correspond to
setting S as the product of C*T

18
00:01:17,416 --> 00:01:19,846
if we were using those factor names.

19
00:01:20,766 --> 00:01:25,836
When you're dealing with your case and your
factor names are different, simply replace them

20
00:01:25,836 --> 00:01:29,636
with A, B, and C temporarily
to use this trade-off table.

21
00:01:31,576 --> 00:01:34,526
Now I guess you're curious about
that plus-or-minus sign here.

22
00:01:34,526 --> 00:01:41,096
If I use the minus, it says to
generate -C as the product of AB.

23
00:01:41,096 --> 00:01:49,736
In our wastewater example, that would
translate to -S is the product of CT.

24
00:01:49,736 --> 00:01:56,086
If you created your design following
the rule with the minus sign,

25
00:01:56,646 --> 00:01:59,976
you'd notice that you would
end up with the closed circles

26
00:01:59,976 --> 00:02:03,096
on the original cube, rather
than the open circles.

27
00:02:03,436 --> 00:02:04,736
But notice the symmetry.

28
00:02:05,556 --> 00:02:07,756
The closed circles still have

29
00:02:07,756 --> 00:02:12,306
that same collapsibility we noticed
earlier with the open circles.

30
00:02:13,436 --> 00:02:17,096
Now with so many learners from a wide
range of backgrounds in this course,

31
00:02:17,656 --> 00:02:20,776
I know that some of you might have
difficulty with the math that follows.

32
00:02:21,566 --> 00:02:22,246
Be patient.

33
00:02:22,596 --> 00:02:24,496
Watch it all unfold here on the screen.

34
00:02:24,886 --> 00:02:27,196
And I'll explain it in the
end in plain language.

35
00:02:27,836 --> 00:02:30,906
We're hoping to appeal to those of
you with a technical background,

36
00:02:31,276 --> 00:02:33,796
and also those of you with
a non-technical background.

37
00:02:34,796 --> 00:02:36,306
So let's get started with that math.

38
00:02:36,446 --> 00:02:39,166
Start by writing out the standard order table.

39
00:02:39,166 --> 00:02:42,386
Three factors means that
there are eight experiments.

40
00:02:42,506 --> 00:02:45,926
But in the last video, I showed
you that the best four experiments

41
00:02:45,926 --> 00:02:49,236
to pick are either the open
circles or the closed circles.

42
00:02:49,646 --> 00:02:54,576
Let's work with the open circles; they
corresponded to rows two, three, five,

43
00:02:54,576 --> 00:02:57,146
and eight from the original
standard order table.

44
00:02:57,146 --> 00:03:02,326
So I'm going to remove those other four
rows, the experiments we didn't run,

45
00:03:02,936 --> 00:03:06,076
and leave only the four behind
that we actually ran.

46
00:03:07,896 --> 00:03:10,306
Now I'm going to slightly
rearrange them for you.

47
00:03:10,666 --> 00:03:14,716
Put row five first, and then
original row two and three,

48
00:03:14,716 --> 00:03:18,086
and then finally the original
eighth row becomes our fourth row.

49
00:03:18,086 --> 00:03:25,236
Notice that columns A and B are in standard
order and that column C is the product of A

50
00:03:25,236 --> 00:03:31,106
and B. In a prior video I showed you the
matrix form for a three factor system,

51
00:03:31,236 --> 00:03:33,666
in the context of the wastewater
treatment example.

52
00:03:35,586 --> 00:03:41,746
The matrix form adds extra columns for the
two factor interactions and an extra column

53
00:03:41,746 --> 00:03:43,346
for the three factor interaction.

54
00:03:44,546 --> 00:04:04,126
There were eight unknowns in that
example, in that vector b. So I'm going

55
00:04:04,126 --> 00:04:07,686
to remove those other four rows,
the experiments we didn't run,

56
00:04:08,296 --> 00:04:11,436
and leave only the four behind
that we actually ran.

57
00:04:13,266 --> 00:04:15,676
Now I'm going to slightly
rearrange them for you.

58
00:04:16,026 --> 00:04:20,076
Put row five first, and then
the original row two and three,

59
00:04:20,076 --> 00:04:23,536
and then finally the original
row eight becomes our fourth row.

60
00:04:29,126 --> 00:04:32,716
Examine any patterns you
notice in the eight columns.

61
00:04:34,586 --> 00:04:37,026
Did you pick up on the similar
columns in the matrix?

62
00:04:37,026 --> 00:04:41,706
For example, notice that the
column AB matches the column for C.

63
00:04:42,276 --> 00:04:48,656
And that the B column has the same recurring
pattern as the AC column, minus minus plus plus.

64
00:04:48,816 --> 00:04:55,046
The BC column matches A. And finally, the
ABC column matches the intercept column.

65
00:04:56,116 --> 00:04:59,146
What you need to take away
from this explanation is

66
00:04:59,146 --> 00:05:03,126
that you notice the same pattern
reoccurring in certain columns.

67
00:05:04,056 --> 00:05:08,346
What this means is we won't be able to
tell those columns apart from each other.

68
00:05:09,876 --> 00:05:11,856
Telling columns apart is critical.

69
00:05:12,846 --> 00:05:15,786
In the prior full factorials,
perhaps you noticed

70
00:05:15,786 --> 00:05:18,316
that each column was unique and independent.

71
00:05:18,936 --> 00:05:24,846
This helps us know the unique contribution of
that factor to the outcome, the y variable.

72
00:05:26,076 --> 00:05:28,826
What does it mean if we cannot
tell the columns apart?

73
00:05:29,186 --> 00:05:32,526
Let's use A and the BC columns as an example.

74
00:05:33,566 --> 00:05:37,426
By doing this fractional factorial,
from a mathematical perspective,

75
00:05:38,056 --> 00:05:40,086
those two columns appear to be the same.

76
00:05:40,376 --> 00:05:43,526
We use the word alias to describe the situation.

77
00:05:43,816 --> 00:05:47,786
Now, you may be familiar with the word
alias as used in the English language.

78
00:05:47,876 --> 00:05:52,656
For example, if I asked you,
who is Jorge Mario Bergoglio?

79
00:05:53,216 --> 00:05:54,806
You may not necessarily know.

80
00:05:55,046 --> 00:05:59,506
But you do know his alias, Pope Francis.

81
00:05:59,506 --> 00:06:01,016
We all have aliases.

82
00:06:01,526 --> 00:06:06,146
In the real world, my family and friends
know me by my full name, Kevin George Dunn.

83
00:06:06,736 --> 00:06:11,686
But, online, my alias is my email
address or my username for a website.

84
00:06:11,686 --> 00:06:16,576
So an alias simply means we have
another name for something else.

85
00:06:16,576 --> 00:06:23,756
Back to these experiments: the alias for
A is B times C. There are three other aliases:

86
00:06:23,946 --> 00:06:29,226
B is the alias for A times C,
C is the alias for A times B,

87
00:06:29,986 --> 00:06:31,826
and that one was intentional, remember.

88
00:06:31,986 --> 00:06:36,146
We chose to set C as the
product of A and B in our design.

89
00:06:36,236 --> 00:06:41,236
Finally, the last alias: the three
factor interaction between A, B,

90
00:06:41,236 --> 00:06:43,916
and C is aliased with the intercept.

91
00:06:44,336 --> 00:06:49,826
Now when I talked about not being able to
distinguish between A and the BC interaction,

92
00:06:50,256 --> 00:06:54,436
or not being able to tell the columns
apart, that is a consequence of aliasing.

93
00:06:54,976 --> 00:06:57,076
The effects of A and the effects

94
00:06:57,076 --> 00:07:01,066
of the BC interaction are aliased
or mixed up with each other.

95
00:07:01,186 --> 00:07:06,896
So if there's a large effect size, it
might be due to A, or it might be due

96
00:07:06,896 --> 00:07:08,976
to the BC two factor interaction.

97
00:07:09,526 --> 00:07:13,446
We won't know until we run more experiments.

98
00:07:13,446 --> 00:07:17,576
The term confounding is used to
describe what is happening here.

99
00:07:18,746 --> 00:07:22,926
That means, after doing the
reduced set of four experiments,

100
00:07:23,326 --> 00:07:28,336
you will never really be sure whether it
was A that caused the change in the outcome,

101
00:07:28,336 --> 00:07:31,366
or whether it was the two factor interaction BC.

102
00:07:31,366 --> 00:07:39,826
Confounded is a word that means confused; it
means the effect of A is confused or mixed

103
00:07:39,826 --> 00:07:43,766
up with or confounded with the BC interaction.

104
00:07:44,446 --> 00:07:50,266
We cannot tell them apart: A is
an alias for BC and BC is an alias

105
00:07:50,266 --> 00:07:56,456
for A. That's the price we pay for doing
half the experiments, and in some cases,

106
00:07:56,856 --> 00:07:58,896
it is a price that is worth paying.

107
00:07:59,316 --> 00:08:00,506
Let's review the math.

108
00:08:01,176 --> 00:08:05,736
Those with a background in this sort of thing
will recognize that one way to solve a set

109
00:08:05,736 --> 00:08:10,316
of underdetermined equations where
there are more unknowns than equations,

110
00:08:10,816 --> 00:08:16,186
is to collapse these columns together to
achieve a square system that can be solved.

111
00:08:17,366 --> 00:08:20,526
Now you can see algebraically
how this confounding develops.

112
00:08:20,896 --> 00:08:26,436
The coefficient for A in the model is a
combination of the original A, plus BC.

113
00:08:26,436 --> 00:08:33,776
Similarly, the coefficient for B is the
sum of the original B plus AC, and so on.

114
00:08:35,006 --> 00:08:37,636
These four entries here are our aliases.

115
00:08:38,796 --> 00:08:43,766
It also explains why you see only four
coefficients in the R software outputs

116
00:08:43,766 --> 00:08:45,936
and NA values for the remainder of them.

117
00:08:47,066 --> 00:08:52,326
R notices that there are aliased terms in the
full system that we requested a model for,

118
00:08:52,416 --> 00:08:56,906
but it will only report the
coefficient for one of the aliases.

119
00:08:56,906 --> 00:09:00,396
To end off the class, you might be
concerned that you've lost quite a bit

120
00:09:00,396 --> 00:09:06,566
of information by using a fractional factorial.

121
00:09:06,566 --> 00:09:13,366
In the next video we'll see that the
situation isn't quite so bad.

122
00:09:13,996 --> 00:09:18,676
Using prior knowledge about our process we can
actually benefit from knowing about the aliases.

123
00:09:20,076 --> 00:09:23,846
Now hopefully this video has not
left you too confused or confounded.

124
00:09:24,816 --> 00:09:28,136
Just to summarize, there are three
new terms that you must be comfortable

125
00:09:28,136 --> 00:09:33,486
with from today's class: half
fractions, aliasing, and confounding.

126
00:09:34,216 --> 00:09:36,506
We're going to use them a
lot more in the next class.