1
00:00:05,606 --> 00:00:09,426
Our focus with this module is to
understand how to do less work

2
00:00:09,426 --> 00:00:13,936
and still get mostly the same amount of
information, as if we had done all the work.

3
00:00:14,586 --> 00:00:19,296
A bit of educated guessing is required, and
some assumptions are used along the way.

4
00:00:20,146 --> 00:00:24,436
Now, do you remember that rule that when we
were dealing with a system with "k" factors,

5
00:00:24,436 --> 00:00:29,836
and there are two levels for each factor, that
we will have 2 to the power of "k" experiments?

6
00:00:30,496 --> 00:00:32,696
That's a lot of experiments in many cases.

7
00:00:33,576 --> 00:00:36,716
We saw in the prior module
that, when we used the software,

8
00:00:36,716 --> 00:00:38,736
we could estimate all those coefficients.

9
00:00:39,716 --> 00:00:42,826
The key insight that you will
take away from these videos is

10
00:00:43,016 --> 00:00:45,446
that we don't have to run all those experiments.

11
00:00:45,826 --> 00:00:50,296
We can do fewer, but there's going to be
a price to pay; and we're going to figure

12
00:00:50,296 --> 00:00:52,206
out what that price is in this video.

13
00:00:52,476 --> 00:00:55,696
Here's an experiment with
two factors at two levels

14
00:00:56,326 --> 00:00:58,836
and there are the four parameters
that we can estimate.

15
00:00:59,186 --> 00:01:03,946
The intercept, the main effect of the first
factor, the main effect of the second factor

16
00:01:03,946 --> 00:01:06,566
and the two-factor interaction between the two.

17
00:01:07,826 --> 00:01:12,696
Here is a system with three factors, and as
we can see, we can estimate eight parameters

18
00:01:12,766 --> 00:01:15,686
after we have completed the eight experiments.

19
00:01:15,966 --> 00:01:20,886
A system with four factors will have a
total of 16 experiments in a full factorial.

20
00:01:20,886 --> 00:01:26,426
Such a system will have 16 parameters that
we can estimate using computer software.
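
To make the counting concrete, here is a minimal Python sketch (not the software used in the video) that writes out a two-level full factorial in standard order and counts the runs. The function name `standard_order` and the -1/+1 coding of the low and high levels are assumptions made for illustration.

```python
from itertools import product


def standard_order(k):
    """Return the 2**k runs of a two-level full factorial in standard order.

    Each run is a tuple of -1/+1 settings, with the first factor
    alternating fastest (the usual standard order layout).
    """
    # product() varies its LAST element fastest, so reverse each tuple
    # to make the FIRST factor the fastest-changing one.
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=k)]


for k in (2, 3, 4, 7):
    runs = standard_order(k)
    # A full factorial in k factors supports a model with 2**k parameters.
    print(f"k = {k}: {len(runs)} experiments, {2 ** k} parameters")
```

With 7 factors the count is already 128 experiments, which is the growth the lecture is warning about.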

21
00:01:28,236 --> 00:01:31,796
You can probably appreciate that this
procedure quickly becomes prohibitive

22
00:01:31,906 --> 00:01:33,586
for most practical systems.

23
00:01:34,306 --> 00:01:37,856
There are many systems where
there are 6, 7, or more factors.

24
00:01:38,356 --> 00:01:42,386
We do not want to perform the many
experiments required by the full factorial.

25
00:01:43,616 --> 00:01:46,786
It will be both time prohibitive
and cost prohibitive.

26
00:01:48,116 --> 00:01:54,006
This is even true for systems that can be highly
automated, e.g. systems with DNA sequencing

27
00:01:54,356 --> 00:01:57,486
or systems that are done using
computer software and simulation.

28
00:01:57,976 --> 00:02:04,126
There is also very little use in estimating all
2 to the power of "k" coefficients, that's many,

29
00:02:04,126 --> 00:02:07,086
many coefficients in some experiments.

30
00:02:07,086 --> 00:02:09,796
Many of these higher-order interactions
are non-existent,

31
00:02:10,246 --> 00:02:14,666
and many of those coefficients will be
so small, that they're practically zero.

32
00:02:15,586 --> 00:02:20,216
You'll seldom see a 3-factor interaction
that is actually present in a real system.

33
00:02:21,006 --> 00:02:26,526
And 4th-order and higher-level interactions
almost certainly don't exist in practice.

34
00:02:26,576 --> 00:02:31,586
By using some educated guessing, and making
reasonable assumptions about our system,

35
00:02:31,996 --> 00:02:35,336
we are going to figure out a
way to do fewer experiments

36
00:02:35,336 --> 00:02:41,676
and still retain the essential information
of the important effects in our system.

37
00:02:41,676 --> 00:02:44,616
At the core of this approach
is an implicit assumption

38
00:02:44,816 --> 00:02:47,836
that we ignore these higher-order
coefficients in the model.

39
00:02:48,686 --> 00:02:51,286
There are occasions when it
is appropriate to do that,

40
00:02:51,956 --> 00:02:54,696
and there will be times when
our assumptions are faulty.

41
00:02:56,076 --> 00:03:00,536
It is critical to understand that there are
practical situations where it's quite okay

42
00:03:00,536 --> 00:03:04,046
to lose some of this prediction
accuracy from the higher-order terms.

43
00:03:05,236 --> 00:03:09,776
Those higher-order terms definitely help
you fine-tune the predictions, but the cost

44
00:03:09,776 --> 00:03:11,796
of obtaining them can be prohibitive.

45
00:03:12,766 --> 00:03:16,386
You'll need to decide whether or
not it is worth doing that work.

46
00:03:16,986 --> 00:03:18,896
And that's the subject of today's video.

47
00:03:20,526 --> 00:03:26,256
Perhaps let me ask you to consider the question
this way: if we only had the time and a budget

48
00:03:26,256 --> 00:03:30,736
to do 4 experiments, which 4 of
these original 8 would you do?

49
00:03:31,656 --> 00:03:35,556
You might start by considering running
only the 4 experiments here at the front,

50
00:03:36,306 --> 00:03:40,796
but that won't work so well because you
will only have factor C at its low level.

51
00:03:41,216 --> 00:03:43,926
There will be no experiments
at the high level for factor C,

52
00:03:44,426 --> 00:03:48,256
and so you won't really know
what factor C does in the system.

53
00:03:48,256 --> 00:03:54,626
So then you might say: "what if I select these
two at the front and those two at the back?"

54
00:03:55,456 --> 00:03:58,626
Those represent the middle four
rows from the standard order table.

55
00:03:59,216 --> 00:04:01,726
That's not a bad choice, but it's not the best.

56
00:04:01,826 --> 00:04:05,186
Let me show you a better choice,
and then I will explain it.

57
00:04:06,236 --> 00:04:08,566
Here is the set of 4 experiments
that you should do.

58
00:04:09,166 --> 00:04:13,236
Either select the 4 with open
circles or the 4 with closed circles.

59
00:04:13,236 --> 00:04:16,186
Notice the interesting pattern in the cube.

60
00:04:16,186 --> 00:04:19,966
It is intentionally selected
that way and let me explain why.

61
00:04:20,986 --> 00:04:21,946
We'll work backwards here.

62
00:04:21,946 --> 00:04:27,286
Assume we have completed these 4
experiments: the 4 with open circles.

63
00:04:27,846 --> 00:04:30,576
And now when we analyze the data we discover

64
00:04:30,576 --> 00:04:33,736
that factor A is not significant
from the Pareto plot.

65
00:04:33,736 --> 00:04:39,796
If A is not significant then it essentially
implies that we could have ignored factor A,

66
00:04:39,796 --> 00:04:44,286
and never really needed to
include it in our experiments.

67
00:04:44,286 --> 00:04:47,416
Another way of saying that, is that
factor A could have been at the -

68
00:04:47,416 --> 00:04:52,896
level or at the + level, and it really wouldn't
have affected our outcome variable much.

69
00:04:52,896 --> 00:04:58,016
If A can exist at two levels and
not really affect our outcome,

70
00:04:58,486 --> 00:05:02,176
that means that we can collapse the
minus and the plus layers together.

71
00:05:02,786 --> 00:05:04,346
And notice then what happens.

72
00:05:05,196 --> 00:05:13,266
As we do that, we recover 4 experiments
in factors B and C. Four experiments

73
00:05:13,266 --> 00:05:15,956
in two factors; that's a full factorial!

74
00:05:16,536 --> 00:05:18,426
We don't have to do any more work here.

75
00:05:18,836 --> 00:05:24,496
These four experiments that we've already run,
now complete a full factorial in factors B

76
00:05:24,496 --> 00:05:31,336
and C. In fact you can prove this to yourself
for the case when factor B is not significant.

77
00:05:31,916 --> 00:05:35,756
Then it collapses to a full
factorial in factor A and factor C.

78
00:05:35,756 --> 00:05:42,706
If factor C is not significant then it collapses
to a full factorial in factor A and factor B.
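
You can check this collapsing property yourself with a short Python sketch. It assumes the chosen half is experiments 2, 3, 5 and 8 of the standard order table (the runs named later in this video) and codes the levels as -1 and +1; the helper name `collapse` is made up for illustration.

```python
from itertools import product

# Experiments 2, 3, 5 and 8 of the 3-factor standard order table, as (A, B, C).
half_fraction = [(+1, -1, -1), (-1, +1, -1), (-1, -1, +1), (+1, +1, +1)]

# For comparison: the 4 runs "at the front" of the cube, all at low C.
front_four = [(-1, -1, -1), (+1, -1, -1), (-1, +1, -1), (+1, +1, -1)]

# All 4 corner points of a full factorial in two factors.
full_two_factor = set(product((-1, +1), repeat=2))


def collapse(runs, dropped):
    """Ignore one factor (by column index) and return the distinct settings left."""
    return {tuple(x for i, x in enumerate(run) if i != dropped) for run in runs}


for dropped, name in enumerate("ABC"):
    good = collapse(half_fraction, dropped) == full_two_factor
    poor = collapse(front_four, dropped) == full_two_factor
    print(f"Ignore {name}: half fraction is a full factorial? {good}; front four? {poor}")
```

The half fraction collapses to a full factorial whichever factor turns out to be unimportant, while the front four only does so when C is the factor ignored.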

79
00:05:44,536 --> 00:05:49,016
So from that perspective, this is
really a good set of 4 experiments to use.

80
00:05:49,016 --> 00:05:54,596
So now let's imagine that we've
run only these 4 experiments.

81
00:05:54,596 --> 00:05:58,296
I'd like to show you how we could
analyze the data and I'm going

82
00:05:58,296 --> 00:06:00,426
to use the water treatment example again.

83
00:06:01,426 --> 00:06:08,216
I hope you don't mind if I rename the factors
to A, B, and C. I'm doing this because I want

84
00:06:08,216 --> 00:06:12,316
to use the water treatment example that
you're comfortable with, but at the end,

85
00:06:12,646 --> 00:06:17,236
I want to extend what we learned
here today to any system, and A, B,

86
00:06:17,236 --> 00:06:19,876
and C are the most generic way to do that.

87
00:06:21,626 --> 00:06:24,986
Now assume that each of these
experiments were very expensive.

88
00:06:25,596 --> 00:06:28,046
Maybe they cost around $10,000 each.

89
00:06:28,046 --> 00:06:34,456
So instead of doing 8, let's assume
we've only done these 4: half the work.

90
00:06:35,076 --> 00:06:38,686
Our boss is going to be pretty
impressed that we've saved $40,000.

91
00:06:39,716 --> 00:06:42,386
Open the software and let's see what happens.

92
00:06:43,536 --> 00:06:49,086
Using the best choice design I talked about
earlier, where you've only done experiments 2,

93
00:06:49,086 --> 00:06:56,606
3, 5 and 8 from the original set, I'm going to
ask the software to create new variables for A,

94
00:06:56,926 --> 00:07:01,756
B and C, which only include those 4 experiments.

95
00:07:01,756 --> 00:07:04,316
And here are the 4 outcomes at those conditions.

96
00:07:05,076 --> 00:07:09,086
Now if you just go ahead and type
in the code from the previous class,

97
00:07:09,086 --> 00:07:13,376
you can see that the software will
create a model from A, B and C;

98
00:07:13,886 --> 00:07:16,786
and it includes 2 and 3 factor interactions.

99
00:07:17,976 --> 00:07:22,176
But what you will notice that's different
from last time, is all these NA terms.

100
00:07:22,916 --> 00:07:27,666
That NA stands for "Not Applicable";
those terms cannot be estimated.

101
00:07:28,826 --> 00:07:34,256
But we got 4 estimates of 4 coefficients;
we ran 4 experiments, so we expected that.

102
00:07:34,376 --> 00:07:37,086
The full model prediction has 8 parameters

103
00:07:37,086 --> 00:07:40,416
and would have required 8
experiments to calculate all 8 of them.
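
One way to see why those terms cannot be estimated is to write out the columns of the 8-parameter model for just the 4 runs that were done. The Python sketch below is an illustration, not the output of the software in the video; it assumes -1/+1 coding, and the term labels such as `A:B` are chosen here only for display.

```python
# Experiments 2, 3, 5 and 8 of the 3-factor standard order table, as (A, B, C).
half_fraction = [(+1, -1, -1), (-1, +1, -1), (-1, -1, +1), (+1, +1, +1)]


def model_columns(runs):
    """Build the 8 columns of the full 3-factor model for the given runs."""
    return {
        "intercept": tuple(1 for _ in runs),
        "A": tuple(a for a, b, c in runs),
        "B": tuple(b for a, b, c in runs),
        "C": tuple(c for a, b, c in runs),
        "A:B": tuple(a * b for a, b, c in runs),
        "A:C": tuple(a * c for a, b, c in runs),
        "B:C": tuple(b * c for a, b, c in runs),
        "A:B:C": tuple(a * b * c for a, b, c in runs),
    }


columns = model_columns(half_fraction)
print(len(columns), "model terms, but only", len(set(columns.values())), "distinct columns")

# Each interaction column is an exact copy of one of the first four columns.
for name in ("A:B", "A:C", "B:C", "A:B:C"):
    twin = next(n for n in ("intercept", "A", "B", "C") if columns[n] == columns[name])
    print(f"{name} has the same column as {twin}")
```

With only 4 distinct columns, the data from 4 experiments can support only 4 estimates; the duplicated terms have nothing left to distinguish them.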

104
00:07:40,416 --> 00:07:48,206
Now I hope you're still curious about how
I selected those four experiments to run.

105
00:07:48,336 --> 00:07:53,726
Hold that question in your mind,
I'll come back to it, I promise.

106
00:07:53,726 --> 00:07:59,796
But I want to show you first what
we lost out on by doing less work.

107
00:07:59,796 --> 00:08:03,336
That way you can judge whether it was worth it.

108
00:08:03,336 --> 00:08:08,496
Let me assume we've done all 8 experiments.

109
00:08:08,496 --> 00:08:15,946
And let me compare that to the case where
we've only done 4 of the experiments.

110
00:08:15,946 --> 00:08:20,946
We're going to write out the two
prediction models side-by-side

111
00:08:20,946 --> 00:08:23,696
so that you can see the differences
between them.

112
00:08:23,696 --> 00:08:29,536
In this particular example, you can see that
three of the terms are numerically similar;

113
00:08:29,536 --> 00:08:33,836
it's not going to lead to
serious misinterpretation.

114
00:08:33,836 --> 00:08:35,656
However, there is one term
that is very different.

115
00:08:35,656 --> 00:08:36,956
What has happened over there?

116
00:08:36,956 --> 00:08:40,886
I'm going to show you now how
that reduced design was found.

117
00:08:40,886 --> 00:08:43,056
How did we come to that best choice?

118
00:08:43,056 --> 00:08:45,486
We call this a half fraction.

119
00:08:45,516 --> 00:08:51,466
The full set of experiments for 3 factors
would've required 2 to the 3 experiments.

120
00:08:51,546 --> 00:08:56,656
If we want to do half the work, then we
can divide by 2 here, which is equal to 4.

121
00:08:56,656 --> 00:09:01,516
Or for those of you that remember your
exponent rules, we could write this

122
00:09:01,516 --> 00:09:05,046
as 2 to the power of (3 minus 1).

123
00:09:05,046 --> 00:09:09,236
This equals 2 to the power of 2, which equals 4.

124
00:09:09,356 --> 00:09:13,806
There is a systematic way
to select those four runs.

125
00:09:13,806 --> 00:09:20,176
Since we know that we will have 4 experiments,
we can quite happily go ahead and write

126
00:09:20,176 --> 00:09:26,206
out our standard order table for the
first two factors, A and B. We do this

127
00:09:26,206 --> 00:09:30,356
because we know two factors
require 4 experiments.

128
00:09:30,576 --> 00:09:35,626
Okay, but what about that
third factor, factor C?

129
00:09:35,626 --> 00:09:39,966
At what settings should we
write out that factor?

130
00:09:39,966 --> 00:09:47,016
We write it out as C equals A times B. In
fact, we say "generate factor C as A times B".

131
00:09:47,016 --> 00:09:56,256
So there we have that factor C is equal to +, -,
-, + for the 4 experiments; the multiplication

132
00:09:56,256 --> 00:10:01,526
of the values in column A
and column B. Let's visualize

133
00:10:01,526 --> 00:10:06,126
where those 4 points are on the original cube.

134
00:10:06,236 --> 00:10:11,866
The first row is at low A and low
B, and high C, so it appears here.

135
00:10:11,866 --> 00:10:17,286
The next point is at high A, low B,
and then low C. So that's over here.

136
00:10:17,286 --> 00:10:22,186
The third experiment is there, and the
last experiment is at high A, high B,

137
00:10:22,186 --> 00:10:27,376
and high C. Notice how that
corresponds to the ideal selection

138
00:10:27,376 --> 00:10:30,846
of four experiments we made
at the start of this video.
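
The recipe just described (write the standard order table for A and B, then generate C as A times B) can be sketched in a few lines of Python. The function name `standard_order` and the -1/+1 coding are assumptions for illustration; the lookup at the end simply finds each generated run in the full 8-run table.

```python
from itertools import product


def standard_order(k):
    """Return the 2**k runs of a two-level full factorial in standard order."""
    # Reverse each tuple so that the first factor alternates fastest.
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=k)]


# Step 1: standard order table for the first two factors, A and B.
# Step 2: generate the third factor as C = A * B.
half_fraction = [(a, b, a * b) for a, b in standard_order(2)]

# 2**(3 - 1) = 4 runs: half of the 2**3 = 8 runs in the full factorial.
assert len(half_fraction) == 2 ** (3 - 1)

# Locate each run in the full 3-factor table (experiment numbers start at 1).
full_factorial = standard_order(3)
for run in half_fraction:
    print(run, "is experiment", full_factorial.index(run) + 1)
```

The four generated rows turn out to be experiments 5, 2, 3 and 8 of the full table, the same set of 2, 3, 5 and 8 used in the software demonstration.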

139
00:10:30,846 --> 00:10:36,766
In the next video I'm going to show you where
I got that rule where C should equal A times B.

140
00:10:36,766 --> 00:10:40,666
So let's understand the trade-off here.

141
00:10:40,666 --> 00:10:45,996
If we do half the number of
experiments we have to accept

142
00:10:46,356 --> 00:10:49,696
that we get less information from the system.

143
00:10:49,696 --> 00:10:54,436
I guess you can say there's
no such thing as a free lunch.

144
00:10:54,436 --> 00:10:56,286
You can't get something for nothing.

145
00:10:56,286 --> 00:11:00,836
The question is: "what is the
penalty for doing fewer experiments?"

146
00:11:00,836 --> 00:11:04,046
"What is this free lunch costing me?"

147
00:11:04,416 --> 00:11:07,966
I mean, if we had paid an extra $40,000,

148
00:11:07,966 --> 00:11:12,706
and done the extra four experiments,
we'd have that extra information.

149
00:11:12,706 --> 00:11:14,666
You can already see that over here.

150
00:11:14,736 --> 00:11:20,276
We had some good estimates
of the three parameters.

151
00:11:20,276 --> 00:11:23,146
The intercept, the A main
effect, the C main effect.

152
00:11:23,146 --> 00:11:26,516
But the B main effect was actually quite wrong.

153
00:11:26,516 --> 00:11:33,326
Also you notice that we didn't get any
estimates of the two-factor interactions.

154
00:11:34,236 --> 00:11:40,346
Let me drop in two words that we
will come back to in later classes.

155
00:11:40,346 --> 00:11:42,526
"Screening" and "optimization".

156
00:11:42,526 --> 00:11:48,416
When we are screening, we don't mind
having reduced knowledge of the system.

157
00:11:48,416 --> 00:11:53,046
For example, we don't mind if the
two-factor interactions are not all known,

158
00:11:53,586 --> 00:11:58,126
or if the estimates of the
factors are not quite correct.

159
00:11:58,126 --> 00:12:03,296
Later on, when optimizing, though, we want
more specific information about the system:

160
00:12:03,296 --> 00:12:05,076
a better level of prediction accuracy.

161
00:12:05,076 --> 00:12:12,526
At that point we will require better
resolution of the main effects and interactions.

162
00:12:12,526 --> 00:12:18,646
So this is what the $40,000 is costing us: a
reduction in the model's prediction quality.

163
00:12:18,646 --> 00:12:22,466
You could ask whether that's
worth the money saved.

164
00:12:22,466 --> 00:12:24,496
Well, you'll never really
know the correct answer,

165
00:12:24,496 --> 00:12:27,536
unless you do the full set of experiments.

166
00:12:27,536 --> 00:12:32,236
But I'm going to show you how we can make
some educated guesses later in this module.

167
00:12:32,396 --> 00:12:36,636
What we've done here by not
running those extra experiments, is,

168
00:12:36,636 --> 00:12:39,936
we've rather cleverly selected a
subset of them to save $40,000.

169
00:12:39,936 --> 00:12:42,226
We can use this money later on,

170
00:12:42,226 --> 00:12:49,596
for when we require a more detailed
model to find that optimum in the system.

171
00:12:49,596 --> 00:12:55,606
George Box, the famous statistician from
whose textbook we're using this example,

172
00:12:55,606 --> 00:12:58,116
said, as a rough rule, that only a portion,
about 25%, of the experimental effort

173
00:12:58,146 --> 00:12:59,856
and budget should be invested in
the first experimental designs.

174
00:12:59,886 --> 00:13:00,546
I paraphrased that slightly.

175
00:13:00,576 --> 00:13:02,196
But basically he is saying that you
should leave some money, and time,

176
00:13:02,226 --> 00:13:03,216
for later on to figure out the details.

177
00:13:03,246 --> 00:13:05,376
In the beginning you don't even know yet
if A, B or C are actually significant.

178
00:13:05,406 --> 00:13:06,876
First figure that out before
you go build a detailed model,

179
00:13:06,906 --> 00:13:08,016
with two-factor and three-factor interactions.

180
00:13:08,046 --> 00:13:09,156
That's where we're going
to leave the class today.

181
00:13:09,186 --> 00:13:11,316
We've shown you the end point: when you
do half the work, you lose a bit of accuracy

182
00:13:11,346 --> 00:13:12,876
in your model but there's a
great built-in backup strategy

183
00:13:12,906 --> 00:13:14,316
in the clever selection of
which half of the work to do.

184
00:13:14,346 --> 00:13:16,176
I guess you could say at least be smart
about which half of the work to do.

185
00:13:16,206 --> 00:13:18,156
In the next class we're going to learn
the technical terms and the mechanics

186
00:13:18,186 --> 00:13:19,056
around creating these half-fractions.