1
00:00:00,036 --> 00:00:03,656
So let's look at an example to end this module.

2
00:00:04,146 --> 00:00:07,766
We said in the prior video that you
should always include as many factors

3
00:00:07,766 --> 00:00:10,906
as you possibly can in a set of experiments.

4
00:00:10,906 --> 00:00:13,016
Do you remember why we recommend that?

5
00:00:13,476 --> 00:00:16,056
If not, please review the prior video again.

6
00:00:16,056 --> 00:00:19,876
In this example we are going to use 7 factors,

7
00:00:20,256 --> 00:00:23,486
and the fewest possible experiments;
that's 8 experiments.

8
00:00:23,776 --> 00:00:28,386
We are going to screen out which of those
7 factors really affect our outcome.

9
00:00:29,836 --> 00:00:33,806
So it is a screening design with 8
experiments and a resolution of III.

10
00:00:34,416 --> 00:00:38,386
I could choose more experiments, and
then go to higher and higher resolutions.

11
00:00:38,876 --> 00:00:43,056
But let's see what happens when we start with
just eight experiments and seven factors.

12
00:00:45,776 --> 00:00:51,556
With eight experiments, factors A, B
and C form a full factorial in eight rows.

13
00:00:52,166 --> 00:00:55,976
The trade-off table tells us how
to generate factors D, E,

14
00:00:56,106 --> 00:01:01,216
F and G. Now notice that
this is a 2^(7-4) design.

15
00:01:01,326 --> 00:01:04,326
So this design has p=4.

16
00:01:04,786 --> 00:01:10,536
These 4 generators can be used to create the
columns for the remaining factors in my system.

17
00:01:11,716 --> 00:01:13,156
And here's the completed table.
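A minimal Python sketch of the same table. It assumes the generators D = AB, E = AC, F = BC and G = ABC, which are the columns implied by the words I = ABD, I = ACE, I = BCF and I = ABCG used later in this example. It also assumes standard order, with A changing fastest. The helper name `build_design` is hypothetical and is not taken from the course code.
```python
from itertools import product
def build_design():
    """Return the 8-run 2^(7-4) design as a dict of +/-1 columns."""
    # Full factorial in A, B, C in standard order (A changes fastest).
    runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
    A = [r[0] for r in runs]
    B = [r[1] for r in runs]
    C = [r[2] for r in runs]
    # Generated columns (assumed generators D = AB, E = AC, F = BC, G = ABC).
    D = [a * b for a, b in zip(A, B)]
    E = [a * c for a, c in zip(A, C)]
    F = [b * c for b, c in zip(B, C)]
    G = [a * b * c for a, b, c in zip(A, B, C)]
    return {"A": A, "B": B, "C": C, "D": D, "E": E, "F": F, "G": G}
design = build_design()
for i in range(8):
    print(i + 1, " ".join(f"{design[k][i]:+d}" for k in "ABCDEFG"))
```
Each printed row is one experiment. Every column has four low and four high settings, and no column can be written as a product of the others within this set of seven.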

18
00:01:13,496 --> 00:01:17,226
I can go ahead and run the
experiments and start my analysis.

19
00:01:17,226 --> 00:01:20,696
But the whole purpose of the tools
introduced in this module is

20
00:01:20,696 --> 00:01:24,416
to check your aliasing
before you start the analysis.

21
00:01:24,416 --> 00:01:25,226
Let's go do that.

22
00:01:29,386 --> 00:01:32,316
Here are our 4 generators, rearranged as words equal to I.

23
00:01:33,496 --> 00:01:37,566
I equals ABD, I equals ACE, and so on.

24
00:01:38,586 --> 00:01:40,846
How many words are in our defining relationship?

25
00:01:41,306 --> 00:01:47,496
Two to the power of p; with p = 4
in this case, that equals 16 words, counting I itself.

26
00:01:48,086 --> 00:01:51,076
That's a lot of words to figure
out, but let's give it a try.

27
00:01:51,076 --> 00:01:53,416
The first few words are easy.

28
00:01:54,106 --> 00:02:05,086
Take the rearranged generators individually:
I = ABD = ACE = BCF = ABCG. That's 5 of them, counting I.

29
00:02:06,026 --> 00:02:14,406
Now we can add to that the combinations
two at a time: (ABD)(ACE) = BCDE.

30
00:02:15,276 --> 00:02:18,896
The next combination two at
a time is: (ABD)(BCF) = ACDF.

31
00:02:18,896 --> 00:02:26,096
You can prove to yourself that the remaining
four are CDG, ABEF, BEG and AFG.
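The word algebra can also be checked against the table itself. A word belongs to the defining relationship exactly when the elementwise product of its columns is +1 in every run. This Python sketch again assumes D = AB, E = AC, F = BC and G = ABC; the names `multiply_words` and `column_product` are hypothetical helpers.
```python
from itertools import combinations, product
from math import prod
def multiply_words(*words):
    """Multiply words; a letter that appears twice cancels, because A*A = I."""
    letters = set()
    for w in words:
        letters ^= set(w)  # symmetric difference performs the cancellation
    return "".join(sorted(letters)) or "I"
# Rebuild the design columns (assumed generators D = AB, E = AC, F = BC, G = ABC).
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
A, B, C = (list(x) for x in zip(*runs))
cols = {"A": A, "B": B, "C": C,
        "D": [a * b for a, b in zip(A, B)],
        "E": [a * c for a, c in zip(A, C)],
        "F": [b * c for b, c in zip(B, C)],
        "G": [a * b * c for a, b, c in zip(A, B, C)]}
def column_product(word):
    """Elementwise product of the design columns named in `word`."""
    return [prod(cols[ch][i] for ch in word) for i in range(8)]
generators = ["ABD", "ACE", "BCF", "ABCG"]
pairs = [multiply_words(u, v) for u, v in combinations(generators, 2)]
for w in pairs:
    # A genuine word of the defining relationship multiplies out to +1 in every run.
    print(w, all(x == 1 for x in column_product(w)))
```
The six two-at-a-time words come out as BCDE, ACDF, CDG, ABEF, BEG and AFG, and each one prints True.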

32
00:02:26,096 --> 00:02:29,316
Now we've got 11 words so far
in our defining relationship.

33
00:02:29,996 --> 00:02:32,886
The next step is to take our
generators three at a time:

34
00:02:33,766 --> 00:02:42,586
(ABD)(ACE)(BCF) = DEF. Try the
next three yourself: ADEG, CEFG and BDFG.

35
00:02:42,586 --> 00:02:45,156
So, there we have a total of 15.

36
00:02:45,156 --> 00:02:50,446
And the final combination is to use all
four generators multiplied together.

37
00:02:51,036 --> 00:02:54,386
And that simplifies to ABCDEFG.

38
00:02:54,386 --> 00:02:57,356
So, here's our complete defining relationship.
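The whole bookkeeping can be automated. Every word is the product of some subset of the four generators, so there are 2^4 = 16 subsets (1 + 4 + 6 + 4 + 1, where the empty product is I). This Python sketch, with hypothetical helper names, reproduces the complete defining relationship.
```python
from itertools import combinations
def multiply_words(*words):
    """Multiply words; a letter that appears twice cancels, because A*A = I."""
    letters = set()
    for w in words:
        letters ^= set(w)
    return "".join(sorted(letters)) or "I"
def defining_relationship(generators):
    """All 2**p products of generator subsets, taken 0, 1, 2, ... at a time."""
    words = []
    for k in range(len(generators) + 1):
        for subset in combinations(generators, k):
            words.append(multiply_words(*subset))
    return words
words = defining_relationship(["ABD", "ACE", "BCF", "ABCG"])
print(len(words))
print(" = ".join(words))
```
It prints 16 and then I = ABD = ACE = BCF = ABCG = BCDE = ACDF = CDG = ABEF = BEG = AFG = DEF = ADEG = BDFG = CEFG = ABCDEFG.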

39
00:02:57,356 --> 00:03:03,856
Now let's try to calculate the
aliasing for factor A. If we do that,

40
00:03:04,026 --> 00:03:06,496
we get this very long expression over here.

41
00:03:07,166 --> 00:03:10,836
I've highlighted only the two-factor
interactions that are confounded

42
00:03:10,836 --> 00:03:14,506
with the main effect of A. I
can create a similar list of aliases

43
00:03:14,506 --> 00:03:16,806
for all seven main effects in my design.
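Aliases follow the same rule: multiply the main effect by every word other than I, then keep the products that have two letters. This Python sketch uses the generators I = ABD = ACE = BCF = ABCG from above. The helper names are hypothetical.
```python
from itertools import combinations
def multiply_words(*words):
    """Multiply words; a letter that appears twice cancels, because A*A = I."""
    letters = set()
    for w in words:
        letters ^= set(w)
    return "".join(sorted(letters)) or "I"
generators = ["ABD", "ACE", "BCF", "ABCG"]
# All 16 words of the defining relationship (the empty product is "I").
words = [multiply_words(*s) for k in range(5) for s in combinations(generators, k)]
def two_factor_aliases(effect):
    """Two-factor interactions confounded with a main effect."""
    # Skip "I": it is the identity, not a letter to be multiplied in.
    aliases = (multiply_words(effect, w) for w in words if w != "I")
    return sorted(a for a in aliases if len(a) == 2)
for effect in "ABCDEFG":
    print(effect, "=", " = ".join(two_factor_aliases(effect)))
```
For A it prints A = BD = CE = FG. Every main effect in this design turns out to be aliased with exactly three two-factor interactions.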

44
00:03:17,586 --> 00:03:22,716
This illustrates the tremendous confounding
that takes place in the very dense designs

45
00:03:22,716 --> 00:03:25,176
at the far right-hand side
of the trade-off table.

46
00:03:25,966 --> 00:03:32,126
Remember, instead of doing two to the seven,
which equals 128 experiments, we've done 8.

47
00:03:32,816 --> 00:03:35,876
There's going to be a steep price
to pay for this reduction in work.

48
00:03:35,876 --> 00:03:39,276
Now let's look at the
values of the outcome variable,

49
00:03:39,356 --> 00:03:42,076
and at how to continue with the analysis.

50
00:03:42,246 --> 00:03:47,596
And as you'll see, and this is very typical, the
analysis goes much quicker than the planning.

51
00:03:49,586 --> 00:03:52,366
Here's the code that you can
use to analyze this design.

52
00:03:52,986 --> 00:03:54,926
Please copy and paste it from the website.

53
00:03:55,646 --> 00:03:58,976
We recommend that you always clear
your environment from prior work.

54
00:03:58,976 --> 00:04:02,206
This is because you might have
a variable with the same name

55
00:04:02,206 --> 00:04:05,146
from a different analysis;
clearing it avoids any confusion.

56
00:04:06,266 --> 00:04:10,416
Build the linear model in exactly the same
way as you created the design on paper.

57
00:04:10,416 --> 00:04:16,376
First, define the three variables that
you start with: A, B, and C. Next,

58
00:04:16,666 --> 00:04:21,006
generate the remaining four factors using
the definitions from the trade-off table.

59
00:04:21,636 --> 00:04:26,206
When you inspect these variables in the console,
you should get exactly what you had on paper.

60
00:04:26,936 --> 00:04:30,906
Now, add the outcome values
recorded for the eight experiments.

61
00:04:30,906 --> 00:04:33,106
I'm going to take them from
the standard order table.

62
00:04:33,966 --> 00:04:36,216
When you are ready to visualize
your linear model,

63
00:04:36,596 --> 00:04:39,336
load the pid package, using
the "library" command.

64
00:04:40,016 --> 00:04:43,356
You will have installed this package
if you followed the prior videos.

65
00:04:43,936 --> 00:04:47,266
I will quickly note that R
packages are frequently updated.

66
00:04:47,726 --> 00:04:50,836
You should check for updates
regularly, as demonstrated here.

67
00:04:50,836 --> 00:04:55,116
So use the "paretoPlot(...)" command
and let's examine the output.
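For readers working outside R, here is a rough Python analogue of fitting the linear model and ranking the effects by absolute size, which is the ordering a Pareto plot of effects displays. It is not the course's code and does not use the pid package. The outcome values are invented: they are built from made-up coefficients chosen only to resemble the pattern described here, not from the video's data. Because the model-matrix columns are orthogonal (X^T X = 8I), the least-squares coefficients reduce to X^T y / 8.
```python
from itertools import product
# Design columns (assumed generators D = AB, E = AC, F = BC, G = ABC).
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
A, B, C = (list(x) for x in zip(*runs))
cols = {"A": A, "B": B, "C": C,
        "D": [a * b for a, b in zip(A, B)],
        "E": [a * c for a, c in zip(A, C)],
        "F": [b * c for b, c in zip(B, C)],
        "G": [a * b * c for a, b, c in zip(A, B, C)]}
# HYPOTHETICAL outcome built from invented coefficients; not the video's data.
true_coef = {"A": -3.0, "B": 0.25, "C": -4.5, "D": -0.5, "E": -1.5, "F": 0.0, "G": -2.5}
y = [50 + sum(true_coef[k] * cols[k][i] for k in cols) for i in range(8)]
# Model matrix: intercept column plus one column per factor.
names = ["(Intercept)"] + list(cols)
X = [[1] + [cols[k][i] for k in cols] for i in range(8)]
# Orthogonal columns (X^T X = 8 I), so least squares reduces to X^T y / 8.
xtx = [[sum(X[i][p] * X[i][q] for i in range(8)) for q in range(8)] for p in range(8)]
assert xtx == [[8 if p == q else 0 for q in range(8)] for p in range(8)]
coef = {n: sum(X[i][j] * y[i] for i in range(8)) / 8 for j, n in enumerate(names)}
# Pareto-style ranking: effects sorted by absolute size (intercept excluded).
ranking = sorted(names[1:], key=lambda n: abs(coef[n]), reverse=True)
for name in ranking:
    print(f"{name}: {coef[name]:+.2f}")
```
With eight runs and eight coefficients the model is saturated. It fits the data exactly and leaves no residual degrees of freedom, so the ranking rests on effect size alone.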

68
00:04:58,906 --> 00:05:04,286
We can see here that factors C, A and
G are important and have a negative,

69
00:05:04,416 --> 00:05:06,476
reducing effect on the outcome variable.

70
00:05:06,526 --> 00:05:08,996
Factor E is a little smaller.

71
00:05:09,556 --> 00:05:14,526
And factors B, D and F have
small to negligible coefficients.

72
00:05:15,486 --> 00:05:21,206
Note, however, that when we say factor A up here
is important, it is really A together with its aliases:

73
00:05:21,206 --> 00:05:24,276
a variety of two-factor
and higher-order interactions.
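To see why the bar labelled A is really a sum, note that column A is identical to the BD, CE and FG interaction columns. The bar therefore estimates the A effect plus those interaction effects, plus higher-order aliases. This Python sketch assumes the generators D = AB, E = AC, F = BC and G = ABC and uses an invented outcome driven only by a BD interaction. The helper `times` is hypothetical.
```python
from itertools import product
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
A, B, C = (list(x) for x in zip(*runs))
D = [a * b for a, b in zip(A, B)]
E = [a * c for a, c in zip(A, C)]
F = [b * c for b, c in zip(B, C)]
G = [a * b * c for a, b, c in zip(A, B, C)]
def times(u, v):
    """Elementwise product of two +/-1 columns (an interaction column)."""
    return [x * y for x, y in zip(u, v)]
# Column A is indistinguishable from the BD, CE and FG interaction columns.
print(A == times(B, D), A == times(C, E), A == times(F, G))
# HYPOTHETICAL outcome: only a BD interaction acts; the true A effect is zero.
y = [10 + 2 * bd for bd in times(B, D)]
# Estimated A coefficient (orthogonal design, so it is sum(A * y) / 8).
print(sum(a * yi for a, yi in zip(A, y)) / 8)
```
It prints True True True and then 2.0. The whole BD effect is reported under the name A, which is why the small-interaction assumption matters.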

74
00:05:25,276 --> 00:05:30,506
As long as the assumption holds that those two-factor
and higher-order interactions are small,

75
00:05:30,506 --> 00:05:38,286
or zero, then that bar in the Pareto plot
essentially represents the effect of A. What

76
00:05:38,286 --> 00:05:41,126
about the small, unimportant
effects down here?

77
00:05:41,866 --> 00:05:44,006
They can be removed, but judiciously.

78
00:05:44,536 --> 00:05:47,646
As long as you are confident
that when you varied factor B,

79
00:05:47,986 --> 00:05:52,316
you did so over a large enough range to
affect the outcome variable meaningfully,

80
00:05:52,596 --> 00:05:55,276
then you can be sure that this Pareto plot shows

81
00:05:55,276 --> 00:05:58,586
that factor B really has no
significant effect on the outcome.

82
00:05:59,136 --> 00:06:00,346
It is safe to remove it.

83
00:06:01,266 --> 00:06:04,866
In other words, we have screened
factor B out of consideration.

84
00:06:06,356 --> 00:06:10,186
So let's remove factors B,
D and F for those reasons.

85
00:06:10,906 --> 00:06:15,676
By removing these three factors, we've
reduced ourselves from 7 to 4 factors,

86
00:06:16,106 --> 00:06:19,066
but we have still done eight experiments.

87
00:06:19,066 --> 00:06:23,926
We might as well have done the experiments
with only factors A, C, G and E present.

88
00:06:24,856 --> 00:06:28,766
Note however that we do not
have to redo the experiments.

89
00:06:28,766 --> 00:06:33,956
If you refit the model in R with only these four
factors, you get exactly the same coefficients

90
00:06:33,956 --> 00:06:34,436
as before.

91
00:06:34,436 --> 00:06:39,066
This is due to the orthogonality (independence) property
that's built into the design.

92
00:06:40,016 --> 00:06:43,316
Those of you with a least-squares
background will recognize that the columns

93
00:06:43,316 --> 00:06:49,566
in this matrix are orthogonal, so when you
rebuild the model you will get the same results.
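This can be checked with a general least-squares solver instead of the orthogonal shortcut. The Python sketch below fits the full seven-factor model and the reduced model in A, C, E and G by solving the normal equations (X^T X) b = X^T y exactly with fractions, then compares the shared coefficients. The outcome values are invented for illustration, and the generators D = AB, E = AC, F = BC, G = ABC are assumed as before. The helpers `solve` and `least_squares` are hypothetical.
```python
from fractions import Fraction
from itertools import product
def solve(M, v):
    """Solve M x = v by Gauss-Jordan elimination, exactly, with Fractions."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(b)] for row, b in zip(M, v)]
    for col in range(n):
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * yv for x, yv in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]
def least_squares(columns, y):
    """Fit y ~ intercept + columns via the normal equations."""
    X = [[1] + [c[i] for c in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
A, B, C = (list(x) for x in zip(*runs))
cols = {"A": A, "B": B, "C": C,
        "D": [a * b for a, b in zip(A, B)],
        "E": [a * c for a, c in zip(A, C)],
        "F": [b * c for b, c in zip(B, C)],
        "G": [a * b * c for a, b, c in zip(A, B, C)]}
y = [52, 47, 44, 40, 45, 38, 36, 31]  # HYPOTHETICAL outcomes, invented for illustration
full = dict(zip(["Intercept"] + list(cols), least_squares(list(cols.values()), y)))
keep = ["A", "C", "E", "G"]
reduced = dict(zip(["Intercept"] + keep, least_squares([cols[k] for k in keep], y)))
for name in ["Intercept"] + keep:
    print(name, float(full[name]), float(reduced[name]))
```
Each printed pair matches. With orthogonal columns, X^T X is diagonal, so dropping columns does not change the remaining estimates. With merely linearly independent but correlated columns, this would not hold in general.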

94
00:06:49,566 --> 00:06:52,666
So there we have it: we've essentially
found ourselves a system

95
00:06:52,666 --> 00:06:55,326
with four factors in eight experiments.

96
00:06:55,326 --> 00:06:57,886
We've eliminated three unimportant variables,

97
00:06:57,886 --> 00:07:00,686
as we've learned that they have
little effect on our outcome.

98
00:07:01,296 --> 00:07:04,786
We have retained four important factors
that we know affect our outcome.

99
00:07:04,786 --> 00:07:09,986
We will see in the following module
that we can focus our future attention

100
00:07:09,986 --> 00:07:13,116
on these important factors
now, to optimize the system.

101
00:07:14,816 --> 00:07:16,506
So that's the end of this module.

102
00:07:16,506 --> 00:07:21,486
For advanced students, I do want to
point out two other reduced designs.

103
00:07:22,096 --> 00:07:28,336
The first is the Plackett-Burman design.
The regular trade-off table shows

104
00:07:28,336 --> 00:07:33,286
that you can do 4, 8, 16,
32, 64 runs, and so on.

105
00:07:34,186 --> 00:07:37,306
But what if you had a budget
of, for example, 24 runs?

106
00:07:37,486 --> 00:07:40,506
That's more than 16 but not quite enough for 32.

107
00:07:41,536 --> 00:07:45,996
Well, the Plackett-Burman design works well in
cases where your budget is a multiple

108
00:07:45,996 --> 00:07:49,196
of four but not one of the
powers of two in the table.

109
00:07:49,196 --> 00:07:52,796
So a budget of 20, 24, 28, and so on.
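As a hedged illustration of how a run count that is a multiple of four but not a power of two can work, this Python sketch builds a 12-run, 11-column two-level design. It uses the smallest such size for brevity. It cyclically shifts a generating row and appends a row with every factor at its low level. The generating row is one commonly tabulated for N = 12; treat it as an assumption. The code checks the key property, that every pair of columns is orthogonal, rather than relying on it. The function name is hypothetical.
```python
def plackett_burman_12():
    """12-run, 11-column two-level design from cyclic shifts of a generating row."""
    gen = [1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]  # assumed N = 12 generating row
    # Row s is the generating row shifted right by s places (s = 0 leaves it as is).
    rows = [gen[-s:] + gen[:-s] for s in range(11)]
    rows.append([-1] * 11)  # final run: every factor at its low level
    return rows
design = plackett_burman_12()
columns = list(zip(*design))
orthogonal = all(sum(x * y for x, y in zip(columns[p], columns[q])) == 0
                 for p in range(11) for q in range(p + 1, 11))
balanced = all(sum(col) == 0 for col in columns)
print(len(design), "runs,", len(columns), "columns; orthogonal:", orthogonal, "balanced:", balanced)
```
Other sizes, such as 20 or 24 runs, need their own generating rows or constructions, which are not reproduced here.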

110
00:07:54,126 --> 00:07:57,146
I'm not going to go into the details
of the Plackett-Burman design,

111
00:07:57,576 --> 00:08:00,956
but now that you know the terminology,
you can go search for more information.

112
00:08:00,956 --> 00:08:05,896
The final type of design to be aware
of is a class of designs called the

113
00:08:05,896 --> 00:08:10,716
"Definitive Screening Design", and here's a
link where you can read more about them.

114
00:08:11,836 --> 00:08:14,266
These designs are a type of optimal design.

115
00:08:14,266 --> 00:08:17,446
Let's quickly define the term "optimal" here.

116
00:08:18,246 --> 00:08:22,106
It means that the experiments
are selected according to some criterion,

117
00:08:22,376 --> 00:08:24,626
and they're chosen to optimize that criterion.

118
00:08:25,526 --> 00:08:29,106
The great thing about optimal designs
is that they can be very flexible.

119
00:08:29,106 --> 00:08:34,686
For example, if you have a limited budget, you
can create an optimal design, for the number

120
00:08:34,686 --> 00:08:40,366
of factors you are investigating, that maximizes one
of these optimality criteria within your budget.

121
00:08:41,686 --> 00:08:45,986
A computer algorithm is used to find the
settings for each of the budgeted

122
00:08:45,986 --> 00:08:49,676
runs, so that the optimality
criterion is maximized.

123
00:08:50,156 --> 00:08:53,076
In other words the computer is
designing the experiments for you.
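Here is a toy illustration of a computer choosing the runs. It assumes D-optimality, which maximizes det(X^T X), as the criterion; that is one common choice, although the transcript does not name a specific one. The budget is a hypothetical 6 runs for 3 two-level factors under a main-effects model. Exhaustive search over the 28 possible subsets of the full factorial is feasible at this size. Real design software uses exchange algorithms rather than brute force. The helper names are hypothetical.
```python
from fractions import Fraction
from itertools import combinations, product
def det(M):
    """Determinant by Gaussian elimination with exact Fractions."""
    M = [[Fraction(x) for x in row] for row in M]
    n, result = len(M), Fraction(1)
    for c in range(n):
        piv = next((r for r in range(c, n) if M[r][c] != 0), None)
        if piv is None:
            return Fraction(0)
        if piv != c:
            M[c], M[piv] = M[piv], M[c]
            result = -result
        result *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return result
def information_det(runs):
    """det(X^T X) for the main-effects model y ~ 1 + A + B + C."""
    X = [[1, *run] for run in runs]
    k = len(X[0])
    return det([[sum(r[p] * r[q] for r in X) for q in range(k)] for p in range(k)])
candidates = list(product((-1, 1), repeat=3))  # all 8 corner points
budget = 6                                     # hypothetical run budget
best = max(combinations(candidates, budget), key=information_det)
print("best det(X^T X):", information_det(best))
for run in best:
    print(run)
```
The best 6-run subset reaches det(X^T X) = 1024, compared with 4096 for all 8 runs. The winning subsets are those that drop two corners differing in exactly two factors.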

124
00:08:53,076 --> 00:08:55,886
And there are several such criteria available.

125
00:08:55,886 --> 00:09:01,156
This is where the topic of experimental
design quickly becomes more mathematical

126
00:09:01,156 --> 00:09:02,566
than this course is intended for.

127
00:09:02,566 --> 00:09:07,596
So I'll leave you to read this link
for more information; you will quickly see

128
00:09:07,596 --> 00:09:12,096
that these modern, computer-created
designs have some very distinct advantages.

129
00:09:14,166 --> 00:09:17,516
So on reflection this has been
a long module of the course.

130
00:09:17,546 --> 00:09:19,826
It is imperative that you work on case studies,

131
00:09:20,156 --> 00:09:23,476
and preferably with your own
data to solidify your knowledge.

132
00:09:23,476 --> 00:09:27,476
This can be a tough topic
to grasp, so don't be afraid

133
00:09:27,476 --> 00:09:31,436
to watch these videos several
times, and to ask questions.

134
00:09:32,196 --> 00:09:35,276
Working with fractional factorials
is a bit like playing with fire:

135
00:09:35,886 --> 00:09:37,996
the only way to learn is to burn your fingers.

136
00:09:37,996 --> 00:09:43,546
So go ahead, play with the fire, but preferably
on a system that has no painful penalty,

137
00:09:44,016 --> 00:09:49,776
like making biscuits or trying out recipes
for good coffee, or preferably both.