1 00:00:00,036 --> 00:00:03,656 So let's look at an example to end this module. 2 00:00:04,146 --> 00:00:07,766 We said in the prior video that you should always include as many factors 3 00:00:07,766 --> 00:00:10,906 as you possibly can in a set of experiments. 4 00:00:10,906 --> 00:00:13,016 Do you remember why we recommend that? 5 00:00:13,476 --> 00:00:16,056 If not, please review the prior video again. 6 00:00:16,056 --> 00:00:19,876 In this example we are going to use 7 factors, 7 00:00:20,256 --> 00:00:23,486 and the fewest possible experiments; that's 8 experiments. 8 00:00:23,776 --> 00:00:28,386 We are going to screen out which of those 7 factors really affect our outcome. 9 00:00:29,836 --> 00:00:33,806 So it is a screening design with 8 experiments and a resolution of III. 10 00:00:34,416 --> 00:00:38,386 I could choose more experiments, and then go to higher and higher resolutions. 11 00:00:38,876 --> 00:00:43,056 But let's see what happens when we start with just eight experiments and seven factors. 12 00:00:45,776 --> 00:00:51,556 With eight experiments, we have factors A, B and C to form a full factorial in eight rows. 13 00:00:52,166 --> 00:00:55,976 The tradeoff table tells us to generate factors D, E, 14 00:00:56,106 --> 00:01:01,216 F and G. Now notice that this is a 2^{7 - 4} design. 15 00:01:01,326 --> 00:01:04,326 So this design has p=4. 16 00:01:04,786 --> 00:01:10,536 These 4 generators can be used to create the columns for the remaining factors in my system. 17 00:01:11,716 --> 00:01:13,156 And here's the completed table. 18 00:01:13,496 --> 00:01:17,226 I can go ahead and run the experiments and start my analysis. 19 00:01:17,226 --> 00:01:20,696 But the whole purpose of the tools introduced in this module is all 20 00:01:20,696 --> 00:01:24,416 about checking your aliasing before you start the analysis. 21 00:01:24,416 --> 00:01:25,226 Let's go do that. 22 00:01:29,386 --> 00:01:32,316 Our 4 generators are rearranged over here.
23 00:01:33,496 --> 00:01:37,566 I equals ABD, I equals ACE, and so on. 24 00:01:38,586 --> 00:01:40,846 How many words are in our defining relationship? 25 00:01:41,306 --> 00:01:47,496 Two to the power of p, and with p=4 in this case, that equals 16 words. 26 00:01:48,086 --> 00:01:51,076 That's a lot of words to figure out, but let's give it a try. 27 00:01:51,076 --> 00:01:53,416 The first few words are easy. 28 00:01:54,106 --> 00:02:05,086 Take the rearranged generators individually: I = ABD = ACE = BCF = ABCG. That's 5 of them. 29 00:02:06,026 --> 00:02:14,406 Now we can add to that the combinations two at a time: (ABD)(ACE) = BCDE. 30 00:02:15,276 --> 00:02:18,896 The next combination two at a time is: (ABD)(BCF) = ACDF. 31 00:02:18,896 --> 00:02:26,096 You can verify for yourself that the remaining four are CDG, ABEF, BEG and AFG. 32 00:02:26,096 --> 00:02:29,316 Now we've got 11 words so far in our defining relationship. 33 00:02:29,996 --> 00:02:32,886 The next step is to take our generators three at a time: 34 00:02:33,766 --> 00:02:42,586 (ABD)(ACE)(BCF) = DEF. Try the next three (ADEG, CEFG, BDFG). 35 00:02:42,586 --> 00:02:45,156 So, there we have a total of 15. 36 00:02:45,156 --> 00:02:50,446 And the final combination is to use all four generators multiplied together. 37 00:02:51,036 --> 00:02:54,386 And that simplifies to ABCDEFG. 38 00:02:54,386 --> 00:02:57,356 So, here's our complete defining relationship. 39 00:02:57,356 --> 00:03:03,856 Now, let's go try and calculate the aliasing for factor A. If we go and do that, 40 00:03:04,026 --> 00:03:06,496 we get this very long expression over here. 41 00:03:07,166 --> 00:03:10,836 I've highlighted only the two-factor interactions that are confounded 42 00:03:10,836 --> 00:03:14,506 with the main effect of A. I can create this list of aliases 43 00:03:14,506 --> 00:03:16,806 for the seven main effects in my design.
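The word-by-word multiplication described above can also be checked mechanically. What follows is only a minimal sketch in Python, not the course's own R code: the helper name `multiply` and the variable names are made up for illustration. It relies on the rule that any letter multiplied by itself cancels to I, starts from the four rearranged generators quoted in the transcript, and then lists the aliases of main effect A.

```python
from itertools import combinations


def multiply(*words):
    """Multiply effect words; a letter that appears an even number of times cancels (A*A = I)."""
    letters = set()
    for word in words:
        letters ^= set(word)  # symmetric difference drops repeated letters
    return "".join(sorted(letters)) or "I"


# The four rearranged generators: I = ABD = ACE = BCF = ABCG
generators = ["ABD", "ACE", "BCF", "ABCG"]

# Products of the generators taken 1, 2, 3 and 4 at a time, plus I itself,
# give the 2^p = 2^4 = 16 words of the defining relationship.
defining_relation = ["I"]
for size in range(1, len(generators) + 1):
    for combo in combinations(generators, size):
        defining_relation.append(multiply(*combo))

# Aliases of main effect A: multiply A by every word except I.
aliases_of_A = [multiply("A", word) for word in defining_relation[1:]]
two_factor_aliases_of_A = [word for word in aliases_of_A if len(word) == 2]

print(defining_relation)
print(two_factor_aliases_of_A)
```

The shortest word other than I has three letters, which is what makes this a resolution III design. Replacing `"A"` with any other factor letter gives that factor's alias list in the same way.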
44 00:03:17,586 --> 00:03:22,716 This illustrates the tremendous confounding that takes place in the very dense designs 45 00:03:22,716 --> 00:03:25,176 at the far right-hand side of the trade-off table. 46 00:03:25,966 --> 00:03:32,126 Remember, instead of doing two to the seven, which equals 128 experiments, we've done 8. 47 00:03:32,816 --> 00:03:35,876 There's going to be a steep price to pay for this reduction in work. 48 00:03:35,876 --> 00:03:39,276 Now let's go and look at the numbers from the outcome variable, 49 00:03:39,356 --> 00:03:42,076 and how to continue on with the analysis. 50 00:03:42,246 --> 00:03:47,596 And as you'll see, and this is very typical, the analysis goes much quicker than the planning. 51 00:03:49,586 --> 00:03:52,366 Here's the code that you can use to analyze this design. 52 00:03:52,986 --> 00:03:54,926 Please copy and paste it from the website. 53 00:03:55,646 --> 00:03:58,976 We recommend that you always clear your environment from prior work. 54 00:03:58,976 --> 00:04:02,206 This is because you might have a variable with the same name 55 00:04:02,206 --> 00:04:05,146 from a different analysis; this will avoid any confusion. 56 00:04:06,266 --> 00:04:10,416 Build the linear model in exactly the same way as you created the design on paper. 57 00:04:10,416 --> 00:04:16,376 First, define the three variables that you start with: A, B, and C. Next, 58 00:04:16,666 --> 00:04:21,006 generate the remaining four factors using the definitions from the tradeoff table. 59 00:04:21,636 --> 00:04:26,206 When you inspect these variables in the console, you should get exactly what you had on paper. 60 00:04:26,936 --> 00:04:30,906 Now, add the outcome values recorded for the eight experiments. 61 00:04:30,906 --> 00:04:33,106 I'm going to take them from the standard order table. 62 00:04:33,966 --> 00:04:36,216 When you are ready to visualize your linear model, 63 00:04:36,596 --> 00:04:39,336 load the PID package, using the "library" command. 
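The construction steps described above (start with A, B and C, then generate the remaining four columns) can be sketched as follows. This is a Python illustration rather than the R code from the course website, and it assumes the generators D = AB, E = AC, F = BC and G = ABC, which is what the rearranged form I = ABD = ACE = BCF = ABCG quoted earlier in the transcript implies. The variable names are hypothetical, and no outcome values are included because the transcript does not state them.

```python
from itertools import combinations, product

# Full factorial in A, B and C in standard order: A changes fastest, C slowest.
base_runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

# Generate the remaining four columns from the assumed generators.
design = []
for a, b, c in base_runs:
    design.append({
        "A": a, "B": b, "C": c,
        "D": a * b,       # D = AB
        "E": a * c,       # E = AC
        "F": b * c,       # F = BC
        "G": a * b * c,   # G = ABC
    })

# Print the 8 x 7 table so it can be compared with the one built on paper.
factors = "ABCDEFG"
print(" ".join(f"{name:>2}" for name in factors))
for row in design:
    print(" ".join(f"{row[name]:+d}" for name in factors))

# Sanity checks: every column has four -1 and four +1 entries (balanced),
# and every pair of columns has a zero dot product (orthogonal).
columns = {name: [row[name] for row in design] for name in factors}
balanced = all(sum(col) == 0 for col in columns.values())
orthogonal = all(
    sum(x * y for x, y in zip(columns[p], columns[q])) == 0
    for p, q in combinations(factors, 2)
)
print("balanced:", balanced, "orthogonal:", orthogonal)
```

The orthogonality check is worth keeping in any version of this script: it is the property that lets each coefficient be estimated separately from the others.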
64 00:04:40,016 --> 00:04:43,356 You will have installed this package if you have been following the prior videos. 65 00:04:43,936 --> 00:04:47,266 I will quickly note that R packages are frequently updated. 66 00:04:47,726 --> 00:04:50,836 You should check for updates regularly, as demonstrated here. 67 00:04:50,836 --> 00:04:55,116 So use the "paretoPlot(...)" command and let's examine the output. 68 00:04:58,906 --> 00:05:04,286 We can see here that the factors C, A and G are significant and have a negative, 69 00:05:04,416 --> 00:05:06,476 reducing effect on the outcome variable. 70 00:05:06,526 --> 00:05:08,996 Factor E is a little smaller. 71 00:05:09,556 --> 00:05:14,526 And factors B, D and F have small to negligible coefficients. 72 00:05:15,486 --> 00:05:21,206 Note however, when we say factor A up here is important, it is really A that is aliased 73 00:05:21,206 --> 00:05:24,276 with a variety of two-factor and higher-order interactions. 74 00:05:25,276 --> 00:05:30,506 As long as the assumption is true that those two-factor and higher-order interactions are small, 75 00:05:30,506 --> 00:05:38,286 or zero, then that bar in the Pareto plot essentially represents the effect of A. What 76 00:05:38,286 --> 00:05:41,126 about the small, unimportant effects down here? 77 00:05:41,866 --> 00:05:44,006 They can be removed -- judiciously. 78 00:05:44,536 --> 00:05:47,646 As long as you are confident that when you varied factor B, 79 00:05:47,986 --> 00:05:52,316 you did so over a large enough range to affect the outcome variable meaningfully, 80 00:05:52,596 --> 00:05:55,276 then you can be sure that this Pareto plot shows 81 00:05:55,276 --> 00:05:58,586 that factor B really has no significant effect on the outcome. 82 00:05:59,136 --> 00:06:00,346 It is safe to remove it. 83 00:06:01,266 --> 00:06:04,866 In other words, we have screened factor B out of consideration. 84 00:06:06,356 --> 00:06:10,186 So let's go remove factors B, F, and D for those reasons.
85 00:06:10,906 --> 00:06:15,676 By removing these three factors, we've reduced ourselves from 7 to 4 factors, 86 00:06:16,106 --> 00:06:19,066 but we have still done eight experiments. 87 00:06:19,066 --> 00:06:23,926 We might as well have done the experiments with only factors A, C, G and E present. 88 00:06:24,856 --> 00:06:28,766 Note however that we do not have to redo the experiments. 89 00:06:28,766 --> 00:06:33,956 If you refit the model in R with only these four factors you get exactly the same coefficients 90 00:06:33,956 --> 00:06:34,436 as before. 91 00:06:34,436 --> 00:06:39,066 This is due to the independence property that's built into the model's design. 92 00:06:40,016 --> 00:06:43,316 Those of you with a least-squares background will recognize that the columns 93 00:06:43,316 --> 00:06:49,566 in this matrix are independent (orthogonal), so when you rebuild the model you will get the same results. 94 00:06:49,566 --> 00:06:52,666 So, there's that; we've essentially found ourselves a system 95 00:06:52,666 --> 00:06:55,326 with four factors in eight experiments. 96 00:06:55,326 --> 00:06:57,886 We've eliminated three unimportant variables, 97 00:06:57,886 --> 00:07:00,686 as we've learned that they have little effect on our outcome. 98 00:07:01,296 --> 00:07:04,786 We have retained four important factors that we know affect our outcome. 99 00:07:04,786 --> 00:07:09,986 We will see in the following module that we can focus our future attention 100 00:07:09,986 --> 00:07:13,116 on these important factors now, to optimize the system. 101 00:07:14,816 --> 00:07:16,506 So that's the end of this module. 102 00:07:16,506 --> 00:07:21,486 For advanced students I do want to point out two other reduced designs. 103 00:07:22,096 --> 00:07:28,336 The first is a Plackett-Burman design. The regular tradeoff table shows 104 00:07:28,336 --> 00:07:33,286 that you can do 4, 8, 16, 32, 64 runs, and so on.
105 00:07:34,186 --> 00:07:37,306 But what if you had a budget for, say, 24 runs? 106 00:07:37,486 --> 00:07:40,506 That's more than 16 but not quite enough for 32. 107 00:07:41,536 --> 00:07:45,996 Well, a Plackett-Burman design works well for these cases where you have a budget that is a multiple 108 00:07:45,996 --> 00:07:49,196 of four but not one of the powers of two in the table. 109 00:07:49,196 --> 00:07:52,796 So a budget of 20, 24, 28, and so on. 110 00:07:54,126 --> 00:07:57,146 I'm not going to go into the details of the Plackett-Burman design, 111 00:07:57,576 --> 00:08:00,956 but now that you know the terminology, you can go search for more information. 112 00:08:00,956 --> 00:08:05,896 The final type of design to be aware of is a class of designs called the 113 00:08:05,896 --> 00:08:10,716 "Definitive Screening Design", and here's a link where you can read some more information. 114 00:08:11,836 --> 00:08:14,266 These designs are a type of optimal design. 115 00:08:14,266 --> 00:08:17,446 Let's quickly define the term "optimal", here. 116 00:08:18,246 --> 00:08:22,106 It means that the experiments selected obey some sort of criterion, 117 00:08:22,376 --> 00:08:24,626 and they're optimized to meet that criterion. 118 00:08:25,526 --> 00:08:29,106 The great thing about optimal designs is that they can be very flexible. 119 00:08:29,106 --> 00:08:34,686 For example, if you have a limited budget, you can create an optimal design for a given number 120 00:08:34,686 --> 00:08:40,366 of factors you are investigating that maximizes one of these optimality criteria within your budget. 121 00:08:41,686 --> 00:08:45,986 A computer algorithm is used to find the settings for each one of the budgeted number 122 00:08:45,986 --> 00:08:49,676 of runs, so that the optimization criterion is maximized. 123 00:08:50,156 --> 00:08:53,076 In other words, the computer is designing the experiments for you.
124 00:08:53,076 --> 00:08:55,886 And there are several of those criteria available. 125 00:08:55,886 --> 00:09:01,156 This is where the topic of experimental design quickly becomes more mathematical 126 00:09:01,156 --> 00:09:02,566 than this course is intended for. 127 00:09:02,566 --> 00:09:07,596 So I'm going to leave you to read this link for more information, and you can quickly see 128 00:09:07,596 --> 00:09:12,096 that these modern, computer-created designs have some very distinct advantages. 129 00:09:14,166 --> 00:09:17,516 So, on reflection, this has been a long module of the course. 130 00:09:17,546 --> 00:09:19,826 It is imperative that you work on case studies, 131 00:09:20,156 --> 00:09:23,476 and preferably with your own data to solidify your knowledge. 132 00:09:23,476 --> 00:09:27,476 This can be a tough topic to grasp, so don't be afraid 133 00:09:27,476 --> 00:09:31,436 to watch these videos several times, and to ask questions. 134 00:09:32,196 --> 00:09:35,276 Working with fractional factorials is a bit like playing with fire: 135 00:09:35,886 --> 00:09:37,996 the only way to learn is to burn your fingers. 136 00:09:37,996 --> 00:09:43,546 So go ahead, play with the fire, but preferably on a system that has no painful penalty, 137 00:09:44,016 --> 00:09:49,776 like making biscuits or trying out recipes for good coffee, or preferably both.