1 00:00:03,120 --> 00:00:05,580 This class is all about blocking. 2 00:00:05,580 --> 00:00:08,730 The distinction between covariates and disturbances, from the prior 3 00:00:08,730 --> 00:00:12,320 class, was that covariates are measurable, and disturbances are not. 4 00:00:13,420 --> 00:00:15,030 Neither of them can be controlled. 5 00:00:16,320 --> 00:00:19,949 We're generally comfortable with the idea of control in our experiments. 6 00:00:21,270 --> 00:00:23,890 Variables that we can control should be controlled. 7 00:00:24,940 --> 00:00:27,370 Remember that idea that you should always keep 8 00:00:27,370 --> 00:00:30,800 everything fixed, and only change what you're experimenting with? 9 00:00:31,820 --> 00:00:36,160 The idea of keeping things fixed is what we mean by the term "control". 10 00:00:37,850 --> 00:00:40,585 Now we're going to make a subtle distinction. 11 00:00:40,585 --> 00:00:45,030 There are variables you can control, and choose to actively vary. 12 00:00:45,030 --> 00:00:46,980 We call them "factors". 13 00:00:46,980 --> 00:00:50,020 You can control these variables, and you can measure them. 14 00:00:51,260 --> 00:00:55,090 Remember the cell phone app example from the previous class? 15 00:00:55,090 --> 00:00:58,010 Recall the variables from that example: Factor 16 00:00:58,010 --> 00:01:01,390 A was the type of marketing promotion used, 17 00:01:01,390 --> 00:01:06,500 either a free in-app purchase, or a 30-day trial of all the features. 18 00:01:06,500 --> 00:01:12,930 Factor B was the marketing message, and factor C was the in-app purchase price. 19 00:01:12,930 --> 00:01:17,980 All three of those were controlled and measurable, where the term "measurable" is 20 00:01:17,980 --> 00:01:24,040 used fairly loosely as a way to say that you can quantify the value of your factor.
21 00:01:24,040 --> 00:01:27,110 Take a minute now to think about the experiments, 22 00:01:27,110 --> 00:01:30,540 and the factors you have varied, during this course. 23 00:01:30,540 --> 00:01:33,600 Are your factors actually controllable? 24 00:01:33,600 --> 00:01:37,250 Can you measure or quantify all your factors? 25 00:01:37,250 --> 00:01:40,060 If you have any doubt, it's a good time to consult 26 00:01:40,060 --> 00:01:43,380 with your colleagues and post a message in the course forums. 27 00:01:44,670 --> 00:01:46,190 Now here's the subtle distinction. 28 00:01:47,550 --> 00:01:51,200 What if you have a factor you can control, and it varies during 29 00:01:51,200 --> 00:01:53,960 the experiments, but the factor isn't 30 00:01:53,960 --> 00:01:57,360 actually the main focus of your experiments? 31 00:01:57,360 --> 00:01:59,320 These are called nuisance factors. 32 00:01:59,320 --> 00:02:01,810 Let's take a look at the cell phone example again. 33 00:02:01,810 --> 00:02:07,220 The end user could be on an Apple or Android operating system. 34 00:02:07,220 --> 00:02:10,540 You could control this, because you can select 35 00:02:10,540 --> 00:02:14,990 only Apple or Android users during your experiments. 36 00:02:14,990 --> 00:02:18,750 In fact, this could have been another factor. 37 00:02:18,750 --> 00:02:21,570 You might have called it Factor D, but this 38 00:02:21,570 --> 00:02:26,090 really isn't expected to be a significant factor of interest. 39 00:02:26,090 --> 00:02:29,360 It's not what your experiments are about. 40 00:02:29,360 --> 00:02:31,670 We call this a "nuisance factor". 41 00:02:31,670 --> 00:02:36,500 Now from the previous video, you learned that you must randomize your experiments. 42 00:02:36,500 --> 00:02:40,030 If you go run all your Apple experiments first, and then all 43 00:02:40,030 --> 00:02:42,990 your Android experiments after, you've actually 44 00:02:42,990 --> 00:02:45,430 confounded your variables with this factor D.
45 00:02:47,186 --> 00:02:50,830 Ultimately, you would want successful sales, no matter 46 00:02:50,830 --> 00:02:53,720 whether you have Apple or Android users, but you 47 00:02:53,720 --> 00:02:57,680 must intentionally plan your experiments ahead of time, to 48 00:02:57,680 --> 00:03:01,630 prevent this nuisance factor from having a confounding effect. 49 00:03:01,630 --> 00:03:02,730 We call this "blocking". 50 00:03:03,835 --> 00:03:08,060 There are a number of instances in which you want to block for nuisance variables. 51 00:03:09,180 --> 00:03:14,710 In our baking example, imagine a situation where you are going to run out of flour. 52 00:03:14,710 --> 00:03:17,160 You can do half your experiments with one brand of 53 00:03:17,160 --> 00:03:20,460 flour, and half the experiments with another brand of flour. 54 00:03:22,630 --> 00:03:25,780 Another case might be experiments in a factory. 55 00:03:25,780 --> 00:03:28,600 Half the experiments are done during the day shift, 56 00:03:28,600 --> 00:03:31,960 and the other half are done during the night shift. 57 00:03:31,960 --> 00:03:34,075 Or if you're testing gas mileage in a 58 00:03:34,075 --> 00:03:37,086 car, you might have one driver and another driver. 59 00:03:37,086 --> 00:03:39,818 60 00:03:39,818 --> 00:03:43,199 People sometimes find blocking tricky to understand. 61 00:03:43,199 --> 00:03:45,300 And here's one way I deal with it. 62 00:03:45,300 --> 00:03:50,130 I ask myself, is my system, or process, going to have to 63 00:03:50,130 --> 00:03:55,140 successfully work with different values of this nuisance variable in the future? 64 00:03:56,350 --> 00:04:00,340 If the answer is yes, then I design with blocking in mind. 65 00:04:02,240 --> 00:04:05,920 If I don't design my experiments with blocking in mind, then that 66 00:04:05,920 --> 00:04:08,680 nuisance variable might affect the outcome, 67 00:04:08,680 --> 00:04:10,770 and I won't really understand what's happened.
68 00:04:12,200 --> 00:04:15,750 If the answer to the question is "no", that means that I've 69 00:04:15,750 --> 00:04:20,310 got good control over the system, and I can avoid the nuisance variable. 70 00:04:22,600 --> 00:04:25,300 Now that we've discussed the need for blocking, let's 71 00:04:25,300 --> 00:04:28,271 see how to plan experiments where there are 2 blocks. 72 00:04:29,370 --> 00:04:32,570 And the general rule for this situation is to 73 00:04:32,570 --> 00:04:35,480 add a new factor to your standard order table. 74 00:04:36,570 --> 00:04:39,950 We already have 3 factors, A, B, and C. 75 00:04:39,950 --> 00:04:43,060 So that's 8 experiments in a full factorial. 76 00:04:43,060 --> 00:04:49,090 Now consider adding the new nuisance factor as a new variable, D, to the table: 77 00:04:49,090 --> 00:04:51,170 Apple versus Android. 78 00:04:51,170 --> 00:04:54,210 This factor has two levels: minus and plus. 79 00:04:55,620 --> 00:04:58,770 How do we go about picking which experiments to assign to our 80 00:04:58,770 --> 00:05:03,110 Apple phone users, and which experiments to assign to our Android users? 81 00:05:04,240 --> 00:05:05,390 Here's a hint. 82 00:05:05,390 --> 00:05:08,131 If we were to do a full set of experiments in four 83 00:05:08,131 --> 00:05:13,880 factors, we would've required two to the power four, or 16, experiments. 84 00:05:13,880 --> 00:05:16,990 Instead, we're doing the eight experiments shown here. 85 00:05:16,990 --> 00:05:19,670 Eight is half of sixteen, and so there's no 86 00:05:19,670 --> 00:05:24,200 surprise that this is effectively generated using a half fraction. 87 00:05:25,560 --> 00:05:27,420 Once you understand the principle of half 88 00:05:27,420 --> 00:05:30,190 fractions from the prior class, you'll perhaps 89 00:05:30,190 --> 00:05:36,250 intuitively see that you assign factor D using the table we showed last time: 90 00:05:36,250 --> 00:05:42,410 factor D is generated as the product of A times B times C.
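A minimal Python sketch of the rule just described, assuming the usual -1/+1 coding for the minus and plus levels. It lists the 8 runs for A, B and C in standard order and generates the nuisance column D as the product of A times B times C. The variable names `runs` and `design` are illustrative and do not come from the course material.

```python
from itertools import product

# Standard order for a 2^3 full factorial: A changes fastest, then B, then C.
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

# Generate the blocking column as D = A * B * C (the half-fraction generator).
design = [(a, b, c, a * b * c) for a, b, c in runs]

print(" A  B  C  D")
for a, b, c, d in design:
    print(f"{a:+d} {b:+d} {c:+d} {d:+d}")
```

Four of the eight runs end up with D at minus and four with D at plus, so each block receives half of the experiments.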
91 00:05:42,410 --> 00:05:44,230 Here's the interesting part: 92 00:05:44,230 --> 00:05:47,950 once you've generated Factor D in this way, you set all the minus 93 00:05:47,950 --> 00:05:52,760 sign experiments to Android users, and all the plus sign experiments to Apple users. 94 00:05:53,885 --> 00:05:55,590 Let's visualize this on a cube plot. 95 00:05:56,740 --> 00:06:02,870 The closed circles are the Android users, and the open circles are the Apple users. 96 00:06:02,870 --> 00:06:04,420 Doesn't this plot look familiar to you? 97 00:06:06,310 --> 00:06:10,260 Now let's go take a look at the reasons for assigning the experiments in this way. 98 00:06:11,460 --> 00:06:15,610 Imagine that there really is an effect that Android users are more 99 00:06:15,610 --> 00:06:19,680 likely to keep using your app, and that Apple users are less likely. 100 00:06:20,740 --> 00:06:23,760 We can imagine that our outcome variable 101 00:06:23,760 --> 00:06:27,410 for Android users is boosted by a consistent 102 00:06:27,410 --> 00:06:32,480 amount "g", and I will put a small tilde over these numbers to indicate that. 103 00:06:33,800 --> 00:06:37,310 Experiments involving Apple users are shifted by some 104 00:06:37,310 --> 00:06:40,910 amount "h", where "h" is a negative number. 105 00:06:40,910 --> 00:06:44,340 And I'll add a small circle above their outcomes. 106 00:06:44,340 --> 00:06:48,275 Remember how we calculated the main effects? 107 00:06:48,275 --> 00:06:49,343 High minus low. 108 00:06:49,343 --> 00:06:50,213 High minus low. 109 00:06:50,213 --> 00:06:52,240 And high minus low. 110 00:06:52,240 --> 00:06:54,430 And then we average the answer. 111 00:06:54,430 --> 00:06:57,710 Well, for the main effect of A, we notice that there's 112 00:06:57,710 --> 00:07:02,660 an equal number of additions with a tilde as there are subtractions. 113 00:07:02,660 --> 00:07:04,110 The same for the circles.
114 00:07:04,110 --> 00:07:07,820 Two positive circles, and two negative circles. 115 00:07:07,820 --> 00:07:12,130 From a practical point of view, this implies that any bias due 116 00:07:12,130 --> 00:07:16,608 to Android users and bias due to Apple users will cancel out. 117 00:07:16,608 --> 00:07:21,530 So our main effect of A will be well estimated and 118 00:07:21,530 --> 00:07:27,090 not affected by any differences that exist between Apple and Android users. 119 00:07:27,090 --> 00:07:28,800 That's a desirable property. 120 00:07:30,330 --> 00:07:33,600 In fact, every parameter in the model will be well 121 00:07:33,600 --> 00:07:38,510 estimated without bias, except for the three-factor interaction, ABC. 122 00:07:40,100 --> 00:07:43,160 That parameter is badly estimated. 123 00:07:43,160 --> 00:07:45,500 All the tildes have minuses, all the 124 00:07:45,500 --> 00:07:49,200 circles have pluses, so no cancellation occurs. 125 00:07:50,260 --> 00:07:54,150 In fact, the ABC three-factor interaction is 126 00:07:54,150 --> 00:07:58,230 confounded with this D effect of the nuisance variable. 127 00:07:58,230 --> 00:08:04,050 That was intentional, and it is the best course of action in most cases. 128 00:08:04,050 --> 00:08:08,200 We often expect our three-factor interaction to be negligible. 129 00:08:08,200 --> 00:08:12,230 So we have sacrificed our three-factor interaction here, in order 130 00:08:12,230 --> 00:08:17,290 to minimize the effect that the nuisance variable has on our system. 131 00:08:17,290 --> 00:08:18,840 Though it's outside the scope of this 132 00:08:18,840 --> 00:08:22,580 course, more complex blocking schemes are possible. 133 00:08:22,580 --> 00:08:25,760 For example, if we were testing Apple, Android, 134 00:08:25,760 --> 00:08:28,680 and Blackberry users, we would have three levels.
135 00:08:28,680 --> 00:08:32,320 In our baking example, if we were using large quantities 136 00:08:32,320 --> 00:08:37,240 of flour, we might have had 4 different flour suppliers. 137 00:08:37,240 --> 00:08:39,710 Experiments in a factory might rely on 138 00:08:39,710 --> 00:08:42,680 3 shifts: a morning, afternoon and evening shift. 139 00:08:43,745 --> 00:08:47,240 There are cases where there's blocking with more than two groups. 140 00:08:47,240 --> 00:08:52,040 And what happens is, we simply create extra blocking factors in the design. 141 00:08:53,285 --> 00:08:56,880 There are tables in many of the statistics textbooks that 142 00:08:56,880 --> 00:09:00,600 show how to generate these blocks in an optimal way. 143 00:09:00,600 --> 00:09:02,640 I'll leave that for you to explore on your own. 144 00:09:04,400 --> 00:09:07,680 Let's return to our two-level blocking designs. 145 00:09:07,680 --> 00:09:13,140 The general rule that you can remember for this case with two blocks is that you add 146 00:09:13,140 --> 00:09:16,620 a new factor to the system, and generate the 147 00:09:16,620 --> 00:09:19,690 design as if you were running a half fraction.
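To see in numbers why blocking on D = ABC protects the other effects, here is a small Python sketch. The response values in `y_true` and the shifts `g` and `h` are invented purely for illustration, and the helper `effect` is a hypothetical name. It computes each effect as the average response at the plus level minus the average response at the minus level. Runs with D at minus (Android) get `g` added, and runs with D at plus (Apple) get `h` added.

```python
from itertools import product
from math import prod

# Standard-order 2^3 design; the block column is generated as D = A * B * C.
runs = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
block = [a * b * c for a, b, c in runs]

# Hypothetical numbers, invented for illustration only.
y_true = [10.0, 14.0, 12.0, 18.0, 11.0, 15.0, 13.0, 21.0]
g, h = 5.0, -3.0  # assumed shifts: g for Android (D = -1), h for Apple (D = +1)
y_obs = [y + (g if d == -1 else h) for y, d in zip(y_true, block)]


def effect(term, y):
    """Average of y at the plus level of `term` minus average at the minus level."""
    idx = ["ABC".index(ch) for ch in term]
    signs = [prod(run[i] for i in idx) for run in runs]
    return sum(s * v for s, v in zip(signs, y)) / (len(y) / 2)


for term in ["A", "B", "C", "AB", "AC", "BC", "ABC"]:
    print(f"{term:>3}: without block shift {effect(term, y_true):+.2f}, "
          f"with block shift {effect(term, y_obs):+.2f}")
```

Under these assumptions, the main effects and two-factor interactions come out the same with and without the block shifts, while the ABC estimate moves by h - g. That shift is the confounding between ABC and the nuisance factor D.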