1 00:00:02,546 --> 00:00:06,316 In today's class, my goal is to show you how we analyze data 2 00:00:06,486 --> 00:00:09,466 from factorial experiments when three factors were used. 3 00:00:09,676 --> 00:00:14,856 This example is based on one from this textbook by Box, Hunter and Hunter, 4 00:00:15,216 --> 00:00:17,176 called "Statistics for Experimenters". 5 00:00:18,056 --> 00:00:23,286 The experiments described in that example were run to find the combination of settings 6 00:00:23,416 --> 00:00:27,786 that would reduce the amount of pollution discharged from the water treatment facility. 7 00:00:29,026 --> 00:00:33,226 This is clearly a case where we would like to minimize the amount of pollutant. 8 00:00:33,916 --> 00:00:38,286 So, minimizing our outcome variable would be the objective. 9 00:00:38,476 --> 00:00:40,736 Three factors were considered. 10 00:00:41,066 --> 00:00:45,406 The first one, factor C, was the chemical compound used. 11 00:00:45,686 --> 00:00:50,256 Let's call them compound P and compound Q. We don't really know their names. 12 00:00:51,726 --> 00:00:54,926 Factor T was the temperature of the treatment: 13 00:00:55,416 --> 00:00:59,146 whether we were treating the water at 72 or 100 degrees Fahrenheit. 14 00:01:00,036 --> 00:01:04,106 And factor S was the stirring speed: either a slow speed 15 00:01:04,106 --> 00:01:08,156 of 200 revolutions per minute, or a high speed of 400. 16 00:01:09,066 --> 00:01:13,966 Notice that every factor has two levels, and going back to that mathematical idea 17 00:01:14,466 --> 00:01:20,556 that two to the power "k" is the total number of experiments: "k" is equal to three 18 00:01:20,556 --> 00:01:25,046 in this example, so we get a total of eight possible combinations. 19 00:01:25,926 --> 00:01:28,136 Here's a short quiz to test that knowledge. 20 00:01:29,846 --> 00:01:31,346 So let's take a look at the results.
21 00:01:31,876 --> 00:01:37,146 We will always present our data, and analyze it, using what we call "standard order". 22 00:01:38,236 --> 00:01:42,436 Standard order requires that we create a column for each of our factors. 23 00:01:42,436 --> 00:01:48,976 So C, T, and S. Note that I could have used A, B, and C for the three factors, 24 00:01:49,556 --> 00:01:52,656 but very often we'll switch to letters that actually match 25 00:01:52,656 --> 00:01:54,646 our factor names, but you don't have to. 26 00:01:55,676 --> 00:01:57,346 So back to the standard order table. 27 00:01:58,316 --> 00:02:08,106 And the rule is, we vary the first factor the fastest: - + - + - + - +. The second factor, 28 00:02:08,106 --> 00:02:13,226 temperature, is varied the next fastest, between its low and high levels. 29 00:02:13,296 --> 00:02:17,156 So two minuses, two pluses, two minuses, two pluses. 30 00:02:17,386 --> 00:02:21,336 And then the last factor, S, is varied the slowest. 31 00:02:21,636 --> 00:02:25,356 So four low levels (-) and four high levels (+). 32 00:02:25,416 --> 00:02:27,046 Those make up our entire table. 33 00:02:31,516 --> 00:02:34,696 Never run the experiments in the order of this table. 34 00:02:35,466 --> 00:02:37,736 The order must be randomly selected. 35 00:02:38,486 --> 00:02:42,606 So what we will do is add a column to our table to keep track of the order 36 00:02:42,606 --> 00:02:45,836 in which we actually ran the experiments. 37 00:02:45,836 --> 00:02:48,506 Also add a column over here for the outcome variable. 38 00:02:49,066 --> 00:02:52,956 In this case, the outcome was the pollutant amount, measured in pounds. 39 00:02:54,516 --> 00:02:58,906 One thing that's so great about the standard order table is that we can get a quick sense 40 00:02:58,906 --> 00:03:01,816 of each factor's influence on the outcome variable.
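For anyone following along in code, the standard order rule described above can be sketched as follows. This is only a minimal illustration in Python: the helper name `standard_order` and the -1/+1 coding of the low and high levels are assumptions made for the sketch, not something shown in the class.

```python
def standard_order(factors):
    """Return the 2^k runs in standard order as dicts of coded levels.

    Hypothetical helper: -1 is the low level, +1 the high level.
    The first factor listed varies fastest, the last varies slowest.
    """
    k = len(factors)
    runs = []
    for run in range(2 ** k):
        # Bit i of the run index sets the level of factor i, so factor 0
        # flips on every run, factor 1 every two runs, factor 2 every four.
        runs.append({name: (+1 if (run >> i) & 1 else -1)
                     for i, name in enumerate(factors)})
    return runs


table = standard_order(["C", "T", "S"])
for number, run in enumerate(table, start=1):
    print(number, run)
```

The run number printed here is only the row label of the table. The order in which the experiments are actually performed should still be randomized and recorded in its own column.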
41 00:03:03,216 --> 00:03:06,396 Take a look, for example, at how the pollution amount changes 42 00:03:06,636 --> 00:03:10,636 when we change the chemical compound, factor C. That factor goes low, 43 00:03:10,636 --> 00:03:12,806 high, low, high, low, high, low, high. 44 00:03:13,906 --> 00:03:17,946 We see that same pattern in the pollution amounts. 45 00:03:17,946 --> 00:03:23,706 Take a look at the effect of factor S: the first four experiments have a very high level 46 00:03:23,706 --> 00:03:28,746 of pollution on average, while the last four experiments have a low level of pollution. 47 00:03:29,726 --> 00:03:37,566 That also matches with factor S. We can already tell, just from this table, that factor C 48 00:03:37,826 --> 00:03:42,386 and factor S are going to be really important to understanding the results. 49 00:03:44,026 --> 00:03:45,786 Let's go back to our cube plot. 50 00:03:46,776 --> 00:03:49,256 And this time, our cube plot is actually a cube. 51 00:03:50,096 --> 00:03:53,726 We can draw it by showing the first factor along the horizontal axis, 52 00:03:54,016 --> 00:03:59,046 the next factor on the vertical axis, and the final factor, S, is shown in 53 00:03:59,046 --> 00:04:01,186 and out of the page in this diagonal way. 54 00:04:02,196 --> 00:04:05,246 Next, we transcribe the values onto this cube. 55 00:04:06,576 --> 00:04:09,886 This is really easy when we follow the standard order sequence. 56 00:04:10,116 --> 00:04:14,116 Take a look: 5, 30, 6, 33. 57 00:04:14,786 --> 00:04:17,446 Then 4, 3, 5, and 4. 58 00:04:18,676 --> 00:04:21,466 I love this visual representation of the experimental data. 59 00:04:21,996 --> 00:04:24,946 It really helps us achieve our objective so quickly. 60 00:04:25,816 --> 00:04:27,856 Take a few seconds and answer this question. 61 00:04:28,816 --> 00:04:34,446 At what levels should we set the three factors in order to achieve the lowest pollution amount? 62 00:04:35,716 --> 00:04:36,316 That's right.
63 00:04:36,566 --> 00:04:41,506 It's very clear we need to use chemical Q, operate at a low temperature, 64 00:04:42,156 --> 00:04:45,616 and with high stirring speeds of 400 revolutions per minute. 65 00:04:46,136 --> 00:04:47,726 Later on in the course, we're going 66 00:04:47,726 --> 00:04:50,976 to start examining what happens when you move outside this cube. 67 00:04:51,336 --> 00:04:54,196 And I want you to already start to think along those lines. 68 00:04:55,526 --> 00:04:57,926 But let's come back to the data we have right here, 69 00:04:58,256 --> 00:05:01,526 and analyze the main effects and the interactions. 70 00:05:02,716 --> 00:05:05,336 Start with the first factor, C: the choice 71 00:05:05,336 --> 00:05:08,876 of either chemical P at the low level, or chemical Q at the high level. 72 00:05:10,166 --> 00:05:15,196 If we look at the cube, we actually have four estimates of that main effect, 73 00:05:15,496 --> 00:05:18,126 along each of the four horizontal edges. 74 00:05:18,886 --> 00:05:23,196 At high temperature and high stirring speed, in other words high T 75 00:05:23,196 --> 00:05:27,236 and high S, that effect is equal to 4 - 5. 76 00:05:28,206 --> 00:05:33,346 At high temperature and low speed, that's 33 - 6. 77 00:05:34,416 --> 00:05:41,336 At low temperature and high speed, in other words T- and S+, it is 3 - 4. 78 00:05:42,066 --> 00:05:46,896 And finally, at low temperature and low speed, it's 30 - 5. 79 00:05:48,346 --> 00:05:51,576 So four estimates of the effect of the chemical. 80 00:05:52,736 --> 00:05:59,996 And the average of these four is equal to 50 divided by 4, equal to 12.5. 81 00:05:59,996 --> 00:06:00,746 Let's pause here. 82 00:06:00,746 --> 00:06:04,206 I always tell my students, it's no good just calculating numbers. 83 00:06:04,336 --> 00:06:07,856 What does this value of 12.5 really mean, in plain language?
84 00:06:08,396 --> 00:06:10,676 How would you describe this value to your manager 85 00:06:10,676 --> 00:06:13,976 who doesn't really understand any statistics? 86 00:06:13,976 --> 00:06:21,216 What it says is that, on average, we expect an increase of 12.5 pounds in the pollution amount when we go 87 00:06:21,216 --> 00:06:25,936 from using chemical P to using chemical Q. And remember, 88 00:06:25,936 --> 00:06:28,496 by convention, we report half of that amount. 89 00:06:28,886 --> 00:06:31,266 So report a value of 6.25. 90 00:06:32,356 --> 00:06:37,346 One further thing to notice is the discrepancy in that chemical effect between high S 91 00:06:37,406 --> 00:06:40,796 and low S. Notice the very large difference there. 92 00:06:41,606 --> 00:06:47,726 From the prior class, number 2C, this should alert you that there's an interaction 93 00:06:47,726 --> 00:06:53,846 between factor C and factor S. But before we get to that, let's take a look at temperature. 94 00:06:55,086 --> 00:06:56,866 When we examined the table earlier, 95 00:06:57,326 --> 00:07:00,226 we didn't really notice anything special about temperature. 96 00:07:00,746 --> 00:07:02,916 And we should be able to confirm that, numerically. 97 00:07:04,306 --> 00:07:07,856 We have four estimates of the temperature effect along the vertical edges. 98 00:07:08,176 --> 00:07:16,866 4 - 3 here; 33 - 30 up here; 5 - 4 up there; and 6 - 5 here up at the front. 99 00:07:17,756 --> 00:07:21,376 So on average, we get a value of 1.5 as our difference. 100 00:07:21,526 --> 00:07:25,616 Or if we report half of it, that's an effect of 0.75. 101 00:07:27,126 --> 00:07:33,796 Lastly, let's take a look at the effect of stirring speed S. Along the four diagonal edges, 102 00:07:33,796 --> 00:07:42,456 we have 4 - 33 up here; 3 - 30 down here; 5 - 6 here; and 4 - 5 over there. 103 00:07:43,686 --> 00:07:47,636 The average of those differences is -14.5. 104 00:07:48,376 --> 00:07:51,986 And if we report half of it, that's -7.25.
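The three main effects worked out by hand above can be checked with a short Python sketch. It is an illustration only: the function name `main_effect` and the -1/+1 coding are assumptions, and the outcomes are the eight cube values in standard order. Averaging the four edge differences gives the same number as subtracting the mean at the low level from the mean at the high level, which is what the sketch does.

```python
# Outcomes (pounds of pollutant) in standard order, as read off the cube.
y = [5, 30, 6, 33, 4, 3, 5, 4]

# Coded levels in standard order: -1 is low, +1 is high.
C = [-1, +1, -1, +1, -1, +1, -1, +1]   # chemical: P (-1) or Q (+1)
T = [-1, -1, +1, +1, -1, -1, +1, +1]   # temperature: 72 (-1) or 100 (+1)
S = [-1, -1, -1, -1, +1, +1, +1, +1]   # stirring speed: 200 (-1) or 400 (+1)


def main_effect(levels, outcomes):
    """Mean outcome at the high level minus mean outcome at the low level."""
    high = [v for x, v in zip(levels, outcomes) if x == +1]
    low = [v for x, v in zip(levels, outcomes) if x == -1]
    return sum(high) / len(high) - sum(low) / len(low)


effects = {name: main_effect(col, y)
           for name, col in (("C", C), ("T", T), ("S", S))}
# By the convention used in this class, half of each effect is reported.
half_effects = {name: value / 2 for name, value in effects.items()}

print(effects)       # {'C': 12.5, 'T': 1.5, 'S': -14.5}
print(half_effects)  # {'C': 6.25, 'T': 0.75, 'S': -7.25}
```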
105 00:07:52,656 --> 00:08:01,426 The -14.5 tells us that we expect, on average, a reduction of 14.5 pounds of pollution, 106 00:08:01,926 --> 00:08:05,316 when we go from a low stirring speed to a high stirring speed. 107 00:08:05,316 --> 00:08:09,756 So clearly, it's in our favour to use high stirring speeds 108 00:08:10,056 --> 00:08:11,966 in order to get that reduced pollution. 109 00:08:12,996 --> 00:08:16,676 You should always step back at this point and make sure these results make sense. 110 00:08:17,346 --> 00:08:22,866 Horizontally, we see going from chemical P to Q increases the pollution amounts. 111 00:08:23,816 --> 00:08:26,426 That value of 6.25 looks about right. 112 00:08:27,296 --> 00:08:30,026 A small value of 0.75 for temperature, 113 00:08:30,026 --> 00:08:33,676 also looks right because it really has a very small effect. 114 00:08:34,166 --> 00:08:38,516 And finally, increasing the stirring speed has the largest reduction 115 00:08:38,516 --> 00:08:42,296 on pollution: a decrease of 7.25 units. 116 00:08:42,906 --> 00:08:46,416 You noticed while I was reviewing these results, I started to build 117 00:08:46,416 --> 00:08:49,146 up a numeric representation for you on the screen. 118 00:08:50,146 --> 00:08:52,726 We did that in the class where we considered the ginger biscuits, 119 00:08:52,726 --> 00:08:54,716 and I just followed the same idea here. 120 00:08:55,626 --> 00:08:58,666 "y", represents the prediction of the pollution. 121 00:08:59,686 --> 00:09:02,686 The "11.25" value here is the baseline. 122 00:09:03,706 --> 00:09:07,036 It is the average of all eight of the outcome values, 123 00:09:07,436 --> 00:09:13,236 (5 + 30 + 6 + 33 + 4 + 3 + 5 + 4) divided by eight. 124 00:09:14,306 --> 00:09:18,426 The other three terms, are the separate effects of each factor. 125 00:09:18,796 --> 00:09:21,026 Those are the main effects. 126 00:09:21,026 --> 00:09:23,776 Let's see how we can use this model to make some predictions. 
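Written out, the numeric representation described above would look something like the following, assuming each factor is coded as -1 at its low level and +1 at its high level, with the hat on y marking a prediction:

```latex
\hat{y} = 11.25 + 6.25\,x_C + 0.75\,x_T - 7.25\,x_S,
\qquad x_C,\; x_T,\; x_S \in \{-1, +1\}
```

The baseline is the average of all eight outcomes, 90 / 8 = 11.25, and each coefficient is half of the corresponding main effect.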
127 00:09:24,826 --> 00:09:28,156 Consider the situation where we were using chemical Q, 128 00:09:28,866 --> 00:09:32,386 in other words x_C is coded as a value of +1. 129 00:09:33,366 --> 00:09:34,726 Let's use low temperature. 130 00:09:35,356 --> 00:09:38,046 So x_T is coded as -1. 131 00:09:39,006 --> 00:09:41,136 And also let's use low stirring speed. 132 00:09:41,346 --> 00:09:43,186 So x_S is -1. 133 00:09:44,446 --> 00:09:53,346 The predicted value is 11.25 + 6.25 - 0.75 + 7.25. 134 00:09:53,966 --> 00:09:55,536 That's a value of 24. 135 00:09:56,606 --> 00:10:00,966 That's quite a bit different from the value of 30 pounds which was actually recorded. 136 00:10:01,506 --> 00:10:03,586 There's something we haven't accounted for. 137 00:10:03,966 --> 00:10:08,746 And that's the interaction between C and S. An interaction is 138 00:10:08,746 --> 00:10:11,506 when you have one variable behaving very differently, 139 00:10:11,846 --> 00:10:13,916 depending on the level of another variable. 140 00:10:15,136 --> 00:10:22,116 We noticed earlier that the chemical effect has a change from 30 to 5 and 33 to 6 over here 141 00:10:22,116 --> 00:10:25,086 on the front face at low stirring speed. 142 00:10:26,136 --> 00:10:31,376 Yet on the back face of the cube at high speeds, the effect is almost zero. 143 00:10:31,516 --> 00:10:33,696 3 - 4; 4 - 5. 144 00:10:33,846 --> 00:10:34,976 Very small amounts. 145 00:10:36,276 --> 00:10:41,076 It's very clear, the stirring speed modifies the effect of the chemical. 146 00:10:41,996 --> 00:10:47,176 There's an interaction between S and C. How do we quantify this? 147 00:10:48,056 --> 00:10:54,176 Well, like we did in class 2C, we have to add a new term to our prediction model. 148 00:10:54,876 --> 00:11:01,126 And that term, b_{CS}, is multiplied by x_C and x_S. 149 00:11:02,106 --> 00:11:04,676 But how do we calculate the b_{CS} value? 150 00:11:06,666 --> 00:11:09,826 Let's follow the same ideas as we did in class 2C.
151 00:11:11,296 --> 00:11:13,566 We have two chances to calculate it. 152 00:11:14,116 --> 00:11:18,086 One instance at high temperature, and one instance at low temperature. 153 00:11:18,696 --> 00:11:21,166 We'll calculate both, and then average the answer. 154 00:11:21,756 --> 00:11:25,056 And then, as we've always done, report half the value. 155 00:11:25,506 --> 00:11:33,366 So at high temperature, the difference due to C, at high speed, is 4 - 5. 156 00:11:34,406 --> 00:11:39,676 The difference due to C at low speed is much greater, 33 - 6. 157 00:11:40,846 --> 00:11:44,076 As you remember, interactions are always reported 158 00:11:44,076 --> 00:11:46,736 as half the difference, going from high to low. 159 00:11:47,646 --> 00:11:55,066 In other words, that's -1 - 27, which equals -28, and half of that is -14. 160 00:11:56,326 --> 00:11:58,386 Let's repeat that at the lower temperature. 161 00:11:59,416 --> 00:12:02,806 The difference due to C at high speed is 3 - 4. 162 00:12:03,116 --> 00:12:07,136 The difference due to C at low speed is much greater, 30 - 5. 163 00:12:07,136 --> 00:12:16,566 Report half the value, from high to low, and that is -1 - 25 = -26. 164 00:12:16,666 --> 00:12:20,416 Dividing that by two gives -13. 165 00:12:21,196 --> 00:12:24,396 So now we have two estimates of the interaction effect. 166 00:12:25,136 --> 00:12:29,286 One estimate is -14, the other estimate is -13. 167 00:12:29,506 --> 00:12:32,716 The average of those two numbers is -13.5. 168 00:12:33,046 --> 00:12:38,516 And when we report it, let's put into our model a value of half that amount. 169 00:12:38,746 --> 00:12:41,476 So in other words, -6.75. 170 00:12:42,336 --> 00:12:45,886 That's the value for b_{CS}: -6.75. 171 00:12:46,656 --> 00:12:49,096 Now, let me just pause here for a second, 172 00:12:49,346 --> 00:12:53,266 and emphasize that this is all very tedious if you do it by hand.
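The by-hand steps for the CS interaction can be mirrored in a few lines of Python. This is a sketch under stated assumptions: the names `c_effect` and `cs_interaction_at` are hypothetical, and the outcomes are keyed by coded (C, T, S) levels with -1 for low and +1 for high.

```python
# Outcomes (pounds of pollutant) keyed by coded levels (C, T, S).
y = {
    (-1, -1, -1): 5,  (+1, -1, -1): 30,
    (-1, +1, -1): 6,  (+1, +1, -1): 33,
    (-1, -1, +1): 4,  (+1, -1, +1): 3,
    (-1, +1, +1): 5,  (+1, +1, +1): 4,
}


def c_effect(t, s):
    """Change in pollution going from chemical P to Q at fixed T and S."""
    return y[(+1, t, s)] - y[(-1, t, s)]


def cs_interaction_at(t):
    """Half the difference in the C effect, going from high S to low S."""
    return (c_effect(t, +1) - c_effect(t, -1)) / 2


high_T = cs_interaction_at(+1)            # (-1 - 27) / 2 = -14
low_T = cs_interaction_at(-1)             # (-1 - 25) / 2 = -13
cs_interaction = (high_T + low_T) / 2     # average of the two estimates
b_CS = cs_interaction / 2                 # model coefficient is half of that

print(high_T, low_T, cs_interaction, b_CS)  # -14.0 -13.0 -13.5 -6.75
```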
173 00:12:53,806 --> 00:12:58,546 And we're going to show some computerized ways to do this faster in the next few classes. 174 00:12:59,046 --> 00:13:04,576 But, I always recommend, let's start with "by hand" and then see the advantage 175 00:13:04,576 --> 00:13:06,756 of it later on, when we go to computers. 176 00:13:07,136 --> 00:13:12,046 So let's take our predictions now again from the previous example and see if they are improved. 177 00:13:12,916 --> 00:13:24,176 The predicted value earlier, was 11.25 + 6.25 - 0.75 + 7.25; but now with this interaction term, 178 00:13:24,796 --> 00:13:30,486 we have an additional part "-6.75" times "1" times "-1". 179 00:13:31,726 --> 00:13:37,796 What that means is we actually get an additional amount, of 6.75 due to the interaction. 180 00:13:38,206 --> 00:13:44,366 Getting us a prediction of 30.75, much, much closer to the actual value. 181 00:13:45,436 --> 00:13:48,406 Notice here that the interaction actually works against us. 182 00:13:48,856 --> 00:13:51,976 That interaction has increased the amount of pollution. 183 00:13:52,416 --> 00:13:59,076 We could also calculate CT interactions and TS interactions. 184 00:13:59,076 --> 00:14:01,686 I've only shown you the CS interactions. 185 00:14:02,406 --> 00:14:06,206 In fact, there's even a three factor interaction, a CTS interaction, 186 00:14:06,556 --> 00:14:08,956 but all of this gets very tedious and error prone. 187 00:14:09,186 --> 00:14:13,456 Coming up in the next module we can't wait to show you some computerized shortcuts 188 00:14:13,456 --> 00:14:15,516 that will take care of all of this work for you. 189 00:14:15,896 --> 00:14:20,986 Now, at the risk of this class going on a little bit too long, I want you to sit back, 190 00:14:21,156 --> 00:14:23,506 and just think about that interaction for a second. 191 00:14:24,266 --> 00:14:28,906 Don't just see it as a number, but let's try to interpret what's really going on over here. 
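As a rough preview of what a computerized version might look like, here is a minimal Python sketch. It is not necessarily the method used later in the course: it relies on the assumption that, with -1/+1 coding in a full two-level factorial, each model coefficient is the sum of (coded column times outcome) divided by the number of runs. The names `column`, `coefficients` and `predict` are made up for this illustration.

```python
from itertools import combinations
from math import prod

# Outcomes (pounds of pollutant) and coded levels in standard order.
y = [5, 30, 6, 33, 4, 3, 5, 4]
factors = {
    "C": [-1, +1, -1, +1, -1, +1, -1, +1],
    "T": [-1, -1, +1, +1, -1, -1, +1, +1],
    "S": [-1, -1, -1, -1, +1, +1, +1, +1],
}


def column(term):
    """Coded column for a term such as "C" or "CS" (element-wise product)."""
    return [prod(factors[name][i] for name in term) for i in range(len(y))]


# Main effects, two-factor interactions and the three-factor interaction.
terms = ["".join(c) for r in (1, 2, 3) for c in combinations("CTS", r)]

coefficients = {"baseline": sum(y) / len(y)}
for term in terms:
    coefficients[term] = sum(x * v for x, v in zip(column(term), y)) / len(y)


def predict(levels, use_terms):
    """Prediction at coded `levels` using only the listed model terms."""
    value = coefficients["baseline"]
    for term in use_terms:
        value += coefficients[term] * prod(levels[name] for name in term)
    return value


corner = {"C": +1, "T": -1, "S": -1}   # chemical Q, low temperature, low speed
print(predict(corner, ["C", "T", "S"]))         # 24.0, main effects only
print(predict(corner, ["C", "T", "S", "CS"]))   # 30.75, with the CS term
```

With all seven terms included the model reproduces the recorded value of 30 at this corner, which is expected: eight coefficients fitted to eight data points leave nothing unexplained.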
192 00:14:30,226 --> 00:14:33,436 Why does chemical Q appear to be less effective at lower speed, 193 00:14:33,916 --> 00:14:35,746 but at high speed it works really well? 194 00:14:36,826 --> 00:14:41,806 Maybe chemical Q just takes a little bit longer to dissolve in water than chemical P does. 195 00:14:42,466 --> 00:14:45,546 At low stirring speeds, chemical Q is not effective, 196 00:14:46,156 --> 00:14:49,186 but at high speeds both chemicals are equally effective. 197 00:14:49,956 --> 00:14:52,456 Now here's where experiments can be really powerful. 198 00:14:53,716 --> 00:14:57,246 We saw that the lowest pollution was over here in this corner, 199 00:14:57,706 --> 00:15:01,556 when we use chemical Q, with high speed and low temperature. 200 00:15:02,556 --> 00:15:07,686 But what if the government's requirement was that pollution had to be smaller than 10? 201 00:15:08,956 --> 00:15:13,546 And imagine also that chemical Q cost you double the amount 202 00:15:13,586 --> 00:15:16,746 of chemical P. You can see where this is going. 203 00:15:17,826 --> 00:15:23,926 We can see that any operating point on this plane would be effective as long 204 00:15:23,926 --> 00:15:29,436 as it's not the point with low speed and chemical Q. In fact, 205 00:15:29,836 --> 00:15:34,996 it might be a whole lot more economically profitable to operate at this point over here, 206 00:15:35,296 --> 00:15:37,326 producing five pounds of pollution. 207 00:15:37,326 --> 00:15:40,456 We still meet the requirements for safe operation, 208 00:15:40,646 --> 00:15:46,426 because we're below the government level of 10 units, and we use less energy for stirring, 209 00:15:46,996 --> 00:15:51,006 and a cheaper chemical, P. Actually what we've done here 210 00:15:51,546 --> 00:15:55,396 is considered an additional outcome in our mind: "profit". 211 00:15:56,386 --> 00:16:00,216 Recognize that profits or costs often play a role in any system.
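The screening of corners described above can be sketched in Python. This is an illustration only: the limit of 10 is the hypothetical government requirement from the example, no cost figures are assumed, and the tuple layout (chemical, temperature, speed, pollution) is a choice made for the sketch.

```python
# Each corner of the cube: (chemical, temperature in F, speed in rpm, pounds).
runs = [
    ("P", 72, 200, 5),  ("Q", 72, 200, 30),
    ("P", 100, 200, 6), ("Q", 100, 200, 33),
    ("P", 72, 400, 4),  ("Q", 72, 400, 3),
    ("P", 100, 400, 5), ("Q", 100, 400, 4),
]

limit = 10  # hypothetical requirement: pollution must be smaller than 10

# Corners that meet the requirement.
feasible = [run for run in runs if run[3] < limit]

# Among those, the corners using the cheaper chemical P and the slower,
# less energy-intensive stirring speed.
cheap = [run for run in feasible if run[0] == "P" and run[2] == 200]

print(len(feasible))  # 6: every corner except chemical Q at low speed
print(cheap)          # [('P', 72, 200, 5), ('P', 100, 200, 6)]
```

The corner with chemical P, 72 Fahrenheit and 200 revolutions per minute is the five-pound operating point mentioned above.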
212 00:16:00,536 --> 00:16:05,306 So you should always be aware of the economic impacts of every corner in your cube. 213 00:16:06,536 --> 00:16:10,216 To end this class, and this example, I want you to consider this. 214 00:16:10,496 --> 00:16:16,476 Does the fact that temperature has a small effect imply that temperature is meaningless? 215 00:16:16,966 --> 00:16:18,366 The answer is no. 216 00:16:18,726 --> 00:16:21,466 It is important to recognize that even effects 217 00:16:21,466 --> 00:16:24,496 with small numbers have an important interpretation. 218 00:16:24,966 --> 00:16:29,946 It means that over the range of temperatures selected, in this case between 72 219 00:16:29,946 --> 00:16:32,996 and 100 Fahrenheit, temperature has a small 220 00:16:32,996 --> 00:16:35,096 to negligible effect on the pollution amount. 221 00:16:36,096 --> 00:16:40,726 Now this is a key insight, because the engineer or operator can take this 222 00:16:40,766 --> 00:16:45,366 and select operating conditions which are the most economically advantageous. 223 00:16:46,026 --> 00:16:47,756 Again this comes down to profit. 224 00:16:47,756 --> 00:16:52,626 It is conceivable that when using lower temperature we will save energy. 225 00:16:52,996 --> 00:16:57,536 And, because temperature has such a small effect on the system overall, it means 226 00:16:57,536 --> 00:17:00,776 that we will not significantly affect the pollution level 227 00:17:00,776 --> 00:17:02,476 when we operate at low temperature. 228 00:17:03,216 --> 00:17:04,556 That's a great result. 229 00:17:06,006 --> 00:17:09,096 So I want to thank you for staying with me during these examples. 230 00:17:09,636 --> 00:17:13,836 I know that they have been longer than normal, but I hope they have been insightful.
231 00:17:15,146 --> 00:17:18,386 In the module coming up next, we're going to start looking 232 00:17:18,386 --> 00:17:22,246 at how we can do fewer experiments, but still get a good amount 233 00:17:22,246 --> 00:17:24,586 of information about our process. 234 00:17:24,666 --> 00:17:26,246 Hope to see you over there.