1 00:00:02,266 --> 00:00:06,406 In the previous video, I showed you a process where we selected half the number 2 00:00:06,406 --> 00:00:11,426 of experiments, and I showed you that I used a rule where C = A*B. 3 00:00:11,546 --> 00:00:14,856 Now we shouldn't automatically use these rules. 4 00:00:15,496 --> 00:00:17,416 Let me show you where I obtained it from. 5 00:00:18,336 --> 00:00:23,616 What I'm showing you here is a table that describes how you can do less work in a way 6 00:00:23,616 --> 00:00:26,156 that you can still recover most of the information. 7 00:00:26,156 --> 00:00:31,246 This table doesn't really have a name that I'm aware of, so I call it a 'trade-off table' 8 00:00:31,766 --> 00:00:35,526 because it allows me to plan and budget for a set of experiments 9 00:00:35,676 --> 00:00:39,526 where I can trade off the costs with the information I hope to obtain. 10 00:00:40,156 --> 00:00:44,156 You're going to see this table several times again throughout the rest of the course 11 00:00:44,156 --> 00:00:48,156 but for now: remember how in the previous class we did four experiments 12 00:00:48,156 --> 00:00:50,066 and we had three factors in our system? 13 00:00:50,536 --> 00:00:55,116 On this table, we can locate ourselves over here in the top left corner, 14 00:00:55,716 --> 00:00:58,316 with four runs and three factors. 15 00:00:59,326 --> 00:01:05,406 In this entry in the table, we are told to generate C as the product of AB. 16 00:01:06,836 --> 00:01:12,666 Now remember in the wastewater example, where we had factors C, T and S? 17 00:01:12,666 --> 00:01:17,416 That would correspond to setting S as the product of C*T 18 00:01:17,416 --> 00:01:19,846 if we were using those factor names. 19 00:01:20,766 --> 00:01:25,836 When you're dealing with your case and your factor names are different, simply replace them 20 00:01:25,836 --> 00:01:29,636 with A, B, and C temporarily to use this trade-off table. 
21 00:01:31,576 --> 00:01:34,526 Now I guess you're curious about that plus and minus sign here. 22 00:01:34,526 --> 00:01:41,096 If I use the minus, it says to generate -C as the product of AB. 23 00:01:41,096 --> 00:01:49,736 In our wastewater example, that would translate to -S is the product of CT. 24 00:01:49,736 --> 00:01:56,086 If you created your design following the rule with the minus sign, 25 00:01:56,646 --> 00:01:59,976 you'd notice that you would end up with the closed circles 26 00:01:59,976 --> 00:02:03,096 on the original cube, rather than the open circles. 27 00:02:03,436 --> 00:02:04,736 But notice the symmetry. 28 00:02:05,556 --> 00:02:07,756 The closed circles still have 29 00:02:07,756 --> 00:02:12,306 that same collapsibility we noticed earlier with the open circles. 30 00:02:13,436 --> 00:02:17,096 Now with so many learners from a wide range of backgrounds in this course, 31 00:02:17,656 --> 00:02:20,776 I know that some of you might have difficulty with the math that follows. 32 00:02:21,566 --> 00:02:22,246 Be patient. 33 00:02:22,596 --> 00:02:24,496 Watch it all unfold here on the screen. 34 00:02:24,886 --> 00:02:27,196 And I'll explain it in the end in plain language. 35 00:02:27,836 --> 00:02:30,906 We're hoping to appeal to those of you with a technical background, 36 00:02:31,276 --> 00:02:33,796 and also those of you with a non-technical background. 37 00:02:34,796 --> 00:02:36,306 So let's get started with that math. 38 00:02:36,446 --> 00:02:39,166 Start by writing out the standard order table. 39 00:02:39,166 --> 00:02:42,386 Three factors means that there are eight experiments. 40 00:02:42,506 --> 00:02:45,926 But in the last video, I showed you that the best four experiments 41 00:02:45,926 --> 00:02:49,236 to pick are either the open circles or the closed circles. 
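The two generator rules described above (C = AB and -C = AB) can be sketched in a few lines of Python. This is only an illustrative sketch, not something shown in the video: the function name `standard_order` and the variable names are assumptions made for this example, and factor levels are coded as -1 and +1.

```python
from itertools import product


def standard_order(n_factors):
    """Return the 2^n full factorial in standard order (first factor varies fastest)."""
    # itertools.product varies the LAST element fastest, so reverse each tuple.
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=n_factors)]


full = standard_order(3)  # eight runs, each a tuple (A, B, C)

# Half fraction from the rule C = A*B (the "+" choice in the table entry).
plus_half = [(i, run) for i, run in enumerate(full, start=1) if run[2] == run[0] * run[1]]
# Half fraction from the rule -C = A*B (the "-" choice in the table entry).
minus_half = [(i, run) for i, run in enumerate(full, start=1) if -run[2] == run[0] * run[1]]

for label, half in (("C = AB", plus_half), ("-C = AB", minus_half)):
    print(label)
    for row_number, (a, b, c) in half:
        print(f"  row {row_number}: A={a:+d} B={b:+d} C={c:+d}")
```

Each rule keeps four of the eight standard-order rows, and the two sets of rows do not overlap, which is the symmetry between the open and closed circles on the cube.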
42 00:02:49,646 --> 00:02:54,576 Let's work with the open circles, they corresponded to rows two, three, five, 43 00:02:54,576 --> 00:02:57,146 and eight from the original standard order table. 44 00:02:57,146 --> 00:03:02,326 So I'm going to remove those other four rows, the experiments we didn't run, 45 00:03:02,936 --> 00:03:06,076 and leave only the four behind that we actually ran. 46 00:03:07,896 --> 00:03:10,306 Now I'm going to slightly rearrange them for you. 47 00:03:10,666 --> 00:03:14,716 Put row five first, and then original row two and three, 48 00:03:14,716 --> 00:03:18,086 and then finally the original eighth row becomes our fourth row. 49 00:03:18,086 --> 00:03:25,236 Notice that columns A and B are in standard order and that column C is the product of A 50 00:03:25,236 --> 00:03:31,106 and B. In a prior video I showed you the matrix form for a three factor system, 51 00:03:31,236 --> 00:03:33,666 in the context of the wastewater treatment example. 52 00:03:35,586 --> 00:03:41,746 The matrix form adds extra columns for the two factor interactions and an extra column 53 00:03:41,746 --> 00:03:43,346 for the three factor interaction. 54 00:03:44,546 --> 00:04:04,126 There were eight unknowns in that example, in that vector b. So I'm going 55 00:04:04,126 --> 00:04:07,686 to remove those other four rows, the experiments we didn't run, 56 00:04:08,296 --> 00:04:11,436 and leave only the four behind that we actually ran. 57 00:04:13,266 --> 00:04:15,676 Now I'm going to slightly rearrange them for you. 58 00:04:16,026 --> 00:04:20,076 Put row five first, and then the original row two and three, 59 00:04:20,076 --> 00:04:23,536 and then finally the original row eight becomes our fourth row. 60 00:04:29,126 --> 00:04:32,716 Examine any patterns you notice in the eight columns. 61 00:04:34,586 --> 00:04:37,026 Did you pick up on the similar columns in the matrix? 
62 00:04:37,026 --> 00:04:41,706 For example, notice that the column AB matches the column for C. 63 00:04:42,276 --> 00:04:48,656 And that the B column has the same recurring pattern as the AC column, minus minus plus plus. 64 00:04:48,816 --> 00:04:55,046 The BC column matches A. And finally, the ABC column matches the intercept column. 65 00:04:56,116 --> 00:04:59,146 What you need to take away from this explanation is 66 00:04:59,146 --> 00:05:03,126 that you notice the same pattern recurring in certain columns. 67 00:05:04,056 --> 00:05:08,346 What this means is we won't be able to tell those columns apart from each other. 68 00:05:09,876 --> 00:05:11,856 Telling columns apart is critical. 69 00:05:12,846 --> 00:05:15,786 In the prior full factorials, perhaps you noticed 70 00:05:15,786 --> 00:05:18,316 that each column was unique and independent. 71 00:05:18,936 --> 00:05:24,846 This helps us know the unique contribution of that factor to the outcome, the y variable. 72 00:05:26,076 --> 00:05:28,826 What does it mean if we cannot tell the columns apart? 73 00:05:29,186 --> 00:05:32,526 Let's use the A and BC columns as an example. 74 00:05:33,566 --> 00:05:37,426 By doing this fractional factorial, from a mathematical perspective, 75 00:05:38,056 --> 00:05:40,086 those two columns appear to be the same. 76 00:05:40,376 --> 00:05:43,526 We use the word alias to describe the situation. 77 00:05:43,816 --> 00:05:47,786 Now, you may be familiar with the word alias as used in the English language. 78 00:05:47,876 --> 00:05:52,656 For example, if I asked you, who is Jorge Mario Bergoglio? 79 00:05:53,216 --> 00:05:54,806 You may not necessarily know. 80 00:05:55,046 --> 00:05:59,506 But you do know his alias, Pope Francis. 81 00:05:59,506 --> 00:06:01,016 We all have aliases. 82 00:06:01,526 --> 00:06:06,146 In the real world, my family and friends know me by my full name, Kevin George Dunn. 
83 00:06:06,736 --> 00:06:11,686 But, online, my alias is my email address or my username for a website. 84 00:06:11,686 --> 00:06:16,576 So an alias simply means we have another name for the same thing. 85 00:06:16,576 --> 00:06:23,756 Back to these experiments, where the alias for A is B times C. There are three other aliases: 86 00:06:23,946 --> 00:06:29,226 B is the alias for A times C; C is the alias for A times B, 87 00:06:29,986 --> 00:06:31,826 and that one was intentional, remember. 88 00:06:31,986 --> 00:06:36,146 We chose to set C as the product of A and B in our design. 89 00:06:36,236 --> 00:06:41,236 Finally, the last alias: the three factor interaction between A, B, 90 00:06:41,236 --> 00:06:43,916 and C is aliased with the intercept. 91 00:06:44,336 --> 00:06:49,826 Now when I talked about not being able to distinguish between A and the BC interaction, 92 00:06:50,256 --> 00:06:54,436 or not being able to tell the columns apart, that is a consequence of aliasing. 93 00:06:54,976 --> 00:06:57,076 The effects of A and the effects 94 00:06:57,076 --> 00:07:01,066 of the BC interaction are aliased or mixed up with each other. 95 00:07:01,186 --> 00:07:06,896 So if there's a large effect size, it might be due to A, or it might be due 96 00:07:06,896 --> 00:07:08,976 to the BC two factor interaction. 97 00:07:09,526 --> 00:07:13,446 We won't know until we run more experiments. 98 00:07:13,446 --> 00:07:17,576 The term confounding is used to describe what is happening here. 99 00:07:18,746 --> 00:07:22,926 That means, after doing the reduced set of four experiments, 100 00:07:23,326 --> 00:07:28,336 you will never really be sure whether it was A that caused the change in the outcome, 101 00:07:28,336 --> 00:07:31,366 or whether it was the two factor interaction BC. 
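The four aliases listed above can be recovered mechanically by building all eight model columns for the four runs and grouping the columns that are identical. The following is a minimal sketch under stated assumptions: the runs are written in the rearranged order described earlier (original rows 5, 2, 3, 8), and names such as `columns` and `alias_groups` are hypothetical, chosen only for this example.

```python
from itertools import combinations
from math import prod

# The four runs of the half fraction C = A*B, in the rearranged order
# (original standard-order rows 5, 2, 3, 8).
runs = [
    {"A": -1, "B": -1, "C": +1},
    {"A": +1, "B": -1, "C": -1},
    {"A": -1, "B": +1, "C": -1},
    {"A": +1, "B": +1, "C": +1},
]

factors = ["A", "B", "C"]

# Build every model column: intercept, main effects, and all interactions.
columns = {"Intercept": tuple(1 for _ in runs)}
for order in range(1, len(factors) + 1):
    for term in combinations(factors, order):
        columns["".join(term)] = tuple(prod(run[f] for f in term) for run in runs)

# Group terms whose columns are identical: those terms are aliases of each other.
alias_groups = {}
for name, column in columns.items():
    alias_groups.setdefault(column, []).append(name)

for column, names in alias_groups.items():
    signs = " ".join("+" if v > 0 else "-" for v in column)
    print(f"{' = '.join(names):<16} [{signs}]")
```

Eight columns collapse into four distinct sign patterns, one per alias pair, which is why only four coefficients can be estimated from four runs.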
102 00:07:31,366 --> 00:07:39,826 Confounded is a word that means confused; it means the effect of A is confused or mixed 103 00:07:39,826 --> 00:07:43,766 up with or confounded with the BC interaction. 104 00:07:44,446 --> 00:07:50,266 We cannot tell them apart: A is an alias for BC and BC is an alias 105 00:07:50,266 --> 00:07:56,456 for A. That's the price we pay for doing half the experiments, and in some cases, 106 00:07:56,856 --> 00:07:58,896 it is a price that is worth paying. 107 00:07:59,316 --> 00:08:00,506 Let's review the math. 108 00:08:01,176 --> 00:08:05,736 Those with a background in this sort of thing will recognize that one way to solve a set 109 00:08:05,736 --> 00:08:10,316 of underdetermined equations, where there are more unknowns than equations, 110 00:08:10,816 --> 00:08:16,186 is to collapse these columns together to achieve a square system that can be solved. 111 00:08:17,366 --> 00:08:20,526 Now you can see algebraically how this confounding develops. 112 00:08:20,896 --> 00:08:26,436 The coefficient for A in the model is a combination of the original A, plus BC. 113 00:08:26,436 --> 00:08:33,776 Similarly, the coefficient for B is the sum of the original B plus AC, and so on. 114 00:08:35,006 --> 00:08:37,636 These four entries here are our aliases. 115 00:08:38,796 --> 00:08:43,766 It also explains why you see only four coefficients in the R software outputs 116 00:08:43,766 --> 00:08:45,936 and NA values for the remainder of them. 117 00:08:47,066 --> 00:08:52,326 R notices that there are aliased terms in the full system that we requested a model for, 118 00:08:52,416 --> 00:08:56,906 but it will only report the coefficient for one of the aliases. 119 00:08:56,906 --> 00:09:00,396 To end off the class, you might be concerned that you've lost quite a bit 120 00:09:00,396 --> 00:09:06,566 of information by using a fractional factorial. 121 00:09:06,566 --> 00:09:13,366 In the next video we see the situation isn't quite so bad. 
122 00:09:13,996 --> 00:09:18,676 Using prior knowledge about our process we can actually benefit from knowing about the aliases. 123 00:09:20,076 --> 00:09:23,846 Now hopefully this video has not left you too confused or confounded. 124 00:09:24,816 --> 00:09:28,136 Just to summarize, there are three new terms that you must be comfortable 125 00:09:28,136 --> 00:09:33,486 with from today's class, half fractions, aliasing, and confounding. 126 00:09:34,216 --> 00:09:36,506 We're going to use them a lot more in the next class.
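As a small numerical check of the algebra reviewed in this video, the sketch below shows that each coefficient estimated from the four runs equals the sum of a term and its alias. Everything numeric here is an assumption: the "true" coefficients are invented purely for illustration and the outcomes are noise-free. It uses plain Python rather than the R software mentioned in the video.

```python
from math import prod

# Hypothetical "true" coefficients for the full eight-term model.
# These numbers are invented purely for illustration.
true_coef = {"Intercept": 10.0, "A": 3.0, "B": -2.0, "C": 1.0,
             "AB": 0.5, "AC": 4.0, "BC": -1.5, "ABC": 0.25}

# The four runs (A, B, C) of the half fraction C = A*B.
runs = [(-1, -1, +1), (+1, -1, -1), (-1, +1, -1), (+1, +1, +1)]


def term_value(term, run):
    """Value of a model term (e.g. 'BC') at one run; the intercept is always 1."""
    if term == "Intercept":
        return 1
    levels = dict(zip("ABC", run))
    return prod(levels[f] for f in term)


# Noise-free outcomes generated from the full model at the four runs.
y = [sum(coef * term_value(term, run) for term, coef in true_coef.items())
     for run in runs]

# The four retained columns are orthogonal, so each least-squares coefficient
# is the column's dot product with y divided by the number of runs.
estimated = {
    term: sum(term_value(term, run) * yi for run, yi in zip(runs, y)) / len(runs)
    for term in ("Intercept", "A", "B", "C")
}

aliases = {"Intercept": "ABC", "A": "BC", "B": "AC", "C": "AB"}
for term, value in estimated.items():
    partner = aliases[term]
    print(f"{term}: estimated {value}, true {term} + true {partner} = "
          f"{true_coef[term] + true_coef[partner]}")
```

With these made-up numbers, the estimate reported for A is really the A effect plus the BC effect, so a large value cannot be attributed to either one alone without more experiments or prior knowledge.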