In this video, we demonstrate how the response surface strategy changes as we reach the optimum. Issues of curvature and non-linearity become important at the peak of the mountain.

One advantage of response surface methods is that we learn about the region around us as we go. Remember that analogy of walking with a ski pole in your hand? Well, we never really know the region around us. So when we use that ski pole to figure out what the terrain looks like, we need to have a way to know when we've reached the top.

Let's just quickly contrast the response surface approach with the OFAT approach. The COST approach, or the OFAT approach, makes you think that you're at the optimum, but you can never really be sure. In the case that we saw earlier with two factors, you would alternate between optimizing factor A, then factor B, then optimize factor A again, then B again. You'll eventually get to an optimum, but will you be sure you're at the peak? How do you know you don't need to do another round of optimizing in A and B again?

Also, if I'd optimized B first and then A, I would have arrived at the optimum faster. This seems like a lottery! Sometimes you get to the peak quickly, and sometimes slowly. Not surprisingly, statisticians don't like this sort of thing.

Furthermore, this approach doesn't scale well.
If you had five factors, for example A, B, C, D, and E, then this haphazard searching across the five factors leads to inefficient experimentation.

By using the COST approach you will not learn about the interactions in your system. Recall from an earlier video in this module that learning more about our systems was the first way we can use data to improve our processes.

So let's resume and continue with the model built on points 11, 12, 13, and 14, and the baseline at point 10. We pointed out that the contour plots exhibit curvature: the lines are not parallel. These curved lines come from the interaction term, indicating that the interaction coefficient is important relative to the main effects.

In prior models, the interaction term was small. Notice, though, that the steepest ascent method will still send us up in the correct direction if we ignore the interaction term. The interaction term, if we had accounted for it, would send us off at a slightly different angle.

But in this example, the discrepancy is not so bad. Had the interaction term sent us in a very different direction, we would definitely follow that direction instead of the steepest ascent that is determined only from the linear terms. But more on that to come with this topic of "curvature".

Let's quickly go take a step in the direction for run number 15.
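Before working through the numbers, the step computation along the steepest-ascent direction can be sketched in a few lines of code. This is a minimal sketch, assuming a linear model in coded units with hypothetical coefficient values; only the ratio of the coefficients fixes the direction, and these are not the course's fitted values.

```python
# Steepest ascent from a linear model y = b0 + bT*xT + bP*xP (coded units):
# the direction of steepest ascent is proportional to (bT, bP).
# The coefficients below are hypothetical stand-ins, not the fitted model;
# their ratio bP/bT = -1/3 matches the step taken in the video.
bT, bP = 15.0, -5.0   # hypothetical main-effect coefficients

delta_xT = 2.0                    # chosen step size in factor T (coded units)
delta_xP = (bP / bT) * delta_xT   # corresponding step in factor P

print(delta_xT, delta_xP)  # 2.0 -0.6666666666666666
```

Choosing a step of 2 in one factor and scaling the other factor by the coefficient ratio keeps us moving along the steepest-ascent line.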
And because you are good at this now, I am going to take a step of "Delta x_T" equal to two, and the corresponding "Delta x_P" is equal to minus two-thirds. You can do the rest of the calculations yourself and show that the predicted value of profit at this location is $742, and that corresponds to these real-world values and these coded values. When we run the actual experiment, we record a profit of $735. That's an overestimate of $7. This overestimate is comparable to the main effect.

And we also have visual evidence now of curvature. This is starting to tell me that I should change my strategy. When we start to enter a region of curvature in response surface methods, a change in the surface's linearity becomes apparent. We're becoming more nonlinear, and likely approaching an optimum. It is desirable to know when this is happening. One indication of that already is that our interaction terms are large; they cannot be ignored. And visually, we see that as those non-parallel lines in the contour plot.

The second indication that an optimum is close by is that we are levelling out. Levelling out means that my outcome values in the neighbourhood are getting closer and closer, even when I'm taking reasonable step changes.

Let's see this. The spread in profit values in the first factorial was around a $300 difference.
In the second factorial over here, that spread was around $150. And now in this third factorial, my spreads are only $15 to $20.

We're not making the gains we had made earlier. And if we're not careful, we can be affected by noise. If we don't know the level of noise around us, we might be misled. How do I know whether that spread of $15 to $20 is any different from the noise in the system? Another way to ask that is: if we repeated those corner experiments, would we get similar values or different values?

So let's go calculate what the noise level is. Run at least three or four repeated experiments at the same condition. We typically use the baseline, so here at the base of the factorial. I previously had an outcome of $732, and two more runs give me outcomes of $733 and $737. So there's a spread of about $5. That spread is very different from the spread over the corner points of the factorial, indicating I'm still seeing signal over the noise.

The third indication of an optimum is whether our predictions are too high or too low. We saw here at point 15 that we had a prediction error of $7, just over our level of noise. This indicates the model can be improved.

We often observe strong changes in the model's surface near the optimum. For example, if you're making a product, you want to cook it long enough to bring out the beautiful colours and caramelization flavours that occur.
But go just a little bit too far and it becomes burnt.

We also see this in engineering systems. Often, our optimal point of operation is right at the edge of a cliff, and if we go just a little bit further, we fall over the edge and see our outcome value drop rapidly. Another good reason to take small steps near the optimum.

A fourth way to detect curvature is that our model does not fit the surface very well. A linear model cannot fit a curved surface well, and we use the terminology "lack of fit" to quantify that. Let me show you. In our first factorial, the center point was $407, but the predicted center point was $390. That's a difference of $17.

Now that might seem large, but it really isn't when we compare it to the main effects of 55 and 134. Recall what the interpretation of that number 55 is again? So a $17 difference really is small, indicating a small lack of fit.

In the second factorial, the actual center was $657 while the predicted center was $645, a difference of $12. That again is small when compared to the neighbourhood we're in.

In this third factorial, though, the actual center is at the average of these three baseline values, $734. Compare that to the predicted center value of $724. That's a difference of $10.
Which, when compared to the largest effect of 7.5 and to the noise level of about $5, indicates an important deviation between the model and the actual surface that we're on, at least at the center.

So if we're getting large deviations at the center, we cannot hope to get good predictions outside the range of the model. And good predictions are essential to optimize in the correct direction.

So there are four ways that we've shown to check for inadequacy in the model. And those of you with a statistical background can go calculate the confidence intervals on the model coefficients, and observe that they're very wide: none of the terms in the model are statistically significant.

Well, as we saw in the single-variable popcorn example, when faced with a poorly predicting model in a region that has curvature, we can add terms to account for the nonlinearity: "quadratic terms". So let's go add these now.

There are two options: adding points on the face of the cube, or adding points a little bit further out, called "axial points" or "star points". These points are at a distance denoted as alpha from the center. Alpha is a value greater than 1, to ensure they are outside the cube.

The design on the left works well if you hit a constraint, or cannot leave the factorial space.
The design on the right comes from a class of designs called central composite designs, or CCDs, and they're preferred for the statistical property called rotatability. Just a quick aside: rotatability simply means that the prediction error is equal for any two points that are the same distance from the center. And it's a desirable statistical property.

Now, there are various choices for the distance alpha and the number of center points to use, but that's a messy discussion that you can research quite easily. The general advice is this, though: run the factorial points first; then run the star points afterwards at a distance of alpha equal to 2^k taken to the power of 0.25, that is, alpha = (2^k)^0.25.

So if you have two factors, alpha = 1.41, and if you have three factors, alpha = 1.68. Also, add three to four center points to assess lack of fit. And run these center points at different times, not all after each other.

Notice this, though: from the individual perspective of factor T and of factor P, each of these factors has runs at five distinct levels, and that's what helps us accurately fit that quadratic model.

Let's go do this! The first star point is run number 18, at a value of +alpha for factor T in coded units, and a value of zero for factor P. Let's add that to the table, and also calculate the real-world units for it in the usual way.
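As a minimal sketch, the axial distance and the usual coded-to-real conversion look like this. The center and half-range values below are hypothetical stand-ins, chosen only so that +alpha lands near the 343 parts-per-hour figure; they are not the course's actual settings.

```python
# Axial (star-point) distance for a central composite design:
# alpha = (2**k) ** 0.25, where k is the number of factors.
def ccd_alpha(k: int) -> float:
    return (2 ** k) ** 0.25

print(round(ccd_alpha(2), 2))  # 1.41 for two factors
print(round(ccd_alpha(3), 2))  # 1.68 for three factors

# The usual coded-to-real conversion: real = center + coded * half_range.
# Hypothetical center and half-range for factor T (throughput, parts/hour):
center_T, half_range_T = 330.0, 9.0

coded_T = ccd_alpha(2)   # star point at +alpha for factor T
real_T = center_T + coded_T * half_range_T
print(round(real_T))     # approximately 343
```

The same conversion, applied in reverse, takes any real-world setting back to coded units for the model.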
So that's 343 parts per hour, and a sales price of $1.63. You can go practice reproducing the other three star points, and let's add one final center-point experiment, number 22, so that we have a total of four center points.

Now we go run these experiments, in random order of course, and report the values here in standard order. Notice firstly that center point 22 is similar to the prior values, indicating that the system is still stable and reproducible.

Well, we've got quite the collection of data here. A central composite design (CCD) always has the factorial points, center points, and star points, and I've arranged them in that order in the R code. When we run that code, we get the quadratic model from them all. I will leave it as a small challenge to you to go prove the following two things.

Firstly, the model's prediction of the center point, when compared to the average of the four center points, has a very small deviation. So this model fits well, at least at the center.

Secondly, this quadratic model's prediction of the other points, for example one of the corner points, or one of the star points, or even experiment 15 over here, is a very good prediction. There is little prediction error. So we have confidence in this model's predictions.

Now let's go visualize those as contour plots.
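The course arranges these runs and fits the model in R; as a stand-in, here is a minimal sketch of the same idea in Python with NumPy. The response values are generated from a made-up quadratic surface, not the actual profit data, purely so the fit can be checked against a known truth.

```python
import numpy as np

# A two-factor central composite design in coded units:
# 4 factorial points, 4 center points, 4 star points at +/- alpha.
alpha = (2 ** 2) ** 0.25  # = sqrt(2) for k = 2 factors
factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
center = [(0, 0)] * 4
star = [(-alpha, 0), (alpha, 0), (0, -alpha), (0, alpha)]
design = np.array(factorial + center + star)

# Synthetic responses from a known quadratic surface (illustrative only,
# NOT the course's profit data), so we can check the fit recovers the truth.
def true_surface(xT, xP):
    return 730 + 4 * xT - 3 * xP + 2 * xT * xP - 5 * xT**2 - 6 * xP**2

xT, xP = design[:, 0], design[:, 1]
y = true_surface(xT, xP)

# Model matrix for y = b0 + bT*xT + bP*xP + bTP*xT*xP + bTT*xT^2 + bPP*xP^2
X = np.column_stack([np.ones(len(y)), xT, xP, xT * xP, xT**2, xP**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(coef, 6))  # recovers [730, 4, -3, 2, -5, -6]
```

Because each factor appears at five distinct levels, the quadratic columns of the model matrix are estimable, which is exactly why the CCD supports a full second-order model.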
And right away, we can see we are in fact near the optimum. Visually, the axial point is pretty close to the predicted optimum region from the model. That's good enough to stop here and use as our optimum.

But let's say the quadratic model had looked like this one instead. Then you would go run your next experiment over here, at the model's predicted optimum. And then you would go verify the model's prediction ability at that point, to check that you've reached the optimum.

Now we can be a bit more precise, for those of you who don't like to trust visual judgement. We can take this quadratic equation, differentiate it with respect to the coded variables, and set the derivatives equal to zero. You will get a set of two linear equations in two unknowns, which you can then solve using your favourite linear algebra software, or by hand.

When you go do that, you get the predicted optimum at 343 parts per hour and a selling price of $1.59. The quadratic model tells us to expect a profit of $740 at this point. Running that 23rd experiment gives an actual profit of $739; that's very close agreement. This is definitely the largest value we've observed over the entire approach we followed.

So this video has answered the last question we had from an earlier video in this module: "How do we know when to stop?" We know that we can stop when our model matches the surface well, and the model predicts an optimum.
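The differentiate-and-solve step above can be sketched as follows, using hypothetical quadratic coefficients rather than the fitted model from the video.

```python
import numpy as np

# Stationary point of a fitted quadratic
#   y = b0 + bT*xT + bP*xP + bTP*xT*xP + bTT*xT^2 + bPP*xP^2.
# Setting both partial derivatives to zero gives two linear equations:
#   2*bTT*xT + bTP*xP = -bT
#   bTP*xT + 2*bPP*xP = -bP
# The coefficients below are hypothetical, not the course's fitted model.
b0, bT, bP, bTP, bTT, bPP = 730, 4, -3, 2, -5, -6

A = np.array([[2 * bTT, bTP],
              [bTP, 2 * bPP]])
rhs = np.array([-bT, -bP])
xT_opt, xP_opt = np.linalg.solve(A, rhs)

# Predicted response at the stationary point (in coded units).
y_opt = (b0 + bT * xT_opt + bP * xP_opt
         + bTP * xT_opt * xP_opt + bTT * xT_opt**2 + bPP * xP_opt**2)
print(xT_opt, xP_opt, y_opt)
```

Once you have the stationary point in coded units, the usual coded-to-real conversion takes it back to a throughput and selling price you can actually run.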
Using the model, we know that we've reached the peak of the mountain, even though we cannot see the actual mountain around us.

So let's recap our entire approach. Start by building successive linear models, shown here in blue, green, and orange, respectively. I'm showing you the prediction contours in those colours for the local region around each model. Each of those local models had its baseline, or 0-0, value.

These past videos have also shown that we should incorporate the baseline points, as well as other points in the neighbourhood, in our models, to help improve their estimates.

We use our models as long as we have confidence in their predictions. We rebuild the model once we demonstrate those predictions are poor, judged by comparing the predictions to the actual values, and taking noise into account.

As we approach the optimum, issues regarding curvature, which we studied in four points, become apparent. We have to change our strategy. If we pick up that we have curvature, based on these criteria, we have to start decreasing our step size and start fitting quadratic models.

The principle of an optimum is that it's nonlinear: points around us must be lower. And so our last prediction model that we built, shown here in red, illustrates that quite nicely.

To end off with, though, let me show you the true surface, in a grey colour.
This is obviously something you would never see in practice. But seeing it here gives you good confidence that we were doing the right thing all along.

You can see how the models in blue, green, and orange approximated the non-linear surface very well in their local regions. Outside their local neighbourhoods, they start to deviate. The non-linear model fits the surface over a wider region. That isn't too surprising: the information to build that non-linear model required four plus four plus four, or 12, experiments. And we used that non-linear model to place our final experiment(s) very close to the true optimum.

To end this video, I will add one point: the real optimum may move. Our system could deteriorate and change, so that optimum that you found won't stay there. There are experimental tools that continually keep searching and moving towards the optimum. We won't have time to cover them in this course, but the topic of Evolutionary Operation (EVOP) is what you should search for if that interests you. It is particularly applicable to manufacturing systems that are never stable. That mountain is moving, and you have to move as well in order to remain at the peak.