In this video, we demonstrate how the response surface strategy changes as we reach the optimum. Issues of curvature and non-linearity become important at the peak of the mountain.

One advantage of response surface methods is that we learn about the region around us as we go. Remember that analogy of walking with a ski pole in your hand? Well, we never really know the region around us. So when we use that ski pole to figure out what the terrain looks like, we need to have a way to know when we've reached the top.

Let's just quickly contrast the response surface approach with the OFAT approach. The COST approach, or the OFAT approach, makes you think that you're at the optimum, but you can never really be sure. In the case that we saw earlier with two factors, you would alternate between optimizing factor A, then factor B, then optimize factor A again, then B again. You'll eventually get to an optimum, but will you be sure you're at the peak? How do you know you don't need to do another round of optimizing in A and B again?

Also, if I'd optimized B first and then A, I would have arrived at the optimum faster. This seems like a lottery! Sometimes you get to the peak quickly, and sometimes slowly. Not surprisingly, statisticians don't like this sort of thing.

Furthermore, this approach doesn't scale well.
If you had five factors, for example A, B, C, D, and E, then this haphazard searching across the five factors leads to inefficient experimentation.

By using the COST approach you will not learn about the interactions in your system. Recall from an earlier video in this module that learning more about our systems was the first way we can use data to improve our processes.

So let's resume and continue with the model built on points 11, 12, 13, and 14, and the baseline at point 10. We pointed out that the contour plots exhibit curvature: the lines are not parallel. These curved lines come from the interaction term, indicating that the interaction coefficient is important relative to the main effects.

In prior models, the interaction term was small. Notice, though, that the steepest ascent method will still send us up in the correct direction if we ignore the interaction term. The interaction term, if we had accounted for it, would send us off at a slightly different angle.

But in this example, the discrepancy is not so bad. Had the interaction term sent us in a very different direction, we would definitely follow that direction instead of the steepest ascent that is determined only from the linear terms. But more on that to come with this topic of "curvature".

Let's quickly go take a step in the direction for run number 15.
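Before working through the numbers, the step computation along the steepest-ascent direction can be sketched in a few lines of code. This is a minimal sketch, assuming a linear model in coded units with hypothetical coefficient values; only the ratio of the coefficients fixes the direction, and these are not the course's fitted values.

```python
# Steepest ascent from a linear model y = b0 + bT*xT + bP*xP (coded units):
# the direction of steepest ascent is proportional to (bT, bP).
# The coefficients below are hypothetical stand-ins, not the fitted model;
# their ratio bP/bT = -1/3 matches the step taken in the video.
bT, bP = 15.0, -5.0   # hypothetical main-effect coefficients

delta_xT = 2.0                    # chosen step size in factor T (coded units)
delta_xP = (bP / bT) * delta_xT   # corresponding step in factor P

print(delta_xT, delta_xP)  # 2.0 -0.6666666666666666
```

Choosing a step of 2 in one factor and scaling the other factor by the coefficient ratio keeps us moving along the steepest-ascent line.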
And because you are good at this now, I am going to take a step of "Delta x_T" equal to two, and the corresponding "Delta x_P" is equal to minus two-thirds. You can do the rest of the calculations yourself and show that the predicted value of profit at this location is $742, and that corresponds to these real-world values and these coded values. When we run the actual experiment, we record a profit of $735. That's an overestimate of $7. This overestimate is comparable to the main effect.

And we also have visual evidence now of curvature. This is starting to tell me that I should change my strategy. When we start to enter a region of curvature in response surface methods, a change in the surface's linearity becomes apparent. We're becoming more nonlinear, and likely approaching an optimum. It is desirable to know when this is happening. One indication of that already is that our interaction terms are large; they cannot be ignored. And visually, we see that as those non-parallel lines in the contour plot.

The second indication that an optimum is close by is that we are levelling out. Levelling out means that my outcome values in the neighbourhood are getting closer and closer, even when I'm taking reasonable step changes.

Let's see this. The spread in profit values in the first factorial was around a $300 difference.
In the second factorial over here, that spread was around $150. And now in this third factorial, my spreads are only $15 to $20.

We're not making the gains we had made earlier. And if we're not careful, we can be affected by noise. If we don't know the level of noise around us, we might be misled. How do I know whether that spread of $15 to $20 is any different from the noise in the system? Another way to ask that is: if we repeated those corner experiments, would we get similar values or different values?

So let's go calculate what the noise level is. Run at least three or four repeated experiments at the same condition. We typically use the baseline, so here at the base of the factorial. I previously had an outcome of $732, and two more runs give me outcomes of $733 and $737. So there's a spread of about $5. That spread is very different from the spread over the corner points of the factorial, indicating I'm still seeing signal over the noise.

The third indication of an optimum is whether our predictions are too high or too low. We saw here at point 15 that we had a prediction error of $7, just over our level of noise. This indicates the model can be improved.

We often observe strong changes in the model's surface near the optimum. For example, if you're making a product, you want to cook it long enough to bring out the beautiful colours and caramelization flavours that occur.
But go just a little bit too far and it becomes burnt.

We also see this in engineering systems. Often, our optimal point of operation is right at the edge of a cliff, and if we go just a little bit further, we fall over the edge and see our outcome value drop rapidly. Another good reason to take small steps near the optimum.

A fourth way to detect curvature is that our model does not fit the surface very well. A linear model cannot fit a curved surface well, and we use the terminology "lack of fit" to quantify that. Let me show you. In our first factorial, the center point was $407, but the predicted center point was $390. That's a difference of $17.

Now that might seem large, but it really isn't when we compare it to the main effects of 55 and 134. Recall what the interpretation of that number 55 is again? So a $17 difference really is small, indicating a small lack of fit.

In the second factorial, the actual center was $657 while the predicted center was $645, a difference of $12. That again is small when compared to the neighbourhood we're in.

In this third factorial, though, the actual center is at the average of these three baseline values, $734. Compare that to the predicted center value of $724. That's a difference of $10.
Which, when compared to the largest effect of 7.5 and to the noise level of about $5, indicates an important deviation between the model and the actual surface that we're on, at least at the center.

So if we're getting large deviations at the center, we cannot hope to get good predictions outside the range of the model. And good predictions are essential to optimize in the correct direction.

So there are four ways that we've shown to check for inadequacy in the model. And those of you with a statistical background can go calculate the confidence intervals on the model coefficients, and observe that they're very wide: none of the terms in the model are statistically significant.

Well, as we saw in the single-variable popcorn example, when faced with a poorly predicting model in a region that has curvature, we can add terms to account for the nonlinearity: "quadratic terms". So let's go add these now.

There are two options: adding points on the face of the cube, or adding points a little bit further out, called "axial points" or "star points". These points are at a distance denoted as alpha from the center. Alpha is a value greater than 1, to ensure they are outside the cube.

The design on the left works well if you hit a constraint, or cannot leave the factorial space.
The design on the right comes from a class of designs called central composite designs, or CCDs, and they're preferred for the statistical property called rotatability. Just a quick aside: rotatability simply means that the prediction error is equal for any two points that are the same distance from the center. And it's a desirable statistical property.

Now, there are various choices for the distance alpha and the number of center points to use, but that's a messy discussion that you can research quite easily. The general advice is this, though: run the factorial points first; then run the star points afterwards at a distance of alpha equal to 2^k taken to the power of 0.25, that is, alpha = (2^k)^0.25.

So if you have two factors, alpha = 1.41, and if you have three factors, alpha = 1.68. Also, add three to four center points to assess lack of fit. And run these center points at different times, not all after each other.

Notice this, though: from the individual perspective of factor T and of factor P, each of these factors has runs at five distinct levels, and that's what helps us accurately fit that quadratic model.

Let's go do this! The first star point is run number 18, at a value of +alpha for factor T in coded units, and a value of zero for factor P. Let's add that to the table, and also calculate the real-world units for it in the usual way.
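As a minimal sketch, the axial distance and the usual coded-to-real conversion look like this. The center and half-range values below are hypothetical stand-ins, chosen only so that +alpha lands near the 343 parts-per-hour figure; they are not the course's actual settings.

```python
# Axial (star-point) distance for a central composite design:
# alpha = (2**k) ** 0.25, where k is the number of factors.
def ccd_alpha(k: int) -> float:
    return (2 ** k) ** 0.25

print(round(ccd_alpha(2), 2))  # 1.41 for two factors
print(round(ccd_alpha(3), 2))  # 1.68 for three factors

# The usual coded-to-real conversion: real = center + coded * half_range.
# Hypothetical center and half-range for factor T (throughput, parts/hour):
center_T, half_range_T = 330.0, 9.0

coded_T = ccd_alpha(2)   # star point at +alpha for factor T
real_T = center_T + coded_T * half_range_T
print(round(real_T))     # approximately 343
```

The same conversion, applied in reverse, takes any real-world setting back to coded units for the model.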
So that's 343 parts per hour, and a sales price of $1.63. You can go practice reproducing the other three star points, and let's add one final center-point experiment, number 22, so that we have a total of four center points.

Now we go run these experiments, in random order of course, and report the values here in standard order. Notice firstly that center point 22 is similar to the prior values, indicating that the system is still stable and reproducible.

Well, we've got quite the collection of data here. A central composite design (CCD) always has the factorial points, center points, and star points, and I've arranged them in that order in the R code. When we run that code, we get the quadratic model from them all. I will leave it as a small challenge to you to go prove the following two things.

Firstly, the model's prediction of the center point, when compared to the average of the four center points, has a very small deviation. So this model fits well, at least at the center.

Secondly, this quadratic model's prediction of the other points, for example one of the corner points, or one of the star points, or even experiment 15 over here, is a very good prediction. There is little prediction error. So we have confidence in this model's predictions.

Now let's go visualize those as contour plots.
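The course arranges these runs and fits the model in R; as a stand-in, here is a minimal sketch of the same idea in Python with NumPy. The response values are generated from a made-up quadratic surface, not the actual profit data, purely so the fit can be checked against a known truth.

```python
import numpy as np

# A two-factor central composite design in coded units:
# 4 factorial points, 4 center points, 4 star points at +/- alpha.
alpha = (2 ** 2) ** 0.25  # = sqrt(2) for k = 2 factors
factorial = [(-1, -1), (1, -1), (-1, 1), (1, 1)]
center = [(0, 0)] * 4
star = [(-alpha, 0), (alpha, 0), (0, -alpha), (0, alpha)]
design = np.array(factorial + center + star)

# Synthetic responses from a known quadratic surface (illustrative only,
# NOT the course's profit data), so we can check the fit recovers the truth.
def true_surface(xT, xP):
    return 730 + 4 * xT - 3 * xP + 2 * xT * xP - 5 * xT**2 - 6 * xP**2

xT, xP = design[:, 0], design[:, 1]
y = true_surface(xT, xP)

# Model matrix for y = b0 + bT*xT + bP*xP + bTP*xT*xP + bTT*xT^2 + bPP*xP^2
X = np.column_stack([np.ones(len(y)), xT, xP, xT * xP, xT**2, xP**2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.round(coef, 6))  # recovers [730, 4, -3, 2, -5, -6]
```

Because each factor appears at five distinct levels, the quadratic columns of the model matrix are estimable, which is exactly why the CCD supports a full second-order model.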
And right away, we can see we are in fact near the optimum. Visually, the axial point is pretty close to the predicted optimum region from the model. That's good enough to stop here and use as our optimum.

But let's say the quadratic model had looked like this one instead. Then you would go run your next experiment over here, at the model's predicted optimum. And then you would go verify the model's prediction ability at that point, to check that you've reached the optimum.

Now we can be a bit more precise, for those of you who don't like to trust visual judgement. We can take this quadratic equation, differentiate it with respect to the coded variables, and set the derivatives equal to zero. You will get a set of two linear equations in two unknowns, which you can then solve using your favourite linear algebra software, or by hand.

When you go do that, you get the predicted optimum at 343 parts per hour and a selling price of $1.59. The quadratic model tells us to expect a profit of $740 at this point. Running that 23rd experiment gives an actual profit of $739; that's very close agreement. This is definitely the largest value we've observed over the entire approach we followed.

So this video has answered the last question we had from an earlier video in this module: "How do we know when to stop?" We know that we can stop when our model matches the surface well, and the model predicts an optimum.
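The differentiate-and-solve step above can be sketched as follows, using hypothetical quadratic coefficients rather than the fitted model from the video.

```python
import numpy as np

# Stationary point of a fitted quadratic
#   y = b0 + bT*xT + bP*xP + bTP*xT*xP + bTT*xT^2 + bPP*xP^2.
# Setting both partial derivatives to zero gives two linear equations:
#   2*bTT*xT + bTP*xP = -bT
#   bTP*xT + 2*bPP*xP = -bP
# The coefficients below are hypothetical, not the course's fitted model.
b0, bT, bP, bTP, bTT, bPP = 730, 4, -3, 2, -5, -6

A = np.array([[2 * bTT, bTP],
              [bTP, 2 * bPP]])
rhs = np.array([-bT, -bP])
xT_opt, xP_opt = np.linalg.solve(A, rhs)

# Predicted response at the stationary point (in coded units).
y_opt = (b0 + bT * xT_opt + bP * xP_opt
         + bTP * xT_opt * xP_opt + bTT * xT_opt**2 + bPP * xP_opt**2)
print(xT_opt, xP_opt, y_opt)
```

Once you have the stationary point in coded units, the usual coded-to-real conversion takes it back to a throughput and selling price you can actually run.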
Using the model, we know that we've reached the peak of the mountain, even though we cannot see the actual mountain around us.

So let's recap our entire approach. Start by building successive linear models, shown here in blue, green, and orange, respectively. I'm showing you the prediction contours in those colours for the local region around each model. Each of those local models had its baseline, or 0-0, value.

These past videos have also shown that we should incorporate the baseline points, as well as other points in the neighbourhood, in our models, to help improve their estimates.

We use our models as long as we have confidence in their predictions. We rebuild the model once we demonstrate those predictions are poor, judged by comparing the predictions to the actual values, and taking noise into account.

As we approach the optimum, issues regarding curvature, which we studied in four points, become apparent. We have to change our strategy. If we pick up that we have curvature, based on these criteria, we have to start decreasing our step size and start fitting quadratic models.

The principle of an optimum is that it's nonlinear: points around us must be lower. And so our last prediction model that we built, shown here in red, illustrates that quite nicely.

To end off with, though, let me show you the true surface, in a grey colour.
This is obviously something you would never see in practice. But seeing it here gives you good confidence that we were doing the right thing all along.

You can see how the models in blue, green, and orange approximated the non-linear surface very well in their local regions. Outside their local neighbourhoods, they start to deviate. The non-linear model fits the surface over a wider region. That isn't too surprising: the information to build that non-linear model required four plus four plus four, or 12, experiments. And we used that non-linear model to place our final experiment(s) very close to the true optimum.

To end this video, I will add one point: the real optimum may move. Our system could deteriorate and change, so that optimum that you found won't stay there. There are experimental tools that continually keep searching and moving towards the optimum. We won't have time to cover them in this course, but the topic of Evolutionary Operation (EVOP) is what you should search for if that interests you. It is particularly applicable to manufacturing systems that are never stable. That mountain is moving, and you have to move as well in order to remain at the peak.