1
00:00:00,136 --> 00:00:02,326
My goal with this video is to show you

2
00:00:02,326 --> 00:00:06,496
where the predictive model we calculate
using computer software comes from.

3
00:00:06,556 --> 00:00:09,606
This predictive model is
called a least-squares model.

4
00:00:09,606 --> 00:00:12,316
And these models are widely used in companies.

5
00:00:12,886 --> 00:00:17,536
You've certainly seen them if you've
taken a basic math or statistics class.

6
00:00:17,536 --> 00:00:20,666
Watch this video quickly, even
if you already understand least squares.

7
00:00:20,666 --> 00:00:24,096
If you have limited experience
with least squares, though,

8
00:00:24,376 --> 00:00:27,866
take a moment to see the extra
resources we've posted for you.

9
00:00:27,866 --> 00:00:31,336
We certainly want to give
you as much help as we can.

10
00:00:31,866 --> 00:00:34,866
Now in the videos in the prior
module, we were looking at popcorn.

11
00:00:35,226 --> 00:00:38,616
And I'm going to use that
example again in this class.

12
00:00:38,616 --> 00:00:43,606
In the popcorn experiment, our objective was
to maximize the amount of popcorn created.

13
00:00:44,346 --> 00:00:47,176
Our outcome variable was the
number of popped kernels.

14
00:00:47,826 --> 00:00:51,666
Here is the cube plot, and the corresponding
predictive model that we created.

15
00:00:53,076 --> 00:00:58,866
The predictive model has four
parameters: 67, 10, 4, and -1.

16
00:00:58,916 --> 00:01:04,296
67 was the baseline amount, the average
of all four experimental outcomes.

17
00:01:04,916 --> 00:01:08,246
We also refer to that as the intercept,
and you'll see why in a minute.

18
00:01:09,306 --> 00:01:12,206
"10" is the effect of factor
A, the cooking time.

19
00:01:12,206 --> 00:01:17,486
This is what we call the main effect
for factor A. "4" is the effect

20
00:01:17,486 --> 00:01:20,016
of factor B, the kind of popcorn we used.

21
00:01:20,526 --> 00:01:23,796
And lastly, the "-1" is the
two-factor interaction term.

22
00:01:23,796 --> 00:01:26,966
Do you recall how we calculated
these numbers by hand?

23
00:01:27,636 --> 00:01:30,296
Go back to the videos in the
previous module if you are not sure.

24
00:01:30,296 --> 00:01:37,096
The most general form of the least squares
model for this system is y equals b_0,

25
00:01:37,096 --> 00:01:46,256
plus b_A times x_A, plus b_B times
x_B, plus b_{AB} times x_A times x_B.

26
00:01:47,996 --> 00:01:53,396
The x_A is the coded value for factor A, and
it represents the amount of cooking time.

27
00:01:53,396 --> 00:01:58,626
If x_A = -1, that represents
160 seconds of time.

28
00:01:59,066 --> 00:02:03,056
And x_A = +1 represents 200
seconds of cooking time.

29
00:02:03,366 --> 00:02:08,526
The "-1" and "+1" are called
coded units and the 160 seconds

30
00:02:08,526 --> 00:02:11,826
and 200 seconds are called real-world units.

31
00:02:11,826 --> 00:02:17,486
Note that we cannot use real-world units
in this equation, only the coded units.

32
00:02:17,486 --> 00:02:18,876
Similarly for x_B.

33
00:02:19,246 --> 00:02:24,276
It is coded so that "-1" represents white
corn and "+1" represents yellow corn.

34
00:02:24,276 --> 00:02:30,656
Similar to the x_A case, the -1 and +1
are the coded units, while white corn

35
00:02:30,656 --> 00:02:33,716
and yellow corn are the real-world units.

36
00:02:33,716 --> 00:02:38,956
Recall that with categorical variables
we assigned the -1 and +1 arbitrarily.

37
00:02:38,956 --> 00:02:43,146
The sign of the coded unit will not
change the model's interpretation.
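
To make the coded units concrete, here is a minimal Python sketch of the prediction equation, using the four parameters quoted earlier (67, 10, 4, and -1) and the codings just described. The names `TIME_CODES`, `CORN_CODES`, and `predict_popped` are made up for this illustration, and the lookup tables cover only the levels that were actually tested.

```python
# Coded units for each factor, as described in the video.
TIME_CODES = {160: -1, 200: +1}          # cooking time in seconds -> x_A
CORN_CODES = {"white": -1, "yellow": +1}  # kind of popcorn -> x_B


def predict_popped(time_seconds, corn, b0=67.0, b_a=10.0, b_b=4.0, b_ab=-1.0):
    """Evaluate y = b0 + b_A*x_A + b_B*x_B + b_AB*x_A*x_B in coded units."""
    x_a = TIME_CODES[time_seconds]  # real-world units are translated first
    x_b = CORN_CODES[corn]
    return b0 + b_a * x_a + b_b * x_b + b_ab * x_a * x_b


# Predictions at the four corners, in standard order.
corners = [(160, "white"), (200, "white"), (160, "yellow"), (200, "yellow")]
predictions = [predict_popped(t, c) for t, c in corners]
print(predictions)
```

The function never puts 160 or 200 into the equation itself; only the coded -1 and +1 values enter the model.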

38
00:02:44,456 --> 00:02:47,626
Now take a look at what happens
if I write that equation down,

39
00:02:47,756 --> 00:02:50,466
for each of the four experimental
points in the system.

40
00:02:51,616 --> 00:02:55,696
We can substitute in values for the coded
units into this prediction equation.

41
00:02:55,696 --> 00:03:03,606
For the first experiment, for example, we
would have y_1 equals b_0, plus b_A times x_{A-},

42
00:03:03,606 --> 00:03:10,286
plus b_B times x_{B-}, plus
b_{AB} times x_{A-} times x_{B-}.

43
00:03:10,466 --> 00:03:13,546
That's because x_A is at the minus level,

44
00:03:13,546 --> 00:03:16,486
and x_B is at the minus level,
for the first experiment.

45
00:03:17,676 --> 00:03:22,246
We can repeat this process for the other three
points in the cube, as shown here on the screen.

46
00:03:24,076 --> 00:03:30,686
Now let's go substitute in -1, or +1, for the
factors A and B, and we will get four equations.

47
00:03:31,306 --> 00:03:34,346
Notice that the 4 equations
have 4 unknown parameters.

48
00:03:34,676 --> 00:03:38,196
b_0, b_A, b_B, and b_{AB}.
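
Written out, the four equations after substituting the coded values would look roughly as follows. This is a reconstruction that assumes the experiments are numbered in standard order (A changes fastest), since the on-screen version is not part of this transcript.

```latex
\begin{align*}
y_1 &= b_0 + b_A(-1) + b_B(-1) + b_{AB}(-1)(-1) = b_0 - b_A - b_B + b_{AB} \\
y_2 &= b_0 + b_A(+1) + b_B(-1) + b_{AB}(+1)(-1) = b_0 + b_A - b_B - b_{AB} \\
y_3 &= b_0 + b_A(-1) + b_B(+1) + b_{AB}(-1)(+1) = b_0 - b_A + b_B - b_{AB} \\
y_4 &= b_0 + b_A(+1) + b_B(+1) + b_{AB}(+1)(+1) = b_0 + b_A + b_B + b_{AB}
\end{align*}
```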

49
00:03:38,196 --> 00:03:43,536
If you have some mathematical background,
you will recall that four equations

50
00:03:43,536 --> 00:03:47,366
with four unknowns represent a
set of equations that we can solve.

51
00:03:48,306 --> 00:03:52,856
These equations are linear, and so they're
very efficiently solved using matrix methods.

52
00:03:53,846 --> 00:03:54,806
Let me show you how.

53
00:03:55,866 --> 00:03:59,406
In matrix form, the equations are
written as shown here on the screen.

54
00:03:59,406 --> 00:04:01,996
Three things quickly become apparent.

55
00:04:02,086 --> 00:04:05,856
Firstly, we notice that the first
column is a column of 1's.

56
00:04:06,036 --> 00:04:10,026
That corresponds to this
parameter: b_0, the intercept.

57
00:04:10,026 --> 00:04:13,826
Next we notice that the second
and third columns, in other words,

58
00:04:13,826 --> 00:04:16,226
the columns that correspond
to the parameters for A

59
00:04:16,226 --> 00:04:19,346
and B are simply the columns
from the standard order table.

60
00:04:20,206 --> 00:04:25,166
And finally, the last column corresponds
to the two-factor interaction for AB.

61
00:04:26,256 --> 00:04:33,226
You'll notice that this is simply the column for
A, multiplied by the column for B. This comes

62
00:04:33,226 --> 00:04:37,306
from minus times minus is plus;
plus times minus is minus.

63
00:04:37,666 --> 00:04:43,186
Minus times plus is minus; and
finally, plus times plus is plus.
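
As a rough sketch of how the "X" matrix can be assembled, the short Python example below builds it column by column from the standard order table. The variable names are illustrative only, and standard order (A changes fastest) is assumed.

```python
# Coded levels for factors A and B in standard order (A changes fastest).
a = [-1, +1, -1, +1]
b = [-1, -1, +1, +1]

# Each row is [intercept, x_A, x_B, x_A * x_B] for one experiment.
X = [[1, x_a, x_b, x_a * x_b] for x_a, x_b in zip(a, b)]

for row in X:
    print(row)
```

The last entry in each row is just the product of the two entries before it, which is the sign rule described above.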

64
00:04:43,186 --> 00:04:49,966
This entire set of equations can be written as
vector "y" equals matrix "X" times vector "b".

65
00:04:51,166 --> 00:04:57,126
Now, those of you with some background in
least squares will realize that the solution

66
00:04:57,126 --> 00:05:03,896
to this set of equations is
b = (X^T * X)^{-1} multiplied by (X^T * y).

67
00:05:04,096 --> 00:05:06,486
If you don't have that experience, don't worry.

68
00:05:06,836 --> 00:05:10,676
The computer software will solve these
equations very efficiently for us.

69
00:05:10,916 --> 00:05:12,096
That's what computers are good for.

70
00:05:12,096 --> 00:05:17,106
All we require is the "X"
matrix and the "y" vector.

71
00:05:17,336 --> 00:05:21,536
And we have these, the "X" matrix is
assembled from the standard order table,

72
00:05:21,606 --> 00:05:25,026
and the "y" vector is simply
the four experimental outcomes.

73
00:05:25,196 --> 00:05:28,586
The software will calculate these
four parameters, in other words,

74
00:05:28,646 --> 00:05:30,466
the four entries in the vector "b".

75
00:05:30,776 --> 00:05:38,366
Those correspond to b_0, the intercept, b_A,
b_B, and b_{AB} for the two-factor interaction.
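
For those who want to see the calculation before moving to the software, here is a minimal pure-Python sketch of the least-squares solution. Two things are assumptions: the outcomes 52, 74, 62, and 80 are not quoted in this video but are the values implied by the model parameters 67, 10, 4, and -1, and the sketch solves (X^T X) b = X^T y by elimination instead of forming the inverse explicitly, which gives the same answer. The helper names are made up for this example.

```python
from fractions import Fraction


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def solve(a, rhs):
    """Solve a @ x = rhs by Gauss-Jordan elimination with exact fractions."""
    n = len(a)
    aug = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(a, rhs)]
    for col in range(n):
        # Swap in a row with a non-zero pivot, then scale the pivot to 1.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# X from the standard order table; y holds the assumed outcomes (see above).
X = [[1, -1, -1, 1], [1, 1, -1, -1], [1, -1, 1, -1], [1, 1, 1, 1]]
y = [52, 74, 62, 80]

Xt = transpose(X)
XtX = matmul(Xt, X)                                          # X^T X
Xty = [sum(x * v for x, v in zip(row, y)) for row in Xt]     # X^T y
b = [float(v) for v in solve(XtX, Xty)]                      # b_0, b_A, b_B, b_AB
print(b)
```

For this design X^T X works out to 4 times the identity matrix, so each parameter is simply the matching entry of X^T y divided by 4.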

76
00:05:38,546 --> 00:05:41,416
So now we are ready to use
the computer software.

77
00:05:41,836 --> 00:05:45,546
Please watch the next video to see
how those 4 parameters are calculated.