1
00:00:00,306 --> 00:00:04,166
Okay, so in today's class we're going to
look at how we can build a predictive model

2
00:00:04,166 --> 00:00:07,196
for our experimental system,
but using computer software.

3
00:00:08,156 --> 00:00:10,826
In prior videos I showed you
how we did it all by hand,

4
00:00:11,396 --> 00:00:13,806
now it's time to make the
computer do the work for us.

5
00:00:15,126 --> 00:00:18,116
There are a variety of software
options that we might use,

6
00:00:18,116 --> 00:00:20,206
and there are forum discussions about this.

7
00:00:20,206 --> 00:00:25,706
However, for this course we have chosen
to use R. The R language is free software,

8
00:00:26,066 --> 00:00:31,276
is fairly user-friendly, but most importantly
it is heavily used by a wide variety

9
00:00:31,276 --> 00:00:33,496
of companies and statistical researchers.

10
00:00:34,406 --> 00:00:39,586
You will need 2 downloads: first
download R itself from this link;

11
00:00:40,126 --> 00:00:43,456
then secondly, download RStudio from this link.

12
00:00:43,456 --> 00:00:45,966
Install both software packages.

13
00:00:46,506 --> 00:00:51,766
We won't actually use R directly; instead,
we will use RStudio, which will call,

14
00:00:51,766 --> 00:00:53,806
and run, R in the background for us.

15
00:00:53,806 --> 00:00:58,656
After installing RStudio, please open it
on your computer and you will have a screen

16
00:00:58,656 --> 00:01:00,166
that appears similar to this one.

17
00:01:00,166 --> 00:01:04,866
Just to show you how flexible R is,
you can even run it from a website.

18
00:01:04,866 --> 00:01:09,306
If you don't want to install it, or if
you cannot install it because you are

19
00:01:09,306 --> 00:01:13,676
on a work computer, you can always go
to this link to run it interactively:

20
00:01:13,676 --> 00:01:18,006
http://yint.org/Rweb. OK,
so open RStudio and start

21
00:01:18,006 --> 00:01:21,146
by creating a new R script
under the "File" menu.

22
00:01:21,146 --> 00:01:23,686
This is where you will write your commands.

23
00:01:23,686 --> 00:01:26,146
I want to emphasize 2 things.

24
00:01:26,146 --> 00:01:33,066
First, there is a common trap we have to tell
you about: commands in R are case-sensitive.

25
00:01:33,116 --> 00:01:41,436
For example, this command c(1, 2, 3, 4)
will work to create a list with 4 entries,

26
00:01:42,326 --> 00:01:46,206
but if you use a capital
C(1,2,3,4), it will not.

27
00:01:46,206 --> 00:01:50,466
For this reason, whenever
we show code in the videos,

28
00:01:50,606 --> 00:01:53,236
we will also give a link for
you to download the code.

29
00:01:53,966 --> 00:01:57,116
Go to that website link and
rather copy and paste the code.

30
00:01:57,856 --> 00:02:00,156
Initially, don't type it in yourself.

31
00:02:00,156 --> 00:02:03,726
Later on, of course, you are free to type
in the commands, when you have confidence.

32
00:02:03,726 --> 00:02:08,446
For example, all the code used in this
video is available at the link shown.

33
00:02:08,606 --> 00:02:14,326
Secondly, if you ever need help, use
the help command in R, as shown here.

34
00:02:15,106 --> 00:02:17,426
As you can see now on the
screen, there is a difference

35
00:02:17,426 --> 00:02:20,186
between lowercase and uppercase "c" commands.
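
A small sketch of the two points above, case sensitivity and the help command. The name "my_vec" is made up for this illustration and does not appear in the video; the help call is guarded so that it only runs in an interactive session such as RStudio.

```r
# Lowercase c() combines the values into a single object with 4 entries.
my_vec <- c(1, 2, 3, 4)   # 'my_vec' is a hypothetical name for this sketch
print(length(my_vec))

# Names are case-sensitive: 'My_vec' is a different name that was never created.
print(exists("my_vec"))
print(exists("My_vec"))

# Open the documentation for c(); only done in an interactive session.
if (interactive()) help("c")
```

In RStudio, typing ?c in the console is a shorter way to open the same help page.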

36
00:02:21,366 --> 00:02:22,886
OK, are you ready to get started?

37
00:02:23,296 --> 00:02:24,926
We're actually going to work backwards.

38
00:02:24,926 --> 00:02:29,656
First create the prediction model
called "popped_corn" by saying

39
00:02:29,656 --> 00:02:35,236
"popped_corn" backwards arrow,
"lm"; open brackets; "y";

40
00:02:35,636 --> 00:02:41,146
is predicted by A + B + A*B; close brackets.

41
00:02:42,696 --> 00:02:46,826
Notice how this is similar to the prediction
model we wrote by hand in the prior video.

42
00:02:48,006 --> 00:02:52,066
Now if this is your first time with R
this can be a little bit intimidating.

43
00:02:52,856 --> 00:02:54,316
There are a few things to consider.

44
00:02:54,316 --> 00:02:56,236
First is the backwards arrow (<-).

45
00:02:56,236 --> 00:03:02,326
It is actually a less than symbol (<) next to a
dash (-), making it look like a backwards arrow.

46
00:03:03,206 --> 00:03:06,286
In R that represents the assignment operation.

47
00:03:07,016 --> 00:03:12,636
In other words, we're going to create a variable
called "popped_corn", and assign it whatever is

48
00:03:12,636 --> 00:03:15,486
on the right hand side; in
this case a linear model.

49
00:03:16,266 --> 00:03:19,546
The "lm" over there on the
right stands for "linear model",

50
00:03:19,926 --> 00:03:24,266
indicating we want a least squares model,
which is really just a type of linear model.

51
00:03:24,996 --> 00:03:29,626
And lastly, the symbol here in the middle,
the tilde (~), can be interpreted as,

52
00:03:29,946 --> 00:03:32,976
"is predicted by" or "is described by".

53
00:03:34,026 --> 00:03:35,756
Now let's try running this R command.

54
00:03:36,036 --> 00:03:40,126
Highlight the line you've just typed, and
then click that "Run" button over there.

55
00:03:41,046 --> 00:03:44,596
What you'll see is an error
message, "object 'y' not found",

56
00:03:45,276 --> 00:03:49,246
which indicates that the software does
not know what the variable "y" is.

57
00:03:50,246 --> 00:03:52,206
We haven't defined "y" just yet.

58
00:03:52,206 --> 00:03:58,656
In fact, we have also not defined variable
A and variable B. So let's go do that now.
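
If you would like to see that error without stopping your script, one possible sketch is to wrap the call in tryCatch(), which is not used in the video. It assumes a fresh R session in which nothing called y, A or B has been created yet; the name "msg" is made up for this illustration.

```r
# Nothing named y, A or B exists yet, so lm() cannot evaluate the formula.
msg <- tryCatch(
  {
    lm(y ~ A + B + A*B)
    "no error"
  },
  error = function(e) conditionMessage(e)   # keep only the error text
)
print(msg)
```
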

59
00:03:59,726 --> 00:04:05,226
Once again, use the assignment operator,
the backwards arrow (<-), to do this.

60
00:04:05,226 --> 00:04:11,446
Take a look at the prior video: we saw that
variable A for the 4 experiments was -1, +1,

61
00:04:11,726 --> 00:04:13,806
-1 and +1 from the standard order table.

62
00:04:13,806 --> 00:04:16,616
Let's go put these 4 numbers in a list,

63
00:04:16,786 --> 00:04:20,186
using the lowercase "c" command,
which stands for "combine".

64
00:04:20,286 --> 00:04:28,266
So write "c", and then put those 4 numbers in
between brackets, separated by commas: c(-1, +1,

65
00:04:28,266 --> 00:04:33,776
-1, +1). In a similar way, factor B from the
table had -1, -1, +1, +1; let's type that in:

66
00:04:33,776 --> 00:04:38,506
c(-1, -1, +1, +1). Now highlight
those two commands,

67
00:04:38,506 --> 00:04:41,516
and click "Run" to see what
R does with those commands.

68
00:04:42,226 --> 00:04:43,596
We didn't get an error message.

69
00:04:43,596 --> 00:04:49,526
If we go to the console region here, and
type the letter capital A and capital B,

70
00:04:49,916 --> 00:04:53,596
we see those two lists repeated back to us.

71
00:04:53,596 --> 00:04:56,596
Actually, we can also see them up
here, in the "Environment" tab.

72
00:04:58,366 --> 00:05:00,706
We still need to create the variable called "y".

73
00:05:01,406 --> 00:05:06,406
The variable "y" contains a list of the numbers
that represents the outcome of the experiments.

74
00:05:07,856 --> 00:05:10,556
Once again, we get that from
our standard order table.

75
00:05:11,576 --> 00:05:17,146
Note that variables A, and B, and "y" have
the same logical order from that table.

76
00:05:17,146 --> 00:05:20,806
It's very easy to create these
models, because we can just go ahead

77
00:05:20,806 --> 00:05:23,566
and copy-and-paste directly
from that standard order table.

78
00:05:25,056 --> 00:05:27,766
Finally, we are ready now
to go run all the commands.

79
00:05:28,386 --> 00:05:31,796
Another nice shortcut in RStudio
is to click the "Source" button.

80
00:05:32,456 --> 00:05:34,556
That will run all our commands in one go.

81
00:05:35,196 --> 00:05:38,066
In fact, there is "Source"
and "Source with Echo".

82
00:05:38,066 --> 00:05:43,566
If you are new to R, please use the second
option, which will echo (in other words

83
00:05:43,566 --> 00:05:46,896
"rewrite") all the commands
into the console for you.

84
00:05:47,626 --> 00:05:50,016
That way, if there is a mistake
in one of your lines,

85
00:05:50,016 --> 00:05:52,026
you will see exactly where the problem is.

86
00:05:53,616 --> 00:05:58,336
Now let's go inspect the result, particularly
what that "popped_corn" variable is.

87
00:05:58,746 --> 00:06:01,716
Go down to the console window
and type "popped_corn",

88
00:06:02,326 --> 00:06:04,546
and we will see our least-squares
model over there.

89
00:06:05,336 --> 00:06:08,226
The output from that command
shows us the prediction model.

90
00:06:08,896 --> 00:06:15,206
It has an intercept of 67; a main effect for A
of 10 units; a main effect for B with a value

91
00:06:15,206 --> 00:06:18,996
of 4; and then the two-factor
interaction effect AB.

92
00:06:19,946 --> 00:06:23,566
Notice that these numbers match exactly
what we calculated by hand earlier.
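
For reference, the whole script might look roughly like the sketch below. The settings for A and B are the ones read out earlier in this video. The outcome values for y are not read out in this transcript, so the numbers used here are an assumption: they are back-calculated from the coefficients reported above (67, 10, 4 and -1). Please use the values from your own standard order table.

```r
# Factor settings in standard order, as read out in the video.
A <- c(-1, +1, -1, +1)
B <- c(-1, -1, +1, +1)

# ASSUMPTION: outcomes back-calculated from the reported coefficients
# (67, 10, 4, -1); they are not stated in this transcript.
y <- c(52, 74, 62, 80)

# Least-squares model: y is predicted by A, B and the AB interaction.
popped_corn <- lm(y ~ A + B + A*B)

# Show the estimated parameters: intercept, A, B and the A:B interaction.
print(coef(popped_corn))
```
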

93
00:06:23,566 --> 00:06:28,406
So there you have it: a really quick way
to get the model with computer software.

94
00:06:29,736 --> 00:06:32,656
We can also use the "summary(...)"
command, to get more information.

95
00:06:33,486 --> 00:06:39,176
It shows us the original formula we used, when
we built the model; it shows us the residuals,

96
00:06:39,446 --> 00:06:41,936
which we'll talk about later on in the course.

97
00:06:41,936 --> 00:06:46,756
And there we see the same 4
parameters again: 67, 10, 4, and -1.

98
00:06:46,756 --> 00:06:49,896
And there's a bit more
statistical output down here for those

99
00:06:49,896 --> 00:06:51,286
of you that know what that is about.

100
00:06:52,596 --> 00:06:56,586
A key result I want to point out right
now, is that there are 4 experiments,

101
00:06:56,586 --> 00:07:00,716
and we estimated 4 parameters:
the intercept, the A effect,

102
00:07:00,716 --> 00:07:02,806
the B effect, and the AB interaction.
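
One way to check this count yourself is sketched below. The y values are again an assumption, back-calculated from the coefficients reported in this video rather than taken from the transcript.

```r
A <- c(-1, +1, -1, +1)
B <- c(-1, -1, +1, +1)
y <- c(52, 74, 62, 80)   # ASSUMPTION: reconstructed from the reported coefficients

popped_corn <- lm(y ~ A + B + A*B)

print(length(y))                   # number of experiments
print(length(coef(popped_corn)))   # number of estimated parameters
print(df.residual(popped_corn))    # degrees of freedom left over for the residuals
```

With as many parameters as experiments, no degrees of freedom remain for the residuals.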

103
00:07:03,926 --> 00:07:08,006
I would also like to point out one thing
here, where we specify the linear model.

104
00:07:08,666 --> 00:07:12,656
There is a term for A, for
B, and the AB interaction.

105
00:07:13,516 --> 00:07:17,356
But you notice there isn't
actually a term for the intercept.

106
00:07:17,356 --> 00:07:19,226
R will automatically add it for you.

107
00:07:19,226 --> 00:07:24,096
So even if you see 3 terms here in your
input, you will get estimates for 4 parameters

108
00:07:24,096 --> 00:07:29,616
from R. Another nice shortcut that you can try
- and we will explain this in later videos -

109
00:07:30,036 --> 00:07:35,576
is that you can create your model by saying:
lm; open brackets; y; is predicted by (~);

110
00:07:35,576 --> 00:07:38,216
and then only write A*B; then close brackets.

111
00:07:38,216 --> 00:07:42,146
Try that out and see what you get.
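
If you want to compare the two ways of writing the model side by side, a sketch along these lines should do it. The names "long_form" and "short_form" are made up for this illustration, and the y values are an assumption, back-calculated from the coefficients reported in this video.

```r
A <- c(-1, +1, -1, +1)
B <- c(-1, -1, +1, +1)
y <- c(52, 74, 62, 80)   # ASSUMPTION: reconstructed from the reported coefficients

long_form  <- lm(y ~ A + B + A*B)   # the formula written out term by term
short_form <- lm(y ~ A*B)           # the shortcut

# Compare the estimates from the two models.
print(all.equal(coef(long_form), coef(short_form)))

# The intercept is listed even though neither formula mentions it.
print(names(coef(short_form)))
```
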

112
00:07:43,656 --> 00:07:47,446
I want to end this video by stating that
you could have used other computer software

113
00:07:47,446 --> 00:07:48,896
to build the least-squares model.

114
00:07:48,896 --> 00:07:56,396
For example Excel, Python, Minitab, MATLAB,
SAS, JMP, or any of the other design

115
00:07:56,396 --> 00:07:58,816
of experiments software that
is commercially available.

116
00:08:00,396 --> 00:08:03,426
You should get exactly these same
parameters from the software.

117
00:08:04,006 --> 00:08:07,846
That's a good test if you're trying out
one of those other software packages.

118
00:08:07,846 --> 00:08:13,226
So we have learned here how to get the
basic results from a two-factor experiment.

119
00:08:13,226 --> 00:08:16,406
Before we end this video, I
want to challenge you though.

120
00:08:17,236 --> 00:08:21,796
Use the R software to repeat the numeric
analysis for the ginger biscuits example

121
00:08:21,796 --> 00:08:24,576
that we had in a prior class (video 2C).

122
00:08:24,576 --> 00:08:30,296
Here are the raw experimental data and
recall, this was a predictive model for taste.

123
00:08:30,296 --> 00:08:37,536
Are you able to reproduce the results
we calculated by hand: 5.25, 1.75, 1.25,

124
00:08:37,536 --> 00:08:40,176
and 0.75 for the four parameters?

125
00:08:41,166 --> 00:08:44,426
Make sure you can reproduce this
before continuing to the next video.