1 00:00:00,306 --> 00:00:04,166 Okay, so in today's class we're going to look at how we can build a predictive model 2 00:00:04,166 --> 00:00:07,196 for our experimental system, but using computer software. 3 00:00:08,156 --> 00:00:10,826 In prior videos I showed you how we did it all by hand, 4 00:00:11,396 --> 00:00:13,806 now it's time to make the computer do the work for us. 5 00:00:15,126 --> 00:00:18,116 There are a variety of software options that we might use, 6 00:00:18,116 --> 00:00:20,206 and there are forum discussions about this. 7 00:00:20,206 --> 00:00:25,706 However, for this course we have chosen to use R. The R language is free software, 8 00:00:26,066 --> 00:00:31,276 is fairly user-friendly, but most importantly it is heavily used by a wide variety 9 00:00:31,276 --> 00:00:33,496 of companies and statistical researchers. 10 00:00:34,406 --> 00:00:39,586 You will need 2 downloads: first download R itself from this link; 11 00:00:40,126 --> 00:00:43,456 then secondly, download RStudio from this link. 12 00:00:43,456 --> 00:00:45,966 Install both software packages. 13 00:00:46,506 --> 00:00:51,766 We won't actually use R directly; instead, we will use RStudio, which will call, 14 00:00:51,766 --> 00:00:53,806 and run, R in the background for us. 15 00:00:53,806 --> 00:00:58,656 After installing RStudio, please open it on your computer and you will have a screen 16 00:00:58,656 --> 00:01:00,166 that appears similar to this one. 17 00:01:00,166 --> 00:01:04,866 Just to show you how flexible R is, you can even run it from a website. 18 00:01:04,866 --> 00:01:09,306 If you don't want to install it, or if you cannot install it because you are 19 00:01:09,306 --> 00:01:13,676 on a work computer, you can always go to this link to run it interactively: 20 00:01:13,676 --> 00:01:18,006 http://yint.org/Rweb OK, so open RStudio and start 21 00:01:18,006 --> 00:01:21,146 by creating a new R script under the "File" menu. 
22 00:01:21,146 --> 00:01:23,686 This is where you will write your commands. 23 00:01:23,686 --> 00:01:26,146 I want to emphasize 2 things. 24 00:01:26,146 --> 00:01:33,066 First, there is a common trap we have to tell you about: commands in R are case-sensitive. 25 00:01:33,116 --> 00:01:41,436 For example, this command c(1, 2, 3, 4) will work to create a list with 4 entries, 26 00:01:42,326 --> 00:01:46,206 but if you use a capital C(1,2,3,4), it will not. 27 00:01:46,206 --> 00:01:50,466 For this reason, whenever we show code in the videos, 28 00:01:50,606 --> 00:01:53,236 we will also give a link for you to download the code. 29 00:01:53,966 --> 00:01:57,116 Go to that website link and rather copy and paste the code. 30 00:01:57,856 --> 00:02:00,156 Initially, don't type it in yourself. 31 00:02:00,156 --> 00:02:03,726 Later on, of course, you are free to type in the commands, when you have confidence. 32 00:02:03,726 --> 00:02:08,446 For example, all the code used in this video is available at the link shown. 33 00:02:08,606 --> 00:02:14,326 Secondly, if you ever need help, use the help command in R, as shown here. 34 00:02:15,106 --> 00:02:17,426 As you can see now on the screen, there is a difference 35 00:02:17,426 --> 00:02:20,186 between lowercase and uppercase "c" commands. 36 00:02:21,366 --> 00:02:22,886 OK, are you ready to get started? 37 00:02:23,296 --> 00:02:24,926 We're actually going to work backwards. 38 00:02:24,926 --> 00:02:29,656 First create the prediction model called "popped_corn" by saying 39 00:02:29,656 --> 00:02:35,236 "popped_corn" backwards arrow, "lm"; open brackets; "y"; 40 00:02:35,636 --> 00:02:41,146 is predicted by A + B + A*B; close brackets. 41 00:02:42,696 --> 00:02:46,826 Notice how this is similar to the prediction model we wrote by hand in the prior video. 42 00:02:48,006 --> 00:02:52,066 Now if this is your first time with R this can be a little bit intimidating. 
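The two points emphasized above, case sensitivity and the help command, can be sketched roughly as follows. The variable name `four_numbers` is an illustrative assumption and is not used in the video.

```r
# Lowercase c() combines its arguments into a single vector.
four_numbers <- c(1, 2, 3, 4)
print(four_numbers)
print(length(four_numbers))

# R is case-sensitive: c and C are two different functions,
# so one cannot be used in place of the other.
print(identical(c, C))

# Open the help page for c(); only attempted in an interactive session.
if (interactive()) help(c)
```

Comparing the two functions with `identical()` is just one way to show that the names differ; typing the capital-letter version in the console, as in the video, makes the same point.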
43 00:02:52,856 --> 00:02:54,316 There are a few things to consider. 44 00:02:54,316 --> 00:02:56,236 First is the backwards arrow (<-). 45 00:02:56,236 --> 00:03:02,326 It is actually a less-than symbol (<) next to a dash (-), making it look like a backwards arrow. 46 00:03:03,206 --> 00:03:06,286 In R that represents the assignment operation. 47 00:03:07,016 --> 00:03:12,636 In other words, we're going to create a variable called "popped_corn", and assign it whatever is 48 00:03:12,636 --> 00:03:15,486 on the right hand side; in this case a linear model. 49 00:03:16,266 --> 00:03:19,546 The "lm" over there on the right stands for "linear model", 50 00:03:19,926 --> 00:03:24,266 indicating we want a least squares model, which is really just a type of linear model. 51 00:03:24,996 --> 00:03:29,626 And lastly, the symbol here in the middle, the tilde (~), can be interpreted as 52 00:03:29,946 --> 00:03:32,976 "is predicted by" or "is described by". 53 00:03:34,026 --> 00:03:35,756 Now let's try running this R command. 54 00:03:36,036 --> 00:03:40,126 Highlight the line you've just typed, and then click that "Run" button over there. 55 00:03:41,046 --> 00:03:44,596 What you'll see is an error message, "object 'y' not found", 56 00:03:45,276 --> 00:03:49,246 which indicates that the software does not know what the variable "y" is. 57 00:03:50,246 --> 00:03:52,206 We haven't defined "y" just yet. 58 00:03:52,206 --> 00:03:58,656 In fact, we have also not defined variable A and variable B. So let's go do that now. 59 00:03:59,726 --> 00:04:05,226 Once again, use the assignment operator, the backwards arrow (<-), to do this. 60 00:04:05,226 --> 00:04:11,446 Take a look at the prior video: we saw that variable A for the 4 experiments was -1, +1, 61 00:04:11,726 --> 00:04:13,806 -1 and +1 from the standard order table.
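The failed first attempt described above can be reproduced with a short sketch like this one. It assumes a fresh R session in which "y", A and B do not exist yet; the name `message_text` is a hypothetical helper variable, not something from the video.

```r
# ASSUMPTION: fresh session, so y, A and B are not defined anywhere.
# tryCatch() captures the error text instead of stopping the script.
message_text <- tryCatch(
  {
    lm(y ~ A + B + A*B)   # the same model formula as in the video
    "no error"
  },
  error = function(e) conditionMessage(e)
)

# Show whatever message R produced for the undefined variable.
print(message_text)
```

Capturing the message this way is only for illustration; in RStudio the same error simply appears in the console when the line is run.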
62 00:04:13,806 --> 00:04:16,616 Let's go put these 4 numbers in a list, 63 00:04:16,786 --> 00:04:20,186 using the lowercase "c" command, which stands for "combine". 64 00:04:20,286 --> 00:04:28,266 So write "c", and then put those 4 numbers in between brackets, separated by commas: c(-1, +1, 65 00:04:28,266 --> 00:04:33,776 -1, +1). In a similar way, factor B from the table had -1, -1, +1, +1, so let's type that in: 66 00:04:33,776 --> 00:04:38,506 c(-1, -1, +1, +1). Now highlight those two commands, 67 00:04:38,506 --> 00:04:41,516 and click "Run" to see what R does with those commands. 68 00:04:42,226 --> 00:04:43,596 We didn't get an error message. 69 00:04:43,596 --> 00:04:49,526 If we go to the console region here, and type the letter capital A and capital B, 70 00:04:49,916 --> 00:04:53,596 we see those two lists repeated back to us. 71 00:04:53,596 --> 00:04:56,596 Actually, we can also see them up here, in the "Environment" tab. 72 00:04:58,366 --> 00:05:00,706 We still need to create the variable called "y". 73 00:05:01,406 --> 00:05:06,406 The variable "y" contains a list of the numbers that represent the outcomes of the experiments. 74 00:05:07,856 --> 00:05:10,556 Once again, we get that from our standard order table. 75 00:05:11,576 --> 00:05:17,146 Note that variables A, B, and "y" have the same logical order from that table. 76 00:05:17,146 --> 00:05:20,806 It's very easy to create these models, because we can just go ahead 77 00:05:20,806 --> 00:05:23,566 and copy-and-paste directly from that standard order table. 78 00:05:25,056 --> 00:05:27,766 Finally, we are ready now to go run all the commands. 79 00:05:28,386 --> 00:05:31,796 Another nice shortcut in RStudio is to click the "Source" button. 80 00:05:32,456 --> 00:05:34,556 That will run all our commands in one go. 81 00:05:35,196 --> 00:05:38,066 In fact, there is "Source" and "Source with Echo".
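The three variables defined above might look like this in the script. The values for A and B are the ones read out in the video; the four values for "y" are not quoted in this transcript, so the numbers below are an assumption, chosen only because they are consistent with the coefficients reported later in the video. The data frame `standard_order` is also an illustrative addition.

```r
# Factor settings in standard order, as read out in the video.
A <- c(-1, +1, -1, +1)
B <- c(-1, -1, +1, +1)

# ASSUMPTION: illustrative outcomes, not quoted in the transcript.
# Replace them with the values from your own standard order table.
y <- c(52, 74, 62, 80)

# Lay the three vectors side by side: each row is one experiment,
# so A, B and y must all be listed in the same order.
standard_order <- data.frame(A = A, B = B, y = y)
print(standard_order)
```

Viewing the data as a table is a quick way to check that the rows line up with the standard order table before building the model.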
82 00:05:38,066 --> 00:05:43,566 If you are new to R, please use the second option, which will echo (in other words 83 00:05:43,566 --> 00:05:46,896 "rewrite") all the commands into the console for you. 84 00:05:47,626 --> 00:05:50,016 That way, if there is a mistake in one of your lines, 85 00:05:50,016 --> 00:05:52,026 you will see exactly where the problem is. 86 00:05:53,616 --> 00:05:58,336 Now let's go inspect the result, particularly what that "popped_corn" variable is. 87 00:05:58,746 --> 00:06:01,716 Go down to the console window and type "popped_corn", 88 00:06:02,326 --> 00:06:04,546 and we will see our least-squares model over there. 89 00:06:05,336 --> 00:06:08,226 The output from that command shows us the prediction model. 90 00:06:08,896 --> 00:06:15,206 It has an intercept of 67; a main effect for A of 10 units; a main effect for B with a value 91 00:06:15,206 --> 00:06:18,996 of 4; and then the two factor interaction effect AB. 92 00:06:19,946 --> 00:06:23,566 Notice that these numbers match exactly what we calculated by hand earlier. 93 00:06:23,566 --> 00:06:28,406 So there you have it: a really quick way to get the model with computer software. 94 00:06:29,736 --> 00:06:32,656 We can also use the "summary(...)" command, to get more information. 95 00:06:33,486 --> 00:06:39,176 It shows us the original formula we used, when we built the model; it shows us the residuals, 96 00:06:39,446 --> 00:06:41,936 which we'll talk about later on in the course. 97 00:06:41,936 --> 00:06:46,756 And there we see the same 4 parameters again: 67, 10, 4, and -1. 98 00:06:46,756 --> 00:06:49,896 And, there's a bit more statistical output down here for those 99 00:06:49,896 --> 00:06:51,286 of you that know what that is about.
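Put together, the model fit and the two inspection steps described above could be sketched as below. As before, the values in "y" are an assumption: they are not quoted in the transcript and were picked so that the fit gives the four parameters the video reports (67, 10, 4 and -1).

```r
# Factor settings in standard order, as read out in the video.
A <- c(-1, +1, -1, +1)
B <- c(-1, -1, +1, +1)

# ASSUMPTION: illustrative outcomes, not quoted in the transcript.
y <- c(52, 74, 62, 80)

# Build the least-squares model: y is predicted by A, B and the AB interaction.
popped_corn <- lm(y ~ A + B + A*B)

# Typing the variable name in the console prints the fitted model.
print(popped_corn)

# coef() extracts only the estimated parameters; R labels the interaction "A:B".
print(coef(popped_corn))

# summary() adds the formula, the residuals and further statistical output.
print(summary(popped_corn))
```

Saving these lines as a script and using "Source with Echo" in RStudio should show each command followed by its output in the console.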
100 00:06:52,596 --> 00:06:56,586 A key result I want to point out right now, is that there are 4 experiments, 101 00:06:56,586 --> 00:07:00,716 and we estimated 4 parameters: the intercept, the A effect, 102 00:07:00,716 --> 00:07:02,806 the B effect, and the AB interaction. 103 00:07:03,926 --> 00:07:08,006 I would also like to point out one thing here, where we specify the linear model. 104 00:07:08,666 --> 00:07:12,656 There is a term for A, for B, and the AB interaction. 105 00:07:13,516 --> 00:07:17,356 But you notice there isn't actually a term for the intercept. 106 00:07:17,356 --> 00:07:19,226 R will automatically add it for you. 107 00:07:19,226 --> 00:07:24,096 So even if you see 3 terms here in your input, you will get estimates for 4 parameters 108 00:07:24,096 --> 00:07:29,616 from R. Another nice shortcut that you can try - and we will explain this in later videos - 109 00:07:30,036 --> 00:07:35,576 is that you can create your model by saying: lm; open brackets; y; is predicted by (~); 110 00:07:35,576 --> 00:07:38,216 and then only write A*B; then close brackets. 111 00:07:38,216 --> 00:07:42,146 Try that out and see what you get. 112 00:07:43,656 --> 00:07:47,446 I want to end this video by stating that you could have used other computer software 113 00:07:47,446 --> 00:07:48,896 to build the least-squares model. 114 00:07:48,896 --> 00:07:56,396 For example Excel, Python, Minitab, MATLAB, SAS, JMP, or any of the other design 115 00:07:56,396 --> 00:07:58,816 of experiment software that are commercially available. 116 00:08:00,396 --> 00:08:03,426 You should get exactly these same parameters from the software. 117 00:08:04,006 --> 00:08:07,846 That's a good test if you're trying out one of those other software packages. 118 00:08:07,846 --> 00:08:13,226 So we have learned here how to get the basic results from a two factor experiment. 119 00:08:13,226 --> 00:08:16,406 Before we end this video, I want to challenge you though. 
120 00:08:17,236 --> 00:08:21,796 Use the R software to repeat the numeric analysis for the ginger biscuits example 121 00:08:21,796 --> 00:08:24,576 that we had in a prior class (video 2C). 122 00:08:24,576 --> 00:08:30,296 Here are the raw experimental data and recall, this was a predictive model for taste. 123 00:08:30,296 --> 00:08:37,536 Are you able to reproduce the results we calculated by hand: 5.25, 1.75, 1.25, 124 00:08:37,536 --> 00:08:40,176 and 0.75 for the four parameters? 125 00:08:41,166 --> 00:08:44,426 Make sure you can reproduce this before continuing to the next video.
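Returning to the A*B shortcut mentioned earlier in this video, a small sketch like the following can be used to confirm that the short and long formulas give the same model, and that 4 experiments yield 4 parameters. The "y" values are once more an assumption (they are not quoted in the transcript), and the names `long_form` and `short_form` are illustrative.

```r
# Factor settings in standard order, as read out in the video.
A <- c(-1, +1, -1, +1)
B <- c(-1, -1, +1, +1)

# ASSUMPTION: illustrative outcomes, not quoted in the transcript.
y <- c(52, 74, 62, 80)

# The formula written out in full, and the shortcut using only A*B.
long_form  <- lm(y ~ A + B + A*B)
short_form <- lm(y ~ A*B)

# Both formulas should give the same estimates.
print(all.equal(coef(long_form), coef(short_form)))

# Number of estimated parameters, including the intercept R adds automatically.
print(length(coef(short_form)))

# Residual degrees of freedom: number of experiments minus number of parameters.
print(df.residual(short_form))
```

The same pattern, with your own A, B and "y" vectors, is one possible starting point for the ginger biscuits challenge.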