In this video, we try to understand and quantify how well a control chart performs. We use control charts because we know that quality is not optional in our systems anymore. Customers are very mobile, and quickly move on to suppliers that can provide high-quality, consistent products, and that's what process monitoring is all about: ensuring that consistency is in place.

I also want to mention that process monitoring is often referred to as "statistical process control", the statistical part being the key word. I avoid this term, however, because of the confusion with regular process control, which is the principle where we apply feedback, continuously, to check for deviations and to make changes to our process in an automated way.

Process monitoring is very different from feedback control, and that's why I avoid the term SPC. Firstly, process monitoring is not applied automatically. Adjustments should be made to the process infrequently, and only when we see evidence for it in the control charts: when we say that something different, a "special cause", has occurred. Action from a process monitoring chart is taken manually.

Feedback control is very different. Feedback control is a temporary measure that is taken in an automated way when a deviation is detected. It makes a very minor adjustment, regularly, to the process. The thought process behind monitoring is that when we detect a deviation, we should figure out what the root cause is and make a permanent change to our system, so that that cause does not occur in the future.

Actually, in the prior example that I showed you, with the froth monitoring, the operators noticed the signature of the bubble size decreasing and the colour increasing. In an ideal world they would figure out what causes this and prevent it from ever recurring. In this situation, though, it was a function of a property of the raw material, the ore coming out of the ground, that periodically changes. So this is something that they cannot really fix. But Japanese companies do this very well. They are credited for their high level of quality, and one of the key reasons is that they find a permanent change to their processes to avoid problems from recurring.
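To make that contrast concrete, here is a minimal sketch in Python. It is not from the video: the target, sigma, and feedback gain are made-up values, chosen only to illustrate that feedback control nudges the process automatically at every step, while the monitoring chart merely flags a point for manual investigation when it falls outside the 3-sigma limits.

```python
import random

TARGET, SIGMA = 100.0, 2.0   # hypothetical process target and spread
GAIN = 0.2                   # hypothetical feedback gain: fraction of the deviation corrected per step

adjustment = 0.0
for t in range(50):
    disturbance = random.gauss(0.0, SIGMA)
    x = TARGET + disturbance + adjustment

    # Feedback control: automatic, minor, applied at every single step.
    adjustment -= GAIN * (x - TARGET)

    # Process monitoring: manual and infrequent; flag only "special causes".
    if abs(x - TARGET) > 3 * SIGMA:
        print(f"t={t}: alarm at x={x:.1f} -- investigate the root cause")
```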
Feedback control actually introduces variation into our system. It makes a very minor adjustment, and does so with regularity, to the process, with the hope that it counteracts the disturbance to keep the process on target.

In an ideal world we would never need to apply feedback control. In an ideal world, we would never even have variations entering our process in the first place, to cause these destabilizing effects. But for processes where quality is critical, it is worth aiming for that standard. Since we don't live in such an ideal world, however, we must have feedback control to automatically adjust for small deviations, and then we also have process monitoring, sitting on top of that, at a higher level, to detect when larger deviations occur from very irregular, abnormal situations. And that is why these monitoring charts use plus and minus three sigma limits. Something really has to go wrong before those limits are triggered.

We established in the prior video that such 3-sigma limits, under the assumption of normally distributed variations, mean that 1 in 370 samples will fall outside the limits, even if the process is stable and behaving OK. That value of 1 in 370 is called the "false alarm rate". In other areas it is also known as the producer's risk, or, if we were dealing with diseases in the medical area, we would call this a false positive. In statistics, we give this the name of a type I error, and we would like to reduce type I errors as much as possible. A high false alarm rate is a very quick recipe for operators to simply ignore your control chart.
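That 1-in-370 number is quick to verify from the normal distribution. Here is a short check, using scipy purely as an illustration (the video does not name any software):

```python
from scipy.stats import norm

# Per-sample probability that a normal value falls outside +/-3 sigma:
# this is the type I (false alarm) probability, often written alpha.
alpha = 2 * norm.sf(3)            # sf(3) is the upper-tail area beyond 3 sigma
print(f"alpha = {alpha:.5f}")     # about 0.0027

# On average, 1/alpha samples pass between false alarms when the process
# is behaving OK: roughly 1 alarm in every 370 samples.
print(f"1 in {1 / alpha:.0f} samples is a false alarm")
```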
There is a different type of error. The other situation is when the process is not stable, but the x-bar values still lie within the limits, meaning that we don't detect the problem. This is called a false negative, also known as the consumer's risk, or a false acceptance rate. In statistics, we give this the name of a type II error. As with type I errors, we would also like to reduce type II errors as much as possible.

Neither of these errors is desirable. A type I error raises an alarm where none exists, and a type II error does not raise an alarm when one should have been raised. I prefer to use the language of false alarms, or false negatives. I do, however, want to point out how asymmetrical they are.

And to do that, let's use a situation we might have all encountered: a visit to a doctor and being diagnosed with some medical issue, such as a disease. Which would you find preferable: a false positive, or a false negative, for a diagnosis? Remember, a false positive would say that you have the disease when, in fact, you don't. A false negative would indicate that you do not have the disease when, in fact, you do. Notice the asymmetry there. A false positive diagnosis would be much more preferable than a false negative. You can always get a second opinion, or a third opinion, but a false negative leaves you with the wrong impression that everything is going OK, when in fact it is not.

Here are some further examples to think about. What about screening for weapons at an airport security checkpoint? Which would you prefer to have happen: a type I error or a type II error, especially if you are the passenger? Or what about a jury trial: which would you rather have, the jury making a type I error or a type II error, to set a potential defendant free or not?

But back to our processes, where we make type I errors or type II errors. If we get too many false alarms, we can simply make our control limits wider: make the lower control limit lower, and the upper control limit even higher. That will reduce your false alarms. Eventually, if you make them wide enough, your type I error rate will go to zero. Your new bounds are so wide you capture almost all the variability in the system.

Remember, there is no rule that you have to use plus and minus three sigma. That is simply there by convention. If you wanted wider bounds, or even narrower bounds, you are absolutely free to adjust them to where you'd like them to be. However, you will get to a point where you make those bounds so wide that you will never get a false alarm. But in that situation, I'd like you to think about what has happened to your type II error rate: the false negative rate. Remember, when the process is not stable but the values still lie within the limits, that is a type II error. Having those wide limits means that you have captured everything inside them, including problematic operation when the process is not stable. So your type II error rate has gone up while your type I error rate has gone down. There is never a free lunch.
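A small simulation shows that trade-off directly. Everything here is my own illustration, not from the video: a stable process on target, plus a hypothetical unstable one shifted off target by 2 sigma, both checked against progressively wider limits.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 100_000

in_control = rng.normal(0.0, 1.0, N)  # stable process: on target, unit sigma
shifted = rng.normal(2.0, 1.0, N)     # hypothetical unstable process: 2-sigma shift

for k in (3, 4, 5):                   # progressively wider +/- k sigma limits
    type_1 = np.mean(np.abs(in_control) > k)  # false alarm rate
    type_2 = np.mean(np.abs(shifted) <= k)    # missed-detection rate
    print(f"+/-{k} sigma: type I = {type_1:.5f}, type II = {type_2:.4f}")
```

Widening the limits from 3 to 5 sigma drives the false alarm rate toward zero, but the missed-detection rate for the shifted process climbs toward one.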
You cannot have low type I error as well as low type II error. One is always traded off against the other. So finding the right level for those bounds is absolutely critical for an effective monitoring chart. I cannot stress enough how much time goes into varying those limits to get just the right type I and type II error rates. Calculating the limits for a control chart is easy; a short sketch of that calculation follows below. Testing the control chart to make sure that it is operating at those right levels of errors is hard, and time-consuming. Now, in the next video, we look at some practical implementation aspects of using these control charts.
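As promised, here is a minimal sketch of that easy first step: computing x-bar chart limits from in-control (phase 1) data. The data are synthetic, and for simplicity I estimate sigma with the overall sample standard deviation; in practice a within-subgroup estimate (for example, s-bar/c4 or R-bar/d2) is commonly used instead.

```python
import numpy as np

def xbar_limits(data, n_sub):
    """Centre line and +/-3 sigma limits for a chart of subgroup means."""
    usable = len(data) // n_sub * n_sub             # drop any incomplete subgroup
    means = data[:usable].reshape(-1, n_sub).mean(axis=1)
    centre = means.mean()                           # grand mean of subgroup means
    sigma_xbar = data.std(ddof=1) / np.sqrt(n_sub)  # sigma of the subgroup means
    return centre - 3 * sigma_xbar, centre, centre + 3 * sigma_xbar

rng = np.random.default_rng(0)
phase1 = rng.normal(50.0, 1.5, 200)   # synthetic in-control measurements
lcl, cl, ucl = xbar_limits(phase1, n_sub=5)
print(f"LCL = {lcl:.2f}, CL = {cl:.2f}, UCL = {ucl:.2f}")
```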