In this class, we're going to talk about something called a split-plot design. This material is in chapter 14 of the book, section 14.4. The split-plot is a type of multi-factor design where we can't do complete randomization. We can't completely randomize all the runs, and usually it's because one or more of the factors is hard to change. That's what leads to the split-plot structure.
So I'm going to show you an example, and this is a real example that involved paper manufacturing. The type of paper being made here is called kraft paper, and it's used to make things like paper grocery bags. The manufacturer has three different pulp preparation methods that they want to investigate, and basically these pulp preparation methods differ by the amount of hardwood that's in the pulp. And there are four different temperatures that we want to consider in making the pulp. So each replicate would require 12 runs, okay? The experimenters have done some sample sizing here, and they believe that they need to use three replicates, so there are going to be a total of 36 runs.
If this is a completely randomized design, how many batches of pulp do you need? Well, you would need 36 batches of pulp, wouldn't you, because in a completely randomized design you have to have a completely separate trial of each test combination in the factorial arrangement. That's just not feasible. When you make a batch of this pulp, it's a fairly big batch of material, and it would be almost unimaginable to make 36 of those to be able to run this experiment. Pulp preparation method is a hard-to-change factor, and it's the size of the batch of pulp that makes it so hard to deal with.
So let's think about a different way to run this experiment. Now, remember, we want to do three replicates. So let's just consider, initially, replicate 1. Select a pulp preparation method, one of the three, and then prepare a batch.
Then divide that batch into four sections or four samples and assign one of the temperature levels to each, and you could do that assignment randomly. Then repeat that for each pulp preparation method, and then conduct replicates 2 and 3 in exactly the same manner.
So in effect, each replicate has been divided into three parts, and those parts are usually called whole plots. By the way, sometimes the replicates are called blocks as well. But basically, we've taken each replicate and created three whole plots, which are the preparation methods, and the pulp preparation methods are the whole plot treatments. Then each whole plot has been divided into four subplots, which we sometimes call split-plots, and the temperature factor is the subplot treatment. So generally, the way this works is that the hard-to-change factor ends up being assigned to the whole plots. One of the advantages of this is that you don't have to change that hard-to-change factor very often. If we did a completely randomized design, there would be 36 batches of pulp. In the split-plot arrangement we've described, there are only nine batches of pulp, that is, three preparation methods in each of the three replicates.
So here's the layout of the experiment. Replicate 1, here are my three whole plots; replicate 2, here are my three whole plots; replicate 3, here are my three whole plots; and for each one of those whole plots, here are the levels of my subplot or split-plot factor. And here's the tensile strength data that results.
So what sort of statistical model do we have for a split-plot design? Well, mu is the overall mean, and I'm going to let tau sub i be the replicate effect, okay? Beta sub j is going to be my whole plot factor effect, that's the pulp preparation method. Here's the interaction term between replicates and the whole plot factor. Here is the subplot factor, okay, that's the temperature variable.
Here's the interaction between replicates and temperature, and you can have an interaction between temperature and replicates because every temperature level is run in every replicate. Here is the interaction between the whole plot factor and the subplot factor, that is, the preparation method by temperature interaction. And that interaction can exist, of course, because every preparation method sees every temperature. And then finally, there's the three-factor interaction term.
When we look at the expected mean squares here, assuming that replicates are random, okay, this is what we get for the expected mean square structure. Now notice that I've divided this into a whole plot component and a subplot component. And you notice that there's a term here, the replicate by preparation method interaction, that we might think of as whole plot error, because that term is used to test the whole plot main effect, the pulp preparation method effect.
Down here in the subplot, what are we interested in? Well, we are interested in the temperature effect and we're interested in the pulp preparation method by temperature interaction. And you notice that the error term here, sigma squared, really can't be estimated; it's not an estimable effect. This is the interaction between replicates and temperature, and this is the three-factor interaction. Typically, we would test the subplot factor against this interaction term, which is replicates times temperature; that can be thought of as subplot error. And then here is the whole plot by subplot interaction, which would be tested against the three-factor interaction. So there are different error structures between the whole plot and the subplot: a whole plot error term and a subplot error.
Here's the ANOVA for the data that I showed you. The manual calculations follow those of a three-factor ANOVA with one replicate.
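Written out, the model and the tests described above would look something like the following. This is a sketch in the notation from the lecture, with tau for replicates and beta for the preparation method; the symbol gamma for the temperature effect, the index limits r, a and b, and the subscripts on the mean squares are labels assumed here for illustration. The degrees of freedom use the example's r = 3 replicates, a = 3 methods and b = 4 temperatures.

```latex
% Requires amsmath for \text and \underbrace labels.
% Split-plot model: replicates (tau), whole plot factor (beta), subplot factor (gamma).
\[
y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij}
        + \gamma_k + (\tau\gamma)_{ik} + (\beta\gamma)_{jk} + (\tau\beta\gamma)_{ijk}
        + \epsilon_{ijk},
\qquad i = 1,\dots,r,\quad j = 1,\dots,a,\quad k = 1,\dots,b
\]

% Test statistics, each effect over the error term named in the lecture.
\[
F_{\text{method}} = \frac{MS_{\text{method}}}{MS_{\text{rep}\times\text{method}}},
\qquad
F_{\text{temp}} = \frac{MS_{\text{temp}}}{MS_{\text{rep}\times\text{temp}}},
\qquad
F_{\text{method}\times\text{temp}} =
  \frac{MS_{\text{method}\times\text{temp}}}{MS_{\text{rep}\times\text{method}\times\text{temp}}}
\]

% Degrees of freedom for r = 3, a = 3, b = 4 (36 runs, 35 total df).
\[
\underbrace{2}_{\text{rep}}
+ \underbrace{2}_{\text{method}}
+ \underbrace{4}_{\text{rep}\times\text{method}}
+ \underbrace{3}_{\text{temp}}
+ \underbrace{6}_{\text{rep}\times\text{temp}}
+ \underbrace{6}_{\text{method}\times\text{temp}}
+ \underbrace{12}_{\text{rep}\times\text{method}\times\text{temp}}
= 35 = 36 - 1
\]
```

The seven terms use up all 35 degrees of freedom, so nothing is left over for a separate estimate of the variance of epsilon.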
The reason that the error term you see here, sigma squared, that variance component, cannot be estimated is that, arithmetically, the sum of squares partition is a three-factor ANOVA with one replicate. So there are two different error structures. Here is the whole plot error, and then this is the subplot error, which is replicates times the AB effect, with A the preparation method and B the temperature; that's used to test the AB interaction term. But there's also a separate term here, replicates times B, that would be used to test the main effect of temperature. So there are really sort of two subplot error structures here.
There is an alternate model that is sometimes used for the split-plot design. And here is the alternate model. It has replicates, it has the whole plot effect, and it has the replicate by whole plot interaction, which is going to be your whole plot error term. It has the main effect of your subplot factor. It has the interaction between the whole plot and the subplot factor. And then this error term here is really everything else, it's all of those other terms, and that becomes your subplot error. So basically, it combines the replicate times B and the replicate times AB terms to produce a single subplot error. This version of the split-plot model is used pretty often, probably more so than the model I showed you originally.
And here is the JMP output for this particular problem, and it uses that alternate model that we talked about just a moment ago. Here are the tests on the main effect of methods, the main effect of temperature, and the interaction effect between temperature and methods. Those are fixed effects tests, because the only random component here was the replicate.
Sometimes people run the split-plot design inadvertently. They simply don't randomize the hard-to-change factor, so they end up with a split-plot structure, but they don't recognize it, and so they do a completely randomized design analysis, a CRD.
So this is what would happen if we took this pulp preparation method experiment and analyzed it as if it were a completely randomized design. We would have the factors, methods and temperature, and we would have a method by temperature interaction, and both factors are fixed. Here is the ANOVA that we would get, and the first thing you notice is that the method by temperature interaction is not significant. In the split-plot analysis that we did previously, there is a strong temperature by method interaction, and here we don't see that. Instead of having a whole plot error and a subplot error, the estimate that we get is just one total error, and it is too large. That's why we fail to recognize the significant temperature by method interaction here.
Another very common variation of the basic split-plot is one with more than two factors. So suppose we're looking at an experiment involving, let's say, semiconductor manufacturing, where we're etching wafers. We have two factors, gas flow and temperature, that are hard to change, but another two factors, time in the chamber and wafer position, are easy to change. So how could we run an experiment like this? Well, let's let gas flow and temperature be factors A and B, and let's let time and wafer position be factors C and D. One way to do this would be to create four whole plots, take the four combinations of gas flow and temperature, each at two levels, and apply each one of those combinations to one of the whole plots. Then we could take the four test combinations for time and wafer position and run all four of those test combinations in the subplots of each whole plot. That would give you the structure that you see here, and then we could replicate this. So block 1 would be the first replicate, and block 2 would be the second replicate.
So this is a split-plot design with four design factors, two in the whole plot and two in the subplot. That's a pretty big design, 32 runs. So you might be interested in other variations of this type of experiment.
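To make the run structure of that four-factor design concrete, here is a minimal Python sketch that builds the 32-run layout with the two-stage randomization a split-plot implies. The function name `split_plot_layout`, the coded levels -1 and +1, and the dictionary keys are hypothetical choices for illustration, not something taken from the lecture or from any particular software package.

```python
import itertools
import random


def split_plot_layout(n_blocks=2, seed=None):
    """Build a run list for a split-plot with two whole-plot factors (A, B)
    and two subplot factors (C, D), each at two coded levels."""
    rng = random.Random(seed)
    levels = (-1, +1)  # coded low/high levels, assumed here for illustration
    whole_plot_combos = list(itertools.product(levels, repeat=2))  # (A, B)
    subplot_combos = list(itertools.product(levels, repeat=2))     # (C, D)

    runs = []
    for block in range(1, n_blocks + 1):
        # Stage 1: randomize the order of the hard-to-change settings
        # (gas flow and temperature) within this block.
        wp_order = rng.sample(whole_plot_combos, k=len(whole_plot_combos))
        for wp_index, (a, b) in enumerate(wp_order, start=1):
            # Stage 2: randomize the easy-to-change settings
            # (time and wafer position) within this whole plot.
            sp_order = rng.sample(subplot_combos, k=len(subplot_combos))
            for c, d in sp_order:
                runs.append({"block": block, "whole_plot": wp_index,
                             "A": a, "B": b, "C": c, "D": d})
    return runs


if __name__ == "__main__":
    layout = split_plot_layout(n_blocks=2, seed=1)
    for run in layout:
        print(run)
    n_whole_plots = len({(r["block"], r["whole_plot"]) for r in layout})
    print(len(layout), "runs in", n_whole_plots, "whole plots")
```

Whatever the seed, the layout has 32 runs in 8 whole plots, so the hard-to-change factors are set only 8 times, where a completely randomized design would in principle reset them for each of the 32 runs.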