Chapter 13 Design of Experiments

Introduction
Control charts are "listening", or passive, statistical tools: they observe a process as it runs. Experimental design is a "conversational", or active, tool: the experimenter deliberately changes process inputs and observes the response. Two themes recur throughout the chapter: the planning of experiments, and the use of a sequence of experiments.

13.1 A Simple Example of Experimental Design Principles
The objective is to compare 4 brands of tires for tread wear, using 16 tires (4 of each brand) and 4 cars.

Illogical designs:
- Randomly assign the 16 tires to the four cars.
- Give each car all 4 tires of a single brand. Brand differences would then be confounded with differences between cars, drivers, and driving conditions.
- Give each car one tire of each brand, but with brands A and B used only on the front of each car and brands C and D used only in the rear positions, as below. The brand effect would then be confounded with the position effect.

Wheel position   Car 1   Car 2   Car 3   Car 4
LF               A       B       A       B
RF               B       A       B       A
LR               D       C       D       C
RR               C       D       C       D

Logical design: each brand is used once at each position, as well as once with each car.

Wheel position   Car 1   Car 2   Car 3   Car 4
LF               A       B       C       D
RF               B       A       D       C
LR               C       D       A       B
RR               D       C       B       A

13.2 Principles of Experimental Design
Processes need to be in a state of statistical control when designed experiments are carried out, so it is desirable to use experimental design and statistical process control methods together.

General guidelines for the design of experiments:
1. Recognition and statement of the problem
2. Choice of factors and levels
3. Selection of the response variable(s)
4. Choice of experimental design
5. Conduct of the experiment
6. Data analysis
7. Conclusions and recommendations

The levels of each factor used in an experimental run should be reset before the next experimental run.

13.3 Statistical Concepts in Experimental Design: Example
Assume that the objective is to determine the effect of two different levels of temperature on process yield, where the current temperature is 250°F and the experimental setting is 300°F. Assume also that temperature is the only factor to be varied.

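Daily yields (two weeks at each temperature setting):

Week   Day   250°F   300°F
1      M     2.4     2.6
1      Tu    2.7     2.4
1      W     2.2     2.8
1      Th    2.5     2.5
1      F     2.0     2.2
2      M     2.5     2.7
2      Tu    2.8     2.3
2      W     2.9     3.1
2      Th    2.4     2.9
2      F     2.1     2.2

Observations:
- Neither temperature setting is uniformly superior to the other over the entire test period.
- When the daily yields for the two settings are plotted, the lines are fairly close together, which suggests that increasing the temperature may not have a perceptible effect on process yield.
- At each temperature setting, the yield is lowest on the Friday of each week.
- There is considerable variability within each temperature setting.

13.4 t-Tests
The t statistic has the general form
$$ t = \frac{\hat{\theta} - \theta}{s_{\hat{\theta}}} $$
where $\theta$ is the parameter to be estimated, $\hat{\theta}$ is the sample statistic (the estimator of $\theta$), and $s_{\hat{\theta}}$ is the estimator of the standard deviation of $\hat{\theta}$.

Let $\mu_1$ = the true average yield at 250°F, $\mu_2$ = the true average yield at 300°F, and $\theta = \mu_1 - \mu_2$. Then $\hat{\theta} = \bar{x}_1 - \bar{x}_2$, and if $\sigma_1^2$ and $\sigma_2^2$ are known,
$$ \sigma_{\hat{\theta}} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \qquad (13.1) $$

13.4.1 Exact t-Test
The exact t-test is of the form
$$ t = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} \qquad (13.2) $$
where $s_p$ is the square root of the estimate of the (assumed) common variance $\sigma^2$:
$$ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} $$
$s_p^2$ reduces to the simple average of $s_1^2$ and $s_2^2$ when $n_1 = n_2$. Degrees of freedom = $n_1 + n_2 - 2$.

13.4.1 Exact t-Test Example

          Mean    Variance
250°F     2.45    0.0872
300°F     2.57    0.0934

$H_0: \mu_1 = \mu_2$ versus $H_1: \mu_1 < \mu_2$, with $n_1 = n_2 = 10$:
$$ s_p^2 = \frac{9(0.0872) + 9(0.0934)}{18} = 0.0903 $$
$$ t = \frac{2.45 - 2.57}{\sqrt{0.0903}\sqrt{\frac{1}{10} + \frac{1}{10}}} = -0.893 $$
$P(t_{18} < -0.893) \approx 0.19$, so $H_0$ is not rejected at conventional significance levels.

13.4.1 Assumptions for the Exact t-Test
- The assumption $\sigma_1^2 = \sigma_2^2$ should be checked. (This assumption is not crucial when $n_1 = n_2$.)
- The two samples are independent.
- The observations are independent within each sample.

13.4.2 Approximate t-Test
If $n_1$ and $n_2$ differ considerably and $\sigma_1^2 = \sigma_2^2$ cannot be assumed, an approximate t-test is used:
$$ t^* = \frac{\bar{x}_1 - \bar{x}_2 - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
where the degrees of freedom are calculated as
$$ \nu = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}} \qquad (13.3) $$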
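As a cross-check on the arithmetic above, the following Python sketch recomputes the exact (pooled) t statistic of (13.2) and the approximate t statistic with the degrees of freedom of (13.3) from the raw daily yields. The helper names `pooled_t` and `welch_t` are illustrative, not from the text. Only the standard library is used, so p-values (which need the CDF of the t distribution, available for example in SciPy) are not computed.

```python
import math
import statistics

# Daily yields from the 13.3 example (two weeks at each temperature)
y250 = [2.4, 2.7, 2.2, 2.5, 2.0, 2.5, 2.8, 2.9, 2.4, 2.1]
y300 = [2.6, 2.4, 2.8, 2.5, 2.2, 2.7, 2.3, 3.1, 2.9, 2.2]


def pooled_t(x1, x2):
    """Exact t statistic of Eq. (13.2) under H0: mu1 - mu2 = 0."""
    n1, n2 = len(x1), len(x2)
    s1_sq, s2_sq = statistics.variance(x1), statistics.variance(x2)  # sample variances
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)   # pooled variance
    se = math.sqrt(sp_sq) * math.sqrt(1 / n1 + 1 / n2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / se
    return t, n1 + n2 - 2, sp_sq


def welch_t(x1, x2):
    """Approximate t statistic, with degrees of freedom from Eq. (13.3)."""
    v1 = statistics.variance(x1) / len(x1)
    v2 = statistics.variance(x2) / len(x2)
    t = (statistics.mean(x1) - statistics.mean(x2)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))
    return t, df


t, df, sp_sq = pooled_t(y250, y300)
print(f"exact:       t = {t:.3f}, df = {df}, sp^2 = {sp_sq:.4f}")
t_approx, df_approx = welch_t(y250, y300)
print(f"approximate: t = {t_approx:.3f}, df = {df_approx:.2f}")
```

With equal sample sizes the two statistics coincide; only the degrees of freedom differ, and (13.3) gives about 17.98 here, very close to the 18 of the exact test.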

13.4.3 Confidence Intervals for Differences
100(1-α)% upper confidence bound on $\mu_1 - \mu_2$:
$$ \mu_1 - \mu_2 \le \bar{x}_1 - \bar{x}_2 + t_{\alpha,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$
100(1-α)% two-sided confidence interval for $\mu_1 - \mu_2$:
$$ \bar{x}_1 - \bar{x}_2 \pm t_{\alpha/2,\,n_1+n_2-2}\; s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

13.5 Analysis of Variance (ANOVA) for One Factor
Experimental variable: factor (e.g., temperature)
Values of the experimental variable: levels (250°F, 300°F)
Output variable: response (yield)
ANOVA distinguishes the variation between factor levels from the variation within levels.

13.5 Analysis of Variance (ANOVA) for One Factor: Example

         Week 1                      Week 2
Level    M    Tu   W    Th   F       M    Tu   W    Th   F       Avg.
250°F    2.4  2.7  2.2  2.5  2.0     2.5  2.8  2.9  2.4  2.1     2.45
300°F    2.6  2.4  2.8  2.5  2.2     2.7  2.3  3.1  2.9  2.2     2.57

Squared deviations from the level average (contributions to SS(Within)):
250°F: 0.0025 0.0625 0.0625 0.0025 0.2025 0.0025 0.1225 0.2025 0.0025 0.1225   sum = 0.785
300°F: 0.0009 0.0289 0.0529 0.0049 0.1369 0.0169 0.0729 0.2809 0.1089 0.1369   sum = 0.841
SS(Within) = 0.785 + 0.841 = 1.626

Squared deviations of the level averages from the grand average 2.51: (2.45 - 2.51)^2 = 0.0036 and (2.57 - 2.51)^2 = 0.0036, total 0.0072. Multiplying by the 10 observations per level gives SS(Between) = 0.072.

Output from Excel (Anova: Single Factor)

SUMMARY
Groups   Count   Sum    Average   Variance
250F     10      24.5   2.45      0.087222222
300F     10      25.7   2.57      0.093444444

ANOVA
Source of Variation   SS      df   MS       F       P-value   F crit
Between Groups        0.072   1    0.072    0.797   0.3838    4.4139
Within Groups         1.626   18   0.0903
Total                 1.698   19

Output from Minitab (One-way ANOVA: Yield versus Temp)

Source   DF   SS       MS       F      P
Temp     1    0.0720   0.0720   0.80   0.384
Error    18   1.6260   0.0903
Total    19   1.6980

S = 0.3006   R-Sq = 4.24%   R-Sq(adj) = 0.00%

Level   N    Mean     StDev
250     10   2.4500   0.2953
300     10   2.5700   0.3057

Pooled StDev = 0.3006
[Plot: individual 95% CIs for each mean, based on the pooled StDev, on a scale from 2.25 to 2.70]
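The sums of squares in the outputs above can be reproduced from their definitions. This sketch (the function name `one_way_anova` is illustrative) computes SS(Between) from the level means and SS(Within) from the deviations about each level mean, then forms F.

```python
import statistics


def one_way_anova(groups):
    """Return (SS_between, SS_within, df_between, df_within, F) for a one-factor layout."""
    all_obs = [y for g in groups for y in g]
    grand_mean = statistics.mean(all_obs)
    # Between: squared deviation of each level mean from the grand mean, weighted by n_j
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within: squared deviations of the observations from their own level mean
    ss_within = sum((y - statistics.mean(g)) ** 2 for g in groups for y in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f


y250 = [2.4, 2.7, 2.2, 2.5, 2.0, 2.5, 2.8, 2.9, 2.4, 2.1]
y300 = [2.6, 2.4, 2.8, 2.5, 2.2, 2.7, 2.3, 3.1, 2.9, 2.2]
ssb, ssw, dfb, dfw, f = one_way_anova([y250, y300])
print(f"SS(Between) = {ssb:.3f}, SS(Within) = {ssw:.3f}, F({dfb},{dfw}) = {f:.3f}")
```

For two levels, F is the square of the pooled t statistic of 13.4.1: (-0.893)^2 ≈ 0.797. This is why the ANOVA P-value (0.384) is twice the one-sided t-test probability of about 0.19.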

The degrees of freedom for Total are always the total number of observations minus one. The degrees of freedom for the factor are always the number of levels of the factor minus one. The degrees of freedom for Within equal the total number of observations minus the number of levels; for balanced data this is (one less than the number of observations per level) multiplied by (the number of levels).

Under $H_0$ (all level means equal), the ratio of the two mean squares, MS(Between)/MS(Within), is a random variable having an F distribution with the factor degrees of freedom in the numerator and the Within degrees of freedom in the denominator. The test assumes normally distributed populations with equal variances.

13.5.1 ANOVA for a Single Factor with More than Two Levels
Assume the process has three temperature settings, and data were collected over 6 weeks, with 2 weeks at each temperature setting.

13.5.1 ANOVA for a Single Factor with More than Two Levels: Example

Week   Day   250°F   300°F   350°F
1      M     2.4     2.6     3.2
1      Tu    2.7     2.4     3.0
1      W     2.2     2.8     3.1
1      Th    2.5     2.5     2.8
1      F     2.0     2.2     2.5
2      M     2.5     2.7     2.9
2      Tu    2.8     2.3     3.1
2      W     2.9     3.1     3.4
2      Th    2.4     2.9     3.2
2      F     2.1     2.2     2.6

The sum of squares for the factor (temperature) is
$$ SS_{\text{Temp}} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \frac{T^2}{N} \qquad (13.4) $$
where $T_i$ represents the total of the observations for the ith level, $k$ is the number of levels of the factor, $n_i$ represents the number of observations for the ith level, $T$ denotes the grand total of all observations, and $N$ is the total number of observations. For the example,
$$ SS_{\text{Temp}} = \frac{24.5^2 + 25.7^2 + 29.8^2}{10} - \frac{80^2}{30} = 214.878 - 213.333 = 1.5447 $$
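Equation (13.4) needs only the level totals and level sizes, not the individual observations. A minimal sketch (the helper name `ss_factor` is hypothetical) applied to the level totals 24.5, 25.7, and 29.8, that is, the sums of the ten yields at each temperature:

```python
def ss_factor(level_totals, level_sizes):
    """Eq. (13.4): sum of T_i^2 / n_i over the levels, minus T^2 / N."""
    grand_total = sum(level_totals)   # T
    n_total = sum(level_sizes)        # N
    return sum(t ** 2 / n for t, n in zip(level_totals, level_sizes)) - grand_total ** 2 / n_total


ss_temp = ss_factor([24.5, 25.7, 29.8], [10, 10, 10])
print(f"SS(Temp) = {ss_temp:.4f}")
```

Because each level keeps its own $n_i$, the same helper also applies to unbalanced data.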

The total sum of squares is
$$ SS_{\text{Total}} = \sum_{i=1}^{N} y_i^2 - \frac{T^2}{N} $$
where $y_i$ represents the ith observation and $N$ is the total number of observations. For the example,
$$ SS_{\text{Total}} = 217.22 - \frac{80^2}{30} = 3.8867 $$
so SS(Within) = 3.8867 - 1.5447 = 2.3420.

13.5.1 ANOVA for a Single Factor with More than Two Levels: Example
Output from Excel (Anova: Single Factor)

SUMMARY
Groups   Count   Sum    Average   Variance
250F     10      24.5   2.45      0.087222
300F     10      25.7   2.57      0.093444
350F     10      29.8   2.98      0.079556

ANOVA
Source of Variation   SS         df   MS         F          P-value    F crit
Between Groups        1.544667   2    0.772333   8.903928   0.001072   3.354131
Within Groups         2.342      27   0.086741
Total                 3.886667   29
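The rest of the Excel table follows from two shortcut formulas: SS(Total) from the sum of squared observations, and SS(Within) by subtraction. This sketch also reports R², the ratio SS(Temp)/SS(Total) that Minitab labels R-Sq; the variable names are illustrative.

```python
# Yields for the three-temperature example (ten days per level)
data = {
    "250F": [2.4, 2.7, 2.2, 2.5, 2.0, 2.5, 2.8, 2.9, 2.4, 2.1],
    "300F": [2.6, 2.4, 2.8, 2.5, 2.2, 2.7, 2.3, 3.1, 2.9, 2.2],
    "350F": [3.2, 3.0, 3.1, 2.8, 2.5, 2.9, 3.1, 3.4, 3.2, 2.6],
}
all_obs = [y for ys in data.values() for y in ys]
N, k = len(all_obs), len(data)
T = sum(all_obs)

ss_total = sum(y ** 2 for y in all_obs) - T ** 2 / N                # shortcut formula
ss_temp = sum(sum(ys) ** 2 / len(ys) for ys in data.values()) - T ** 2 / N  # Eq. (13.4)
ss_within = ss_total - ss_temp                                       # by subtraction

ms_temp = ss_temp / (k - 1)
ms_within = ss_within / (N - k)
f_stat = ms_temp / ms_within
r_sq = ss_temp / ss_total

print(f"SS(Total) = {ss_total:.4f}, SS(Within) = {ss_within:.4f}")
print(f"F({k - 1},{N - k}) = {f_stat:.3f}, R-Sq = {100 * r_sq:.2f}%")
```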

Output from Minitab (One-way ANOVA: Yield versus Temp)

Source   DF   SS       MS       F      P
Temp     2    1.5447   0.7723   8.90   0.001
Error    27   2.3420   0.0867
Total    29   3.8867

S = 0.2945   R-Sq = 39.74%   R-Sq(adj) = 35.28%

Level   N    Mean     StDev
250     10   2.4500   0.2953
300     10   2.5700   0.3057
350     10   2.9800   0.2821

Pooled StDev = 0.2945
[Plot: individual 95% CIs for each mean, based on the pooled StDev, on a scale from 2.25 to 3.00]

13.5.2 Multiple Comparison Procedures

13.5.3 Sample Size Determination
Equation (13.5) gives the required number of observations per level in terms of the number of levels of the factor, the standard deviation of the observations, and the minimum pairwise difference between level means that one wishes to detect with probability 0.90.

13.5.4 Additional Terms and Concepts in One-Factor ANOVA
An experimental unit is the unit to which a treatment is applied (here, the days). If the temperature settings had been randomly assigned to the days, the design would be a completely randomized design.

Blocks are extraneous factors that vary and affect the response but are not of direct interest. One should block on factors that could be expected to influence the response variable, and randomize over factors that might be influential but cannot be blocked.

In the tire example of 13.1, when each car has one tire of each brand, the cars are the blocks and the variation due to cars is isolated. In the logical design, the cars and the wheel positions are both blocks: each brand is used once at each position, as well as once with each car.

Randomized block design (cars are the blocks):
Wheel position   Car 1   Car 2   Car 3   Car 4
LF               A       B       A       B
RF               B       A       B       A
LR               D       C       D       C
RR               C       D       C       D

Latin square design (cars and wheel positions are the blocks):
Wheel position   Car 1   Car 2   Car 3   Car 4
LF               A       B       C       D
RF               B       A       D       C
LR               C       D       A       B
RR               D       C       B       A

The regression model for one-factor ANOVA is
$$ y_{ij} = \mu + \tau_j + \epsilon_{ij} \qquad (13.6) $$
where $j$ denotes the jth level of the single factor, $y_{ij}$ represents the ith observation for the jth level, $\tau_j$ represents the effect of the jth level, $\mu$ is a constant, and $\epsilon_{ij}$ represents the error term. If the effects were all the same, the model would reduce to
$$ y_{ij} = \mu + \epsilon_{ij} \qquad (13.7) $$
The F-test determines whether the appropriate model is (13.6) or (13.7).

Factors are generally classified as fixed (specific levels of interest, such as 250°F, 300°F, and 350°F) or random (levels selected at random from a population of possible levels). Data in one-factor ANOVA are analyzed in the same way regardless of whether the factor is fixed or random, but the interpretation differs. $\tau_j$ is a constant if the factor is fixed and a random variable if the factor is random. In both cases the error term $\epsilon_{ij}$ is NID(0, $\sigma^2$) and the $y_{ij}$ are assumed to be normally distributed; however, the $y_{ij}$ are not independent in the random-factor case, because observations at the same level share the same random $\tau_j$.

The data in the temperature example are balanced: there is the same number of observations for each level of the factor.

13.6 Regression Analysis of Data from Designed Experiments
Both ANOVA and regression can be used to analyze data from designed experiments. Regression also provides tools for residual analysis and for parameter estimation, so for fixed factors ANOVA should be supplemented or supplanted by regression.

The least squares estimators in regression analysis result from minimizing the sum of squared errors. For model (13.6), this sum is
$$ \sum_{j}\sum_{i} \epsilon_{ij}^2 = \sum_{j}\sum_{i} \left(y_{ij} - \mu - \tau_j\right)^2 \qquad (13.8) $$
assuming that the levels of the factor are fixed and the data are balanced.

Each effect $\tau_j$ can be thought of as a deviation from the overall mean $\mu$:
$$ \tau_j = \mu_j - \mu $$
where $\mu_j$ is the expected value of the response variable at the jth level of the factor and $\mu$ is the average of the $\mu_j$. Hence
$$ \sum_{j=1}^{k} \tau_j = 0 $$
This restriction on the $\tau_j$ allows $\mu$ and each $\tau_j$ to be estimated using least squares. Minimizing (13.8) produces
$$ \hat{\mu} = \bar{y}_{..} \qquad \text{and} \qquad \hat{\tau}_j = \bar{y}_{.j} - \bar{y}_{..} $$
where $\bar{y}_{..}$ denotes the average of all observations and $\bar{y}_{.j}$ is the average of the observations for the jth factor level. Then
$$ \hat{y}_{ij} = \hat{\mu} + \hat{\tau}_j = \bar{y}_{.j} $$
and the residuals are defined as
$$ e_{ij} = y_{ij} - \hat{y}_{ij} = y_{ij} - \bar{y}_{.j} $$

13.6 Regression Analysis of Data from Designed Experiments: Example
Residuals for the three-temperature example (Sum = sum of that week's residuals):

250°F (total 24.5, average 2.45)
               M        Tu       W        Th       F        Sum
Week 1 y       2.4      2.7      2.2      2.5      2.0
Week 1 Res.    -0.05    0.25     -0.25    0.05     -0.45    -0.45
Week 1 Res^2   0.0025   0.0625   0.0625   0.0025   0.2025
Week 2 y       2.5      2.8      2.9      2.4      2.1
Week 2 Res.    0.05     0.35     0.45     -0.05    -0.35    0.45
Week 2 Res^2   0.0025   0.1225   0.2025   0.0025   0.1225
Sum of Res^2 = 0.785

300°F (total 25.7, average 2.57)
               M        Tu       W        Th       F        Sum
Week 1 y       2.6      2.4      2.8      2.5      2.2
Week 1 Res.    0.03     -0.17    0.23     -0.07    -0.37    -0.35
Week 1 Res^2   0.0009   0.0289   0.0529   0.0049   0.1369
Week 2 y       2.7      2.3      3.1      2.9      2.2
Week 2 Res.    0.13     -0.27    0.53     0.33     -0.37    0.35
Week 2 Res^2   0.0169   0.0729   0.2809   0.1089   0.1369
Sum of Res^2 = 0.841

350°F (total 29.8, average 2.98)
               M        Tu       W        Th       F        Sum
Week 1 y       3.2      3.0      3.1      2.8      2.5
Week 1 Res.    0.22     0.02     0.12     -0.18    -0.48    -0.30
Week 1 Res^2   0.0484   0.0004   0.0144   0.0324   0.2304
Week 2 y       2.9      3.1      3.4      3.2      2.6
Week 2 Res.    -0.08    0.12     0.42     0.22     -0.38    0.30
Week 2 Res^2   0.0064   0.0144   0.1764   0.0484   0.1444
Sum of Res^2 = 0.716

SS(Within) = 0.785 + 0.841 + 0.716 = 2.342

Observations from the residuals:
- Production is higher in the second week at each temperature setting: the weekly residual sums are negative in week 1 and positive in week 2.
- Production is especially high on Wednesdays.
- The more ways we look at data, the more we are apt to discover.
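The estimates and residual summaries above can be sketched in a few lines. This example (variable names are illustrative) computes $\hat{\mu}$, the $\hat{\tau}_j$, and the residuals $e_{ij} = y_{ij} - \bar{y}_{.j}$, then totals the residuals by week and by day of the week. It assumes, as in the residual table, that the first five values at each temperature are its first week.

```python
import statistics

# Three-temperature yields from 13.5.1; in each list, items 0-4 are week 1 (M-F)
# and items 5-9 are week 2 (M-F).
data = {
    "250F": [2.4, 2.7, 2.2, 2.5, 2.0, 2.5, 2.8, 2.9, 2.4, 2.1],
    "300F": [2.6, 2.4, 2.8, 2.5, 2.2, 2.7, 2.3, 3.1, 2.9, 2.2],
    "350F": [3.2, 3.0, 3.1, 2.8, 2.5, 2.9, 3.1, 3.4, 3.2, 2.6],
}
days = ["M", "Tu", "W", "Th", "F"]

all_obs = [y for ys in data.values() for y in ys]
mu_hat = statistics.mean(all_obs)  # least squares estimate of mu: the grand average
level_mean = {lvl: statistics.mean(ys) for lvl, ys in data.items()}
tau_hat = {lvl: m - mu_hat for lvl, m in level_mean.items()}  # estimated level effects

# The fitted value at level j is the level average, so e_ij = y_ij - level average
residuals = {lvl: [y - level_mean[lvl] for y in ys] for lvl, ys in data.items()}

for lvl, e in residuals.items():
    print(f"{lvl}: tau_hat = {tau_hat[lvl]:+.4f}, "
          f"week sums = {sum(e[:5]):+.2f} / {sum(e[5:]):+.2f}")

# Residuals totalled by day of the week, over both weeks and all three levels
by_day = {d: sum(e[i] + e[i + 5] for e in residuals.values()) for i, d in enumerate(days)}
print({d: round(v, 2) for d, v in by_day.items()})
```

The week sums reproduce the Sum columns of the residual table, the $\hat{\tau}_j$ add to zero as the restriction requires, and the day totals are largest for Wednesday and smallest for Friday.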

13.7 ANOVA for Two Factors
The example now includes two factors: weeks and temperature. In a factorial design (or cross-classified design), each level of every factor is crossed with each level of every other factor; if there are a levels of one factor and b levels of a second factor, there are ab combinations of factor levels. In a nested factor design, one factor is nested within another factor. Here each week is run at only one temperature, so weeks are nested within temperature.

13.7 ANOVA for Two Factors
The model for the nested factor design is
$$ y_{ijk} = \mu + \tau_i + \beta_{j(i)} + \epsilon_{k(ij)} $$
where $i = 1, 2, 3$ (temperature settings), $j = 1, 2$ (the week), and $k = 1, \ldots, 5$ (the replicate factor, days). The subscript $j(i)$ indicates that the jth level of the week factor is nested within the ith level of the temperature factor, and $k(ij)$ indicates that the replicate factor is nested within each $(i, j)$ combination.

The nested factor design is also called a hierarchical design and is used for estimating components of variance.
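For the nested model above, the total variation splits into temperature, weeks within temperature, and days within weeks. The sketch below computes that decomposition for the three-temperature data, assuming (as in the residual table of 13.6) that the first five values at each temperature are its first week. The text does not report these nested sums of squares, so the values printed are this sketch's own computation; `week_totals` is an illustrative helper name.

```python
# Three-temperature yields from 13.5.1; in each list, items 0-4 are the first week
# at that temperature and items 5-9 the second week (days are the replicates).
data = {
    "250F": [2.4, 2.7, 2.2, 2.5, 2.0, 2.5, 2.8, 2.9, 2.4, 2.1],
    "300F": [2.6, 2.4, 2.8, 2.5, 2.2, 2.7, 2.3, 3.1, 2.9, 2.2],
    "350F": [3.2, 3.0, 3.1, 2.8, 2.5, 2.9, 3.1, 3.4, 3.2, 2.6],
}
a, b, n = 3, 2, 5  # temperatures, weeks per temperature, days per week

all_obs = [y for ys in data.values() for y in ys]
correction = sum(all_obs) ** 2 / len(all_obs)  # T^2 / N


def week_totals(ys):
    """Totals for each of the b consecutive weeks of n days at one temperature."""
    return [sum(ys[w * n:(w + 1) * n]) for w in range(b)]


# Temperature: level totals, as in Eq. (13.4)
ss_temp = sum(sum(ys) ** 2 / (b * n) for ys in data.values()) - correction
# Weeks within temperature: week totals measured about their own temperature total
ss_weeks = sum(
    sum(t ** 2 / n for t in week_totals(ys)) - sum(ys) ** 2 / (b * n)
    for ys in data.values()
)
# Days within weeks (the error term), by subtraction from the total
ss_total = sum(y ** 2 for y in all_obs) - correction
ss_days = ss_total - ss_temp - ss_weeks

for source, ss, df in [
    ("Temperature", ss_temp, a - 1),
    ("Weeks(Temp)", ss_weeks, a * (b - 1)),
    ("Days(Weeks)", ss_days, a * b * (n - 1)),
    ("Total", ss_total, a * b * n - 1),
]:
    print(f"{source:12s} SS = {ss:.4f}  df = {df}")
```

The temperature line matches SS(Temp) from the one-way analysis; nesting simply splits the one-way SS(Within) of 2.342 into a weeks-within-temperature part and a days-within-weeks part.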

13.7.1 ANOVA with Two Factors: Factorial Designs
Why not study each factor separately rather than simultaneously? Because factors can interact: the effect of one factor may depend on the level of another.
[Figure: interaction plot of the response versus temperature (T1, T2) for pressure levels P1 and P2]

13.7.1.1 Conditional Effects
Factor effects are generally called main effects. Conditional effects (simple effects) are the effects of one factor at each level of another factor.
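A one-factor-at-a-time study can miss an interaction entirely. Using hypothetical cell values (not from the text) for a 2×2 temperature-by-pressure layout, this sketch computes the conditional effect of temperature at each pressure level; the function name is illustrative.

```python
# Hypothetical cell values (not from the text): response at each
# (temperature, pressure) combination of a 2x2 layout.
cell = {("T1", "P1"): 10.0, ("T2", "P1"): 20.0,
        ("T1", "P2"): 20.0, ("T2", "P2"): 10.0}


def conditional_effects_of_t(cells):
    """Effect of moving temperature from T1 to T2, computed separately at each pressure."""
    return {p: cells[("T2", p)] - cells[("T1", p)] for p in ("P1", "P2")}


print(conditional_effects_of_t(cell))  # {'P1': 10.0, 'P2': -10.0}
```

Studied only at P1, temperature appears to raise the response by 10; studied only at P2, it appears to lower it by 10. Averaging the two conditional effects gives a temperature main effect of 0, which hides the interaction.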

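13.7.2 Effect Estimates
With one observation $y_{T_iP_j}$ at each combination of temperature and pressure, the effects are estimated as follows.

Temperature effect (the average effect of changing temperature from T1 to T2 at P1 and at P2):
$$ T = \frac{(y_{T_2P_1} - y_{T_1P_1}) + (y_{T_2P_2} - y_{T_1P_2})}{2} $$
Pressure effect (the average effect of changing pressure from P1 to P2 at T1 and at T2):
$$ P = \frac{(y_{T_1P_2} - y_{T_1P_1}) + (y_{T_2P_2} - y_{T_2P_1})}{2} $$
Interaction effect (half the difference between the temperature effect at P2 and at P1):
$$ TP = \frac{(y_{T_2P_2} - y_{T_1P_2}) - (y_{T_2P_1} - y_{T_1P_1})}{2} $$
[Figures: response versus temperature (T1, T2) for P1 and P2, illustrating the temperature, pressure, and interaction effects]

In the second plotted example, T = P = 0 and TP = -10: neither factor has a main effect, yet the factors interact.

13.7.3 ANOVA Table for an Unreplicated Two-Factor Design
For the example with T = P = 0 and TP = -10:

Source of Variation   SS    df   MS
T                     0     1    0
P                     0     1    0
TP (residual)         100   1    100
Total                 100   3

Because the design is unreplicated, the TP interaction serves as the residual, leaving no separate error estimate against which to test the interaction itself.

When both factors are fixed, the main effects and the interaction are tested against the residual. When both factors are random, the main effects are tested against the interaction, and the interaction is tested against the residual. When one factor is fixed and the other random, the fixed factor is tested against the interaction, the random factor is tested against the residual, and the interaction is tested against the residual.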
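The effect formulas of 13.7.2 and the sums of squares in the 13.7.3 table can be sketched together. The cell values are hypothetical (the plotted values are not recoverable from the text), chosen so that T = P = 0 and TP = -10 as stated above. With one observation per cell, each effect's sum of squares is N·effect²/4 with N = 4; `effects_2x2` is an illustrative name.

```python
# Hypothetical single observations (not from the text) for a 2x2 layout,
# chosen to give T = P = 0 and TP = -10.
y = {"T1P1": 10.0, "T2P1": 20.0, "T1P2": 20.0, "T2P2": 10.0}


def effects_2x2(y):
    """Main and interaction effects from the 13.7.2 formulas (one value per cell)."""
    t = ((y["T2P1"] - y["T1P1"]) + (y["T2P2"] - y["T1P2"])) / 2
    p = ((y["T1P2"] - y["T1P1"]) + (y["T2P2"] - y["T2P1"])) / 2
    tp = ((y["T2P2"] - y["T1P2"]) - (y["T2P1"] - y["T1P1"])) / 2
    return {"T": t, "P": p, "TP": tp}


effects = effects_2x2(y)
n_obs = len(y)  # N = 4, one observation per cell
ss = {name: n_obs * e ** 2 / 4 for name, e in effects.items()}  # SS = N * effect^2 / 4
grand_mean = sum(y.values()) / n_obs
ss_total = sum((v - grand_mean) ** 2 for v in y.values())

print(effects)  # {'T': 0.0, 'P': 0.0, 'TP': -10.0}
print(ss, "total:", ss_total)
```

All 100 units of the total sum of squares fall on TP, which is exactly the row that must double as the residual in the unreplicated table.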

13.7.4 Yates's Algorithm
For any $2^k$ design, where $k$ is the number of factors and 2 is the number of levels of each factor, any treatment combination can be represented by the presence or absence of each of $k$ lowercase letters, where presence denotes the high level and absence the low level. For example, if $k = 2$:
ab = (A high, B high); a = (A high, B low); b = (A low, B high); (1) = (A low, B low)

Example data (three replicates per treatment combination):

            A Low         A High
B Low       10, 12, 16    8, 10, 13
B High      14, 12, 15    12, 15, 16

The procedure starts by writing down the treatment combinations in standard order: (1) is always written first, and the other combinations follow the natural ordering of the letters, including combinations of letters ((1), a, b, ab, c, ac, bc, abc, ...). The procedure can be employed using either the totals or the averages for each treatment combination.

The columns designated (1) and (2) are columns in which addition and subtraction are performed on each ordered pair of numbers. (In general, there will be $k$ such columns for $k$ factors.) Specifically, the numbers in each pair are first added, and then the first number in each pair is subtracted from the second number.

Treatment combination   Total   (1)             (2)
(1)                     38      69 = 38 + 31    153 = 69 + 84
a                       31      84 = 41 + 43    -5 = -7 + 2
b                       41      -7 = 31 - 38    15 = 84 - 69
ab                      43      2 = 43 - 41     9 = 2 - (-7)
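The add-then-subtract passes just described can be sketched as a short function (the name `yates` is illustrative). It assumes the input is already in standard order and has length $2^k$.

```python
def yates(responses):
    """Yates's algorithm for a 2^k design.

    `responses` holds the treatment-combination totals (or averages) in standard
    order: (1), a, b, ab, c, ac, bc, abc, ...  Returns the final column: its first
    entry is the grand total and the remaining entries are the effect contrasts.
    """
    n = len(responses)
    k = n.bit_length() - 1
    if n != 2 ** k:
        raise ValueError("need 2^k entries in standard order")
    column = list(responses)
    for _ in range(k):  # one pass (one new column) per factor
        pairs = [(column[i], column[i + 1]) for i in range(0, n, 2)]
        sums = [x + y for x, y in pairs]   # first half: pairwise sums
        diffs = [y - x for x, y in pairs]  # second half: second minus first
        column = sums + diffs
    return column


print(yates([38, 31, 41, 43]))  # [153, -5, 15, 9]
```

The same function handles larger designs unchanged; for a $2^3$ design it simply runs three passes over eight totals.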

The process is continued on each new column until the number of such columns equals the number of factors. The last column created by these operations is used to compute the sum of squares for each effect: each number (except the first) is squared and divided by the number of replicates times $2^k$.

Treatment combination   Total   (1)   (2)   SS
(1)                     38      69    153
a                       31      84    -5    (-5)^2/(3*2^2) = 2.08 (A)
b                       41      -7    15    (15)^2/(3*2^2) = 18.75 (B)
ab                      43      2     9     (9)^2/(3*2^2) = 6.75 (AB)

The first number in the last column is the sum of all of the observations (153 = 38 + 31 + 41 + 43). The residual sum of squares is the total sum of squares minus the three effect sums of squares.

ANOVA
Source of Variation   SS      df   MS      F
A                     2.08    1    2.08    <1
B                     18.75   1    18.75   3.36
AB                    6.75    1    6.75    1.21
Residual              44.67   8    5.58
Total                 72.25   11

$F_{1,8,.95} = 5.32$, so none of the effects is significant at the 0.05 level.

Output from Minitab (Two-way ANOVA: Yield versus B, A)

Source        DF   SS        MS        F      P
B             1    18.7500   18.7500   3.36   0.104
A             1    2.0833    2.0833    0.37   0.558
Interaction   1    6.7500    6.7500    1.21   0.304
Error         8    44.6667   5.5833
Total         11   72.2500

S = 2.363   R-Sq = 38.18%   R-Sq(adj) = 14.99%
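The ANOVA table above can be rebuilt from the raw observations. As a cross-check on Yates's last column, this sketch computes the effect contrasts directly from the treatment totals, converts them to sums of squares with the divisor $r \cdot 2^k$, and obtains the residual by subtraction; variable names are illustrative.

```python
# Observations from the 13.7.4 example, by treatment combination (3 replicates each)
obs = {
    "(1)": [10, 12, 16],  # A low,  B low
    "a": [8, 10, 13],     # A high, B low
    "b": [14, 12, 15],    # A low,  B high
    "ab": [12, 15, 16],   # A high, B high
}
r, k = 3, 2  # replicates per combination, number of factors
tot = {name: sum(ys) for name, ys in obs.items()}  # 38, 31, 41, 43

# Effect contrasts straight from the totals (should match Yates's last column)
contrasts = {
    "A": tot["a"] + tot["ab"] - tot["(1)"] - tot["b"],
    "B": tot["b"] + tot["ab"] - tot["(1)"] - tot["a"],
    "AB": tot["(1)"] + tot["ab"] - tot["a"] - tot["b"],
}
ss = {name: c ** 2 / (r * 2 ** k) for name, c in contrasts.items()}

all_y = [y for ys in obs.values() for y in ys]
ss_total = sum(y ** 2 for y in all_y) - sum(all_y) ** 2 / len(all_y)
ss_resid = ss_total - sum(ss.values())
df_resid = len(all_y) - 2 ** k  # 12 observations - 4 cell means = 8
ms_resid = ss_resid / df_resid

for name, s in ss.items():  # each effect has 1 df, so MS = SS
    print(f"{name:2s} SS = {s:6.2f}  F = {s / ms_resid:.2f}")
print(f"Residual SS = {ss_resid:.2f} on {df_resid} df, MS = {ms_resid:.2f}")
```

The F ratios agree with the Minitab output (0.37, 3.36, 1.21), all below $F_{1,8,.95} = 5.32$.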