
One-way analysis of variance is a hypothesis-testing technique that is used to compare the means of three or more populations. Analysis of variance is usually abbreviated **ANOVA**.

The hypothesis-testing procedure using ANOVA involves the same five steps that were used in earlier chapters. To begin a one-way analysis of variance test, you should first state the null and alternate hypotheses. For a one-way ANOVA test, the null and alternate hypotheses will be:

\[ \eqalign{ H_0: &\quad \mu_1=\mu_2=\mu_3=\dots=\mu_k \quad \text{ (all population means are equal)} \cr H_1: &\quad \text{ at least one mean is different from the others} } \]

When you reject the null hypothesis in a one-way ANOVA test, you can conclude that at least one of the means is different from the others. Without performing more statistical tests, however, you cannot determine which of the means is different.

Before performing a one-way ANOVA test, you must check that these three conditions are satisfied:

- The populations from which the samples were obtained must be normally or approximately normally distributed.
- The samples must be independent of one another.
- The variances of the populations must be equal.

A one-way ANOVA test compares two estimates of the population variance $\sigma^2$:

- the **variance between samples**, denoted $MS_B$, and
- the **variance within samples**, denoted $MS_W$.

The variance between samples, $MS_B$, gives an estimate of $\sigma^2$ based on the variation among the means of samples taken from different populations. The variance within samples, $MS_W$, gives an estimate of $\sigma^2$ calculated using all of the data from the different samples.

The test statistic for a one-way ANOVA test is the ratio of the two estimates for population variance: \[ \text{Test Statistic }=\frac{\text{Variance between samples}}{\text{Variance within samples}} \] If the assumptions given above are met, then the sampling distribution for the test is approximated by the $F$-distribution. The graph of an $F$-distribution is skewed right like the graph given below.

The test statistic is a value of $F$ given by the formula \[ F= \frac{MS_B}{MS_W} \] The values of $F$ that make up the sampling distribution lie along the horizontal axis in the above graph and will range between 0 and $\infty$. The relative frequency with which these values occur is represented by the height of the sampling distribution curve.

If there is little or no difference between the means, then $MS_B$ will be approximately equal to $MS_W$ and the test statistic will be approximately 1. Values of $F$ close to 1 suggest that you should fail to reject the null hypothesis. However, if one of the means differs significantly from the others, then $MS_B$ will be greater than $MS_W$ and the test statistic will be greater than 1. Values of $F$ significantly greater than 1 suggest that you should reject the null hypothesis.

The one-way ANOVA test is always a right-tailed test. The p-value for the hypothesis test is the probability that $F$ is greater than or equal to the value of $F$ obtained from the samples. This equals the area under the sampling distribution curve to the right of the value of $F$ you obtain from your samples. As usual, if the p-value is less than $\alpha$, reject $H_0$.
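Because the test statistic follows an $F$-distribution with $k-1$ and $n-k$ degrees of freedom under $H_0$, the right-tail p-value can be computed directly. A minimal sketch using scipy (assuming scipy is available; the group count, sample size, and $F$ value below are made-up illustrations):

```python
# Right-tail p-value for a one-way ANOVA test statistic.
from scipy.stats import f

k, n = 3, 18   # hypothetical: number of groups, total sample size
F_stat = 4.5   # hypothetical value of F obtained from the samples

# p-value = area under the F(k-1, n-k) curve to the right of F_stat
p_value = f.sf(F_stat, k - 1, n - k)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(round(p_value, 4), decision)
```

`f.sf` is the survival function, $P(F \ge F_{\text{stat}})$, which is exactly the right-tail area described above.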

| Hypothesis Test Conclusion | This means that... |
|---|---|
| Reject $H_0$ | The sample evidence suggests that at least one mean is different from the others. If $H_0$ were true, the sample data would be very surprising. |
| Fail to Reject $H_0$ | There is not convincing sample evidence to conclude that there is a difference in the means. If $H_0$ were true, the sample data would not be considered surprising. |

**From time to time, unknown to its employees, the research department at Main Street Bank observes
various employees for their work productivity. Recently this department wanted to check
whether the four tellers at a branch of this bank serve, on average, the same number of customers
per hour. The research manager observed each of the four tellers for a certain number of hours. The following table gives the number of customers served by the four tellers during
each of the observed hours.**

| Teller 1 | Teller 2 | Teller 3 | Teller 4 |
|---|---|---|---|
| 19 | 14 | 11 | 24 |
| 21 | 16 | 14 | 19 |
| 26 | 14 | 21 | 21 |
| 24 | 13 | 13 | 26 |
| 18 | 17 | 16 | 20 |
|    | 13 | 18 |    |

- **What statements represent the null and alternative hypotheses?**
- **What value of $\alpha$ should you use for the hypothesis test?**
- **What is the formula for the test statistic?**
- **What is the value of the test statistic? Round to the hundredths.**
- **What p-value is obtained from performing a one-way ANOVA test? Round to the ten-thousandths.**
- **Sketch a graph of the sampling distribution and the p-value.**
- **Did you reject $H_0$ or fail to reject $H_0$?**
- **Interpret the decision.**

**What statements represent the null and alternative hypotheses?**

\[ \eqalign{ H_0: &\quad \mu_1=\mu_2=\mu_3=\mu_4 \quad \text{ (all population means are equal)} \cr H_1: &\quad \text{ at least one mean is different from the others} } \]

**What value of $\alpha$ should you use for the hypothesis test?**

\[ \alpha=0.05 \]

**What is the formula for the test statistic?**

\[ F= \frac{MS_B}{MS_W} \]

**What is the value of the test statistic? Round to the hundredths.**

## One-Way Analysis of Variance (ANOVA)

- Enter the data into L1, L2, L3, etc.
- Press **STAT** and move the cursor to **TESTS**.
- Arrow Up/Down until the cursor highlights **ANOVA(**. Press **ENTER**.
- Type each list followed by a comma. End with **)** and press **ENTER**.
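The same computation can be checked in Python with scipy's `f_oneway` (assuming scipy is available). The lists below come from the teller table above, with the final row's 13 and 18 assigned to Tellers 2 and 3:

```python
# One-way ANOVA on the teller data (unequal sample sizes are allowed).
from scipy.stats import f_oneway

teller1 = [19, 21, 26, 24, 18]
teller2 = [14, 16, 14, 13, 17, 13]
teller3 = [11, 14, 21, 13, 16, 18]
teller4 = [24, 19, 21, 26, 20]

F_stat, p_value = f_oneway(teller1, teller2, teller3, teller4)
print(round(F_stat, 2))   # test statistic, rounded to the hundredths
print(round(p_value, 4))  # p-value, rounded to the ten-thousandths
```

The p-value printed here is the right-tail area under the $F$-distribution with $k-1=3$ and $n-k=18$ degrees of freedom.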

**What p-value is obtained from performing a one-way ANOVA test? Round to the ten-thousandths.**

**Sketch a graph of the sampling distribution and the p-value.**

**Did you reject $H_0$ or fail to reject $H_0$?**

Since the p-value is less than $\alpha$, we reject $H_0$.

**Interpret the decision in the context of the data.**

The mean numbers of customers served per hour by the four tellers are not all the same. The sample evidence suggests that at least one mean is different from the others.


An **experimental unit** is the object on which a measurement (or measurements) is taken.

A **factor** is an independent variable whose values are controlled and varied by the experimenter.

A **level** is the intensity setting of a factor.

A **treatment** is a specific combination of factor levels.

The **response** is the variable being measured by the experimenter.


For example, suppose an experiment involves two factors:

- 'age' at two levels: ages 20 to 29 and ages 30 to 39
- 'meal' at two levels: breakfast and no breakfast

In this more complex experiment, there are four **treatments**, one for each combination of factor levels.

In this section, we will concentrate on an experiment that involves one factor set at $k$ levels, and we will use a technique called the **analysis of variance**.

You can better understand the logic underlying an analysis of variance by looking at a simple experiment. Consider two sets of samples randomly selected from populations 1 (white ovals) and 2 (black triangles), each pair of samples producing the same pair of sample means, $\bar{x}_1$ and $\bar{x}_2$. The two sets are shown in the figure below.

Is it easier to detect the difference in the two means when you look at set A or set B? You will probably agree that set A shows the difference much more clearly. In set A, the variability of the measurements within each sample is small compared with the difference between the two sample means; in set B, the variability within each sample is so large that it obscures the difference between the means.

The comparison you have just done intuitively is formalized by the analysis of variance. Moreover, the analysis of variance can be used not only to compare two means but also to make comparisons of more than two population means and to determine the effects of various factors in more complex experimental designs. The analysis of variance relies on statistics with sampling distributions that are modeled by the $F$-distribution of Section 10.3 in your textbook.

The analysis of variance rests on these assumptions:

- The observations within each population are normally distributed with a common variance, $\sigma^2$.
- Assumptions regarding the sampling procedure are specified for each experimental design.

How can you select these $k$ random samples? Sometimes the populations actually exist, and you can use a computerized random number generator or a random number table to randomly select the samples. For example, in a study to compare the average sizes of health insurance claims in four different states, you could use a computer database provided by the health insurance companies to select random samples from the four states.

Whether the data arise from random selection or from random assignment of treatments, the result is a completely randomized design, or one-way classification, for which the analysis of variance is used.

Suppose you want to compare $k$ population means, $\mu_1, \mu_2, ..., \mu_k,$ based on independent random samples of size $n_1, n_2, ..., n_k$ from normal populations with a common variance, $\sigma^2$. That is, each of the normal populations has the same shape, but their locations might be different, as shown in the figure below.

$$
\begin{array}{c|c|c|c|c}
\color{blue}{Variation \ Source} & \color{blue}{ df } & \color{blue}{ Sum \ of \ Squares \ (SS) } & \color{blue}{Mean \ Square \ (MS)} &\color{blue}{F} \\\hline
Treatments & k-1 & SS_B& MS_B& MS_B/MS_W\\
Error & n-k & SS_W & MS_W&\\ \hline
Total & n-1 & Total \ SS & &
\end{array}
$$

Where $$\begin{align*} Total \ SS & = \sum x^2 -CM \\ &\\ & = (\text{sum of squares of all x values}) - CM \end{align*}$$ with $$ \begin{array}{cccc} CM = \dfrac{(\sum x)^2}{n} = \dfrac{G^2}{n} &&&\\ & & &\\ SS_B = \sum\dfrac{T_i^2}{n_i}-CM & & & MS_B=\dfrac{SS_B}{k-1} \\ & & &\\ SS_W = Total \ SS - SS_B &&& MS_W=\dfrac{SS_W}{n-k} \\ \end{array} $$

and $$\begin{align*} G & = \text{Grand total of all $n$ observations}\\ T_i &= \text{Total of all observations in sample $i$}\\ n_i &= \text{Number of observations in sample $i$}\\ n &= n_1+n_2+\dots+n_k\\ \end{align*}$$
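The computing formulas above translate directly into code. A minimal sketch (the function name `one_way_anova` is my own; the symbols $G$, $T_i$, $CM$, and the sums of squares follow the notation above):

```python
# One-way ANOVA F statistic via the computing formulas:
# CM = G^2/n, Total SS = sum(x^2) - CM, SS_B = sum(T_i^2/n_i) - CM.
def one_way_anova(samples):
    """samples: list of lists, one list of observations per treatment."""
    k = len(samples)                              # number of treatments
    n = sum(len(s) for s in samples)              # n = n_1 + ... + n_k
    G = sum(sum(s) for s in samples)              # grand total of all observations
    CM = G ** 2 / n                               # correction for the mean
    total_ss = sum(x ** 2 for s in samples for x in s) - CM
    SS_B = sum(sum(s) ** 2 / len(s) for s in samples) - CM
    SS_W = total_ss - SS_B                        # by subtraction
    MS_B = SS_B / (k - 1)
    MS_W = SS_W / (n - k)
    return MS_B / MS_W                            # F = MS_B / MS_W
```

Passing any $k$ lists of observations returns the value of $F$ for the completely randomized design.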


$$ \begin{array}{ccc} No \ Breakfast & Light \ Breakfast & Full \ Breakfast\\\hline 75 & 75 & 72 \\ 76 & 73 & 68 \\ 71 & 76 & 69 \\ 74 & 72 & 72 \\ 77 & 74 & 71 \\ \hline T_1=373& T_2=370 & T_3=352 \\ \end{array} $$

\begin{align*} Total \ SS & = (75^2+76^2+71^2+\dots+71^2)-CM\\ & =80,031-79,935\\ & =96\\ \end{align*} with $(n-1)=15-1=14$ degrees of freedom. Next,

\begin{align*} SS_B & = \sum\frac{T_i^2}{n_i}-CM\\ & = \biggl(\frac{373^2}{5}+\frac{370^2}{5}+\frac{352^2}{5} \biggr)-79,935\\ & =51.6\\ \end{align*} with $(k-1)=3-1=2$ degrees of freedom. Using subtraction, \begin{align*} SS_W & = Total \ SS - SS_B\\ & = 96-51.6\\ &=44.4 \end{align*} with $(n-k)=15-3=12$ degrees of freedom. Then, \begin{align*} MS_B & =\frac{SS_B}{k-1}\\ & =\frac{51.6}{2}\\ & = 25.8\\ \end{align*} and \begin{align*} MS_W & =\frac{SS_W}{n-k}\\ & =\frac{44.4}{12}\\ & = 3.7\\ \end{align*} Finally, $F=\frac{MS_B}{MS_W}=\frac{25.8}{3.7}=6.972972972...\approx 6.97$
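The hand computation above can be verified with scipy's `f_oneway` (assuming scipy is available), using the three breakfast samples from the table:

```python
# Verify the breakfast-example F statistic computed by hand above.
from scipy.stats import f_oneway

no_bkfst    = [75, 76, 71, 74, 77]   # T1 = 373
light_bkfst = [75, 73, 76, 72, 74]   # T2 = 370
full_bkfst  = [72, 68, 69, 72, 71]   # T3 = 352

F_stat, p_value = f_oneway(no_bkfst, light_bkfst, full_bkfst)
print(round(F_stat, 2))  # → 6.97, matching MS_B / MS_W = 25.8 / 3.7
```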

- Enter the data into L1, L2, L3, etc.
- Press **STAT** and move the cursor to **TESTS**.
- Arrow Up/Down until the cursor highlights **ANOVA(**. Press **ENTER**.
- Type each list followed by a comma. End with **)** and press **ENTER**.