For example, a craft brewery wants to see if consumers have any preference among four new flavors of beer. A sample of opinions from 100 people provided these data:

$$ \begin{array}{c|c|c|c} Red \ Pill & Crackalicious & Bavarian & Stumbling \ Cletus \\ Brew & Schwartzbier & Dankenweizen & Amber \ Ale \\ \hline 22 & 28 & 20 & 30\\ \end{array} $$

The **observed frequency O** of a category is the frequency for the category observed in the sample data.

The **expected frequency E** of a category is the calculated frequency for the category. Expected frequencies are found by using the expected (or hypothesized) distribution and the sample size. The expected frequency for the $i^{th}$ category is
\[
E_i=n\cdot p_i
\]
where $n$ is the number of trials (the sample size) and $p_i$ is the assumed probability of the $i^{th}$ category.
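As a minimal sketch of this formula, the snippet below computes the expected frequencies for the beer survey, assuming the "no preference" distribution in which each of the four flavors has probability 0.25. The variable names are illustrative only.

```python
# Sample size and hypothesized probabilities (assumed: equal preference,
# so each of the four flavors has probability 0.25).
n = 100
probabilities = [0.25, 0.25, 0.25, 0.25]

# E_i = n * p_i for each category
expected = [n * p for p in probabilities]
print(expected)  # [25.0, 25.0, 25.0, 25.0]
```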


If there were no preference, you would expect each flavor of beer to be selected with equal frequency, or with a 25% frequency. That is, approximately 25 people in the sample would have selected each flavor of craft beer, had there been no preference.

Since the frequencies for each flavor were obtained from a sample, these actual frequencies are called the **observed frequencies**.

$$ \begin{array}{c|c|c|c|c} & Red \ Pill & Crackalicious & Bavarian & Stumbling \ Cletus \\ & Brew & Schwartzbier & Dankenweizen & Amber \ Ale \\ \hline Observed & 22 & 28 & 20 & 30\\ \hline Expected & 25 & 25 & 25 & 25\\ \end{array} $$

The observed frequencies will almost always differ from the expected frequencies due to sampling error; that is, the values differ from sample to sample.

1. **State the null and alternative hypotheses.**
2. **Find the value of your test statistic** with the Chi-Square formula
   \[ \chi^2=\sum\frac{(O-E)^2}{E} \]
3. **Draw a picture** of the Chi-Square Distribution (its mean is at df = number of categories minus 1), and mark the location of your test statistic value along the x-axis in the right tail of your graph.
4. **Find the p-value.** This will always be the area to the right of your test statistic. Find this value with your calculator's $\chi^2 \ cdf$ command. Use
   \[ \chi^2 \ cdf(\text{test stat value}, 10^9, \ df) \qquad\text{with $df=$ number of categories minus 1} \]
5. **Use your p-value to make a decision about how to conclude the test.**
   - If $p-val\leq\alpha$, reject $H_0$ and conclude that the sample evidence suggests that the percentages are significantly different from those given in the null hypothesis.
   - If $p-val>\alpha$, fail to reject $H_0$ and conclude that the sample evidence suggests that the percentages are not significantly different from those given in the null hypothesis.
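The procedure can also be sketched in Python using only the standard library. This is an illustration, not part of the calculator method: the function names `chi2_sf` and `goodness_of_fit` are made up for this sketch, and the right-tail area is computed from the closed-form expressions that hold for whole-number degrees of freedom.

```python
import math

def chi2_sf(x, df):
    """Right-tail area P(X >= x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: e^(-x/2) * sum of (x/2)^j / j! for j = 0 .. df/2 - 1
        total = sum(half ** j / math.factorial(j) for j in range(df // 2))
        return math.exp(-half) * total
    # Odd df: erfc term plus a finite sum using half-integer gamma values
    total = sum(half ** (j - 0.5) / math.gamma(j + 0.5)
                for j in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total

def goodness_of_fit(observed, probabilities, alpha=0.05):
    """Return (test statistic, df, p-value, decision) for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probabilities]          # E_i = n * p_i
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                             # categories minus 1
    p_value = chi2_sf(statistic, df)                   # area to the right
    decision = "reject H0" if p_value <= alpha else "fail to reject H0"
    return statistic, df, p_value, decision

# Beer survey: four flavors, no-preference hypothesis (p = 0.25 each)
stat, df, p_value, decision = goodness_of_fit([22, 28, 20, 30], [0.25] * 4)
print(round(stat, 3), df, round(p_value, 4), decision)
# 2.72 3 0.4368 fail to reject H0
```

The decision rule mirrors the last step of the procedure: reject $H_0$ only when the p-value is at most $\alpha$.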

- The data are obtained from a random sample.
- The expected frequency for each category must be 5 or more.
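The second requirement is easy to check before running the test. The helper below is a hypothetical sketch (the name `expected_counts_ok` and the second set of numbers are invented for illustration); the random-sample requirement cannot be checked by code and depends on how the data were collected.

```python
def expected_counts_ok(n, probabilities, minimum=5):
    """Return True when every expected frequency n * p_i is at least `minimum`."""
    return all(n * p >= minimum for p in probabilities)

# Beer survey: every expected frequency is 25
print(expected_counts_ok(100, [0.25, 0.25, 0.25, 0.25]))  # True

# Made-up small sample: 20 * 0.1 = 2 falls below 5
print(expected_counts_ok(20, [0.7, 0.2, 0.1]))  # False
```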

When there is perfect agreement between the observed and the expected values, $\chi^2=0$. Also, $\chi^2$ can never be negative. Finally, the test is right-tailed because "$H_0$: Good fit" and "$H_a$: Not a good fit" mean that $\chi^2$ will be small in the first case and large in the second case.

Is there enough evidence to reject the claim that there is no preference in the selection of craft beers using the sample data given below? Use $\alpha=0.05.$

$$ \begin{array}{c|c|c|c} Red \ Pill & Crackalicious & Bavarian & Stumbling \ Cletus \\ Brew & Schwartzbier & Dankenweizen & Amber \ Ale \\ \hline \color{red}{22} & \color{green}{28} & \color{blue}{20} & \color{orange}{30}\\ \end{array} $$

In the goodness-of-fit test, the degrees of freedom are equal to the number of categories minus 1. For this example, there are four categories (4 different flavors of beer); hence, the degrees of freedom are $4-1=3$. This is so because the number of subjects in each of the first three categories is free to vary. But in order for the sum to be 100 (the total number of subjects), the number of subjects in the last category is fixed.

We notice that $\alpha=0.05$ and the $p-val=0.4368$.

Since $p-val>\alpha$, fail to reject $H_0$ and conclude that the percentages are not significantly different from those given in the null hypothesis.

- Enter the observed frequencies in $L1$ and the expected frequencies in $L2$.
- Press 2nd [QUIT] to return to the home screen.
- Press 2nd [LIST], move the cursor to MATH, and press 5 for sum(.
- Type $(L1 - L2)^2/L2$, then press ENTER.

To calculate the P-value:

Press 2nd [DISTR] then press 7 to get $\chi^2 cdf($. (Use 8 on the TI-84)

For this P-value, the $\chi^2 cdf($ command has form $\chi^2 cdf($ test statistic, $10^9$ , degrees of freedom).

For this example use $\chi^2 \ cdf(2.72, \ 10^9, \ 3)$
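For readers without a TI calculator, the same two computations can be sketched in Python. This is only an analog of the keystrokes above: the lists stand in for $L1$ and $L2$, and the p-value line uses the closed-form right-tail area that applies specifically when there are 3 degrees of freedom.

```python
import math

observed = [22, 28, 20, 30]   # plays the role of list L1
expected = [25, 25, 25, 25]   # plays the role of list L2

# sum((L1 - L2)^2 / L2)
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Right-tail area for df = 3 only:
# P(X >= x) = erfc(sqrt(x/2)) + sqrt(2x/pi) * e^(-x/2)
half = chi_square / 2
p_value = (math.erfc(math.sqrt(half))
           + math.sqrt(2 * chi_square / math.pi) * math.exp(-half))

print(round(chi_square, 2), round(p_value, 4))  # 2.72 0.4368
```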

A researcher claims that the numbers of cups of coffee U.S. adults drink per day are distributed as shown in the figure. You randomly select 1600 U.S. adults and ask them how many cups of coffee they drink per day. The table shows the results. At $\alpha = 0.05$, test the researcher’s claim.

$$ \text{ Survey Results}\\ \begin{array}{c|c} Response & Frequency, f \\\hline 0 \text{ cups} & 570 \\ 1 \text{ cup} & 432 \\ 2 \text{ cups} & 282 \\ 3 \text{ cups} & 152 \\ 4 \text{ or more cups} & 164 \\ \end{array} $$

Define the following population proportions:
\[ \eqalign{ p_1 = & \text{the proportion of U.S. adults who drink 0 cups per day} \cr p_2 = & \text{the proportion of U.S. adults who drink 1 cup per day} \cr p_3 = & \text{the proportion of U.S. adults who drink 2 cups per day} \cr p_4 = & \text{the proportion of U.S. adults who drink 3 cups per day} \cr p_5 = & \text{the proportion of U.S. adults who drink 4 or more cups per day} } \]
Then the hypotheses are as follows:
\[ \eqalign{ H_0: \quad & p_1 =36\%, \quad p_2 = 26\%, \quad p_3 =19\%, \quad p_4 =9\%, \quad p_5=10\% \cr H_a: \quad & \text{The distribution of the number of cups of coffee U.S. adults drink per day is different from the one given in the null hypothesis} } \]

$$ \begin{array}{c|c|c} Response & Observed \ Frequency, O & Expected \ Frequency, E \\\hline 0 \text{ cups} & \color{red}{570} & 1600\cdot(0.36)=\color{blue}{576} \\ 1 \text{ cup} & \color{green}{432}& 1600\cdot(0.26)=\color{blue}{416} \\ 2 \text{ cups} & \color{orange}{282} & 1600\cdot(0.19)=\color{blue}{304}\\ 3 \text{ cups} & \color{red}{152} & 1600\cdot(0.09)=\color{blue}{144}\\ 4 \text{ or more cups} & \color{green}{164} & 1600\cdot(0.10)=\color{blue}{160}\\ \end{array} $$

Then the test statistic value is \[ \eqalign{ \chi^2 & = \sum\frac{(O-E)^2}{E} \cr\cr & = \frac{( \color{red}{570}-\color{blue}{576})^2}{\color{blue}{576}}+ \frac{( \color{green}{432}-\color{blue}{416})^2}{\color{blue}{416}}+ \frac{( \color{orange}{282}-\color{blue}{304})^2}{\color{blue}{304}}+ \frac{( \color{red}{152}-\color{blue}{144})^2}{\color{blue}{144}}+ \frac{( \color{green}{164}-\color{blue}{160})^2}{\color{blue}{160}} \cr\cr & = 2.814 \qquad\text{ chi-square values are always rounded to the thousandths} } \]

In the goodness-of-fit test, the degrees of freedom are equal to the number of categories minus 1. For this example, there are five categories; hence, the degrees of freedom are $5-1=4$. This is so because the number of subjects in each of the first four categories is free to vary. But in order for the sum to be 1600 (the total number of subjects), the number of subjects in the last category is fixed.

We notice that $\alpha=0.05$ and the $p-val=0.5894$.

Since $p-val>\alpha$, fail to reject $H_0$ and conclude that the percentages are not significantly different from those given in the claim.

- Enter the observed frequencies in $L1$ and the expected frequencies in $L2$.
- Press 2nd [QUIT] to return to the home screen.
- Press 2nd [LIST], move the cursor to MATH, and press 5 for sum(.
- Type $(L1 - L2)^2/L2$, then press ENTER.

To calculate the P-value:

Press 2nd [DISTR] then press 7 to get $\chi^2 cdf($. (Use 8 on the TI-84)

For this P-value, the $\chi^2 cdf($ command has form $\chi^2 cdf($ test statistic, $10^9$ , degrees of freedom).

For this example use $\chi^2 \ cdf(2.814, \ 10^9, \ 4)$
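The coffee example can be checked the same way in Python. The sketch below is illustrative rather than part of the calculator method: it builds the expected frequencies from the claimed percentages, rounds the test statistic to the thousandths as done above, and uses the closed-form right-tail area that applies specifically when there are 4 degrees of freedom.

```python
import math

observed = [570, 432, 282, 152, 164]
claimed = [0.36, 0.26, 0.19, 0.09, 0.10]   # proportions from the null hypothesis

n = sum(observed)                          # 1600 adults surveyed
expected = [n * p for p in claimed]        # approximately 576, 416, 304, 144, 160

# Test statistic, rounded to the thousandths to match the worked example
chi_square = round(
    sum((o - e) ** 2 / e for o, e in zip(observed, expected)), 3
)

# Right-tail area for df = 4 only: P(X >= x) = e^(-x/2) * (1 + x/2)
half = chi_square / 2
p_value = math.exp(-half) * (1 + half)

print(chi_square, round(p_value, 4))  # 2.814 0.5894
```

Because the p-value is larger than $\alpha = 0.05$, the result agrees with the decision to fail to reject $H_0$.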