Carrying Out a Chi-Square Test for Goodness of Fit | AP Stats Unit 8 Study Guide

Quick Summary

This guide will equip you to perform a Chi-Square Test for Goodness of Fit, a powerful statistical tool used to determine if the distribution of a single categorical variable in a sample matches a hypothesized distribution in the population. You will learn to state appropriate hypotheses, verify the necessary conditions, calculate the chi-square test statistic and p-value, and draw a statistically sound conclusion in the context of a problem. This test allows you to answer questions like, "Do the colors of candies in this bag match the company's claimed percentages?"

Key Concepts

The Chi-Square (χ^2) Test for Goodness of Fit is an inference procedure used to test a claim about the distribution of a single categorical variable in a population. We compare the observed counts from a sample to the expected counts we would see if the null hypothesis were true.

1. The Purpose and Hypotheses

Purpose: To determine if a sample distribution of a categorical variable is consistent with a claimed population distribution.
Null Hypothesis (H₀): The stated distribution of the categorical variable in the population is correct. This is written by listing the claimed proportions for each category.
- Example (Equal Proportions): H₀: p₁ = 0.25, p₂ = 0.25, p₃ = 0.25, p₄ = 0.25.
- Example (Specified Proportions): H₀: p_red = 0.30, p_blue = 0.20, p_green = 0.50.
Alternative Hypothesis (Hₐ): The stated distribution is not correct. This is always written in words.
- Example: Hₐ: At least one of the proportions is different from the value stated in H₀.

2. Conditions for Inference

For the results of the test to be valid, three conditions must be met and checked.

Random: The data must come from a well-designed random sample or a randomized experiment. This ensures the sample is representative of the population.
10% Condition (Independence): When sampling without replacement, the sample size n should be no more than 10% of the population size (n ≤ 0.10N). This allows us to assume independence between observations.
Large Counts: All expected counts must be at least 5. This condition ensures that the chi-square distribution is a good approximation to the sampling distribution of the test statistic. You must calculate and list all expected counts to verify this condition.
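The two numerical conditions can be checked mechanically. The sketch below is one way to do it in Python; the function names, the sample size of 100, and the population size of 5,000 are illustrative assumptions, while the claimed proportions come from the example H₀ above. The Random condition cannot be checked by code, since it depends on how the data were collected.

```python
def expected_counts(n, proportions):
    """Expected count for each category if H0 is true: n * p_i."""
    return [n * p for p in proportions]


def large_counts_ok(expected, minimum=5):
    """Large Counts condition: every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


def ten_percent_ok(n, population_size):
    """10% condition: n <= 0.10 * N when sampling without replacement."""
    return n <= 0.10 * population_size


# Hypothetical setup: 100 candies sampled from a shipment of 5,000,
# tested against the claimed proportions for red, blue, and green.
claimed = [0.30, 0.20, 0.50]
expected = expected_counts(100, claimed)

print("Expected counts:", expected)
print("Large Counts met:", large_counts_ok(expected))
print("10% condition met:", ten_percent_ok(100, 5000))
```

Listing the expected counts before reporting whether the condition is met mirrors what you write on paper: the counts themselves are the evidence.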

3. The Mechanics of the Test

The core of the test involves calculating a single statistic that measures the total difference between the observed and expected counts.

Calculating Expected Counts: For each category, the expected count is found by multiplying the total sample size (n) by the hypothesized population proportion (pᵢ) for that category.
- Formula: Expected Count = n * pᵢ
The Chi-Square Test Statistic (χ^2): For each category, square the difference between the observed and expected counts and divide by that category's expected count; the statistic is the sum of these terms over all categories. A larger χ^2 value indicates a greater discrepancy between what we observed and what we expected, providing more evidence against H₀.
- Formula: χ^2 = Σ [ (Observed - Expected)^2 / Expected ]
Degrees of Freedom (df): The shape of the chi-square distribution depends on the degrees of freedom. For a goodness of fit test, the calculation is simple.
- Formula: df = number of categories - 1
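The two formulas in this section translate almost directly into code. The following is a minimal sketch; the function names and the three-category counts are made up for illustration and do not come from any problem in this guide.

```python
def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def degrees_of_freedom(observed):
    """For a goodness of fit test: number of categories - 1."""
    return len(observed) - 1


# Hypothetical counts for three categories (both lists total 100).
observed = [18, 35, 47]
expected = [25, 30, 45]

statistic = chi_square_statistic(observed, expected)
df = degrees_of_freedom(observed)
print(f"chi-square = {statistic:.4f}, df = {df}")
```

Here the three terms are 49/25, 25/30, and 4/45, so the first category contributes most of the total even though its raw difference (7) is only slightly larger than the second category's (5).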

4. The Chi-Square Distribution and P-value

The Distribution: The chi-square distribution is a family of density curves that are:
- Always non-negative (the χ^2 statistic cannot be negative).
- Skewed to the right.
- Shaped by the degrees of freedom (df): as df increases, the curve becomes less skewed and more symmetric.
- [Image: A right-skewed chi-square distribution curve showing the p-value as the area to the right of the calculated chi-square statistic.]
Finding the P-value: The p-value is the probability of getting a χ^2 statistic as extreme or more extreme than the one calculated from the sample, assuming H₀ is true. For all chi-square tests, the p-value is the area in the right tail of the distribution.
- P-value = P(χ^2 ≥ calculated test statistic)
Drawing a Conclusion:
- If p-value ≤ α (where α is the significance level), we reject H₀. We have convincing evidence that at least one of the population proportions is different from the claimed value.
- If p-value > α, we fail to reject H₀. We do not have convincing evidence that the population proportions differ from the claimed values.
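If you want to check a right-tail area without a calculator, the sketch below computes it with Python's standard library only. It relies on a closed-form expression for the chi-square tail that holds when df is a whole number, which is always the case for this test. The function names `chi_square_p_value` and `decide` are assumptions made for this illustration, and the example values (χ^2 = 8.08, df = 5) are the ones from Problem 1 later in this guide.

```python
import math


def chi_square_p_value(statistic, df):
    """Right-tail area P(chi-square >= statistic) for a whole-number df."""
    if df < 1 or statistic < 0:
        raise ValueError("df must be >= 1 and statistic must be >= 0")
    half = statistic / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^k / k! for k = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus exp(-x/2) times half-integer power terms
    total = 0.0
    for k in range(1, (df - 1) // 2 + 1):
        total += half ** (k - 0.5) / math.gamma(k + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 when p-value <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"


p = chi_square_p_value(8.08, 5)
print(f"p-value = {p:.4f} -> {decide(p, alpha=0.05)}")
```

The result should agree with the calculator's χ^2cdf to several decimal places (about 0.152 for these inputs), so it works as an independent check rather than a replacement for showing your work.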

Key Vocabulary

Chi-Square Test for Goodness of Fit: A statistical test used to determine whether a sample distribution of a single categorical variable fits a claimed population distribution.
Observed Counts: The actual number of observations from the sample that fall into each category.
Expected Counts: The number of observations we would anticipate seeing in each category if the null hypothesis (H₀) were true.
Chi-Square Test Statistic (χ^2): A measure of how far the observed counts are from the expected counts. It summarizes the total deviation for all categories into a single number.
Degrees of Freedom (df): For this test, it is the number of categories minus one. It defines the specific chi-square distribution curve used to calculate the p-value.
Chi-Square Distribution: A family of right-skewed probability distributions, indexed by degrees of freedom, used to model the sampling distribution of the chi-square statistic.

Calculator Tech (TI-84)

The TI-84 makes performing a Chi-Square Goodness of Fit test very efficient.

Step 1: Enter Your Data

Press STAT -> 1: Edit....
Enter the Observed Counts into list L1.
Enter the Expected Counts into list L2. You must calculate these first (n * pᵢ).

Step 2: Perform the Test

Press STAT -> TESTS.
Scroll down to D: χ^2-GOF Test... and press ENTER.
You will see the following input screen:
- Observed: Enter L1 (by pressing 2nd -> 1).
- Expected: Enter L2 (by pressing 2nd -> 2).
- df: Enter the degrees of freedom (number of categories - 1).
Select Calculate and press ENTER.

Step 3: Interpret the Output

The calculator will display:

χ^2: The calculated chi-square test statistic.
p: The p-value.
df: The degrees of freedom you entered.
CNTRB: A list of the components, (Observed - Expected)^2 / Expected, for each category. This is useful for follow-up analysis to see which categories contributed most to the χ^2 value.
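To see what the CNTRB list contains, you can reproduce it by hand or with a few lines of code. The sketch below is a rough Python analogue, not the calculator's own routine; the function name `contributions` is an assumption, and the counts are the die-roll data from Problem 1 in this guide.

```python
def contributions(observed, expected):
    """Per-category components (Observed - Expected)^2 / Expected."""
    return [(o - e) ** 2 / e for o, e in zip(observed, expected)]


# Die-roll counts from Problem 1; a fair die gives 300 * (1/6) = 50 per face.
observed = [41, 62, 53, 45, 58, 41]
expected = [50.0] * 6

contribs = contributions(observed, expected)
for face, c in enumerate(contribs, start=1):
    print(f"Outcome {face}: contribution {c:.2f}")

# The category with the largest component drives the statistic the most.
largest = max(range(len(contribs)), key=lambda i: contribs[i]) + 1
print(f"Largest contribution: outcome {largest}")
print(f"Sum of contributions (chi-square): {sum(contribs):.2f}")
```

The components always add up to the χ^2 statistic, so the list is a breakdown of the total, not extra information.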

How to Show Work on the FRQ

To receive full credit on an AP Free Response Question involving a Chi-Square Test for Goodness of Fit, you must use the four-step State-Plan-Do-Conclude framework.

State

Hypotheses: Define the parameters (p₁, p₂, etc.) in context and state the null (H₀) and alternative (Hₐ) hypotheses.
- Template:
  - H₀: The company's claimed distribution of colors is correct (p_red = 0.20, p_blue = 0.30, ...).
  - Hₐ: At least one of the color proportions is different from the company's claim.
Significance Level: State the significance level, α. If not given, use α = 0.05.

Plan

Name the Test: Identify the procedure you are using.
- Template: We will perform a Chi-Square Test for Goodness of Fit.
Check Conditions: Check the three conditions (Random, 10%, Large Counts) in context.
- Template:
  - Random: The problem states that a random sample of [sample size] [items] was taken.
  - 10% Condition: It is reasonable to assume the population of [items] is greater than [10 * sample size].
  - Large Counts: All expected counts are at least 5. The expected counts are: [List all calculated expected counts here, e.g., Red: 25, Blue: 30, ...].

Do

Calculate Test Statistic:
- Write the general formula: χ^2 = Σ [ (Observed - Expected)^2 / Expected ].
- Show the substitution for the first two terms, then use "..." and provide the final value from your calculator.
- Template: χ^2 = (Observed₁ - Expected₁)^2 / Expected₁ + (Observed₂ - Expected₂)^2 / Expected₂ + ...
- χ^2 = (18 - 25)^2 / 25 + (35 - 30)^2 / 30 + ... = [Final χ^2 value].
Provide Technical Details:
- State the degrees of freedom: df = [categories] - 1 = [df value].
- State the p-value from your calculator.
- Optional but good practice: Sketch the χ^2 curve, label the df, shade the area corresponding to the p-value, and label the χ^2 statistic.

Conclude

Decision: Compare the p-value to α and make a decision about H₀.
- Template: Because the p-value of [p-value] is [less than / greater than] α = [α value], we [reject / fail to reject] H₀.
Contextual Conclusion: State your conclusion in the context of the problem, addressing the alternative hypothesis.
- Template (if rejecting H₀): We have convincing evidence that the true distribution of [context] is different from the one claimed.
- Template (if failing to reject H₀): We do not have convincing evidence that the true distribution of [context] is different from the one claimed.

Practice Problems

Problem 1: A casino is concerned that one of its six-sided dice is not fair. They roll the die 300 times and record the outcomes. The results are shown in the table below. Does this data provide convincing evidence at the α = 0.05 level that the die is unfair?

Outcome	1	2	3	4	5	6
Frequency	41	62	53	45	58	41

Solution:

State:

We want to test if the die is unfair. A fair die would have an equal probability for each outcome (1/6). Let pᵢ be the true proportion of rolls for outcome i.
H₀: The die is fair (p₁ = p₂ = p₃ = p₄ = p₅ = p₆ = 1/6).
Hₐ: The die is not fair (at least one of the proportions is not 1/6).
We will use a significance level of α = 0.05.

Plan:

We will perform a Chi-Square Test for Goodness of Fit.
Conditions:
- Random: The problem does not state the rolls were random, but we will assume the 300 rolls are representative of all possible rolls of this die.
- 10% Condition: The rolls are not a sample drawn without replacement from a finite population, so the 10% condition does not apply. Each roll is independent of the others.
- Large Counts: The sample size is n = 300. For a fair die, we expect each outcome to occur 1/6 of the time. The expected count for each outcome is 300 * (1/6) = 50. Since all expected counts are 50, which is ≥ 5, the condition is met.

Do:

The observed counts are given in the table. The expected count for each of the 6 categories is 50.
Test Statistic:
- χ^2 = Σ [ (Observed - Expected)^2 / Expected ]
- χ^2 = (41-50)^2/50 + (62-50)^2/50 + (53-50)^2/50 + (45-50)^2/50 + (58-50)^2/50 + (41-50)^2/50
- χ^2 = 1.62 + 2.88 + 0.18 + 0.50 + 1.28 + 1.62 = 8.08
Degrees of Freedom: df = number of categories - 1 = 6 - 1 = 5.
P-value: Using a TI-84 (χ^2cdf(8.08, 1E99, 5)) or the χ^2-GOF Test function, the p-value is approximately 0.1519.

Conclude:

Because the p-value of 0.1519 is greater than α = 0.05, we fail to reject H₀.
We do not have convincing statistical evidence to conclude that the die is unfair. The observed differences from the expected counts of 50 could plausibly be due to random chance.
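One way to make "plausibly due to random chance" concrete is to simulate many sets of 300 rolls of a truly fair die and count how often the χ^2 statistic comes out at least as large as 8.08. The sketch below does this with Python's standard library. It is a simulation-based check, not the chi-square procedure itself, and the function names, the 5,000 trials, and the seed are arbitrary choices made for illustration.

```python
import random


def chi_square_statistic(observed, expected):
    """Sum of (Observed - Expected)^2 / Expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def simulate_p_value(observed, probabilities, trials=5000, seed=1):
    """Estimate P(statistic >= observed statistic) by simulating H0."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = sum(observed)
    k = len(observed)
    expected = [n * p for p in probabilities]
    observed_stat = chi_square_statistic(observed, expected)
    hits = 0
    for _ in range(trials):
        # Draw n outcomes under H0 and tally them by category.
        counts = [0] * k
        for outcome in rng.choices(range(k), weights=probabilities, k=n):
            counts[outcome] += 1
        # Small tolerance guards against floating-point ties.
        if chi_square_statistic(counts, expected) >= observed_stat - 1e-9:
            hits += 1
    return hits / trials


rolls = [41, 62, 53, 45, 58, 41]
estimate = simulate_p_value(rolls, [1 / 6] * 6)
print(f"Simulated p-value: {estimate:.3f}")
```

The estimate should land near the p-value from the chi-square distribution (about 0.15), which means roughly one in seven sets of 300 fair rolls looks at least this uneven.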

Problem 2: The U.S. Census Bureau reports the following age distribution for a certain state: 18-24 years (15%), 25-44 years (40%), 45-64 years (30%), and 65+ years (15%). A local city council suspects their city's age distribution is different from the state's. They take a random sample of 500 city residents and find 65 are 18-24, 220 are 25-44, 150 are 45-64, and 65 are 65+. Is there convincing evidence that the city's age distribution differs from the state's? Use α = 0.01.