PrepGo

AP Statistics Unit 8: Inference for Categorical Data: Chi-Square

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: April 13, 2026

The Big Picture

Welcome to the world of categorical data! So far, we've focused on inference for means (quantitative data) and proportions (categorical data with only two outcomes, like success/failure). But what happens when our categories are more complex? What if we want to know if the color distribution in a bag of M&M's matches the company's claim, or if there's a relationship between a person's favorite music genre and their grade level?

This unit introduces the Chi-Square (χ²) tests, a family of procedures designed specifically for these situations. Think of the χ² statistic as a way to measure the "overall gap" between the counts you actually observed in your sample and the counts you expected to see if the null hypothesis were true. If the gap between observation and expectation is too large to be explained by random chance alone, we have evidence to reject the null hypothesis. The same statistic lets us tackle questions about how well a sample distribution fits a claim and whether two categorical variables are related to each other.

Key Questions

  • How can we test if the distribution of a single categorical variable in our sample matches a known or hypothesized distribution for the population?

  • How do we determine if there is a statistically significant association between two categorical variables?

  • When comparing multiple groups, how can we test whether the distribution of a categorical variable is the same across all of them?

  • What are the specific conditions required for a Chi-Square test, and how do they differ from the conditions for tests on proportions and means?

Your Learning Path

1. Testing a Single Distribution: The Goodness-of-Fit Test

Topics 8.1-8.3: From Observation to Conclusion

You'll begin by learning the fundamental Chi-Square procedure: the Goodness-of-Fit test. This is the starting point for the entire unit. You will learn how to take a single sample from one population and compare the distribution of its categorical variable to a claimed or hypothesized distribution. This involves stating hypotheses in words, calculating the expected counts for each category, checking the necessary conditions, and finally, calculating the Chi-Square test statistic and P-value to draw a conclusion.
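The Goodness-of-Fit steps described above can be sketched in a few lines of Python. This is an illustrative sketch only: the claimed proportions and observed counts are made up for the example (they are not a real company's figures), and `chi_square_sf` is a hypothetical helper name that approximates the upper-tail area with a series expansion, standing in for what a calculator's χ² cdf function would report.

```python
import math

def chi_square_sf(x, df):
    """Approximate P(chi-square with df degrees of freedom >= x)."""
    if x <= 0:
        return 1.0
    a = df / 2.0
    half_x = x / 2.0
    # Series expansion of the regularized lower incomplete gamma function
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= half_x / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-half_x + a * math.log(half_x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Made-up example data: six candy colors (assumed values, not a real claim)
claimed = [0.2, 0.2, 0.2, 0.2, 0.1, 0.1]   # hypothesized proportions
observed = [25, 18, 22, 15, 12, 8]         # counts in one sample

n = sum(observed)
expected = [n * p for p in claimed]        # expected count = n * claimed proportion

# Large Counts condition is checked on the EXPECTED counts
large_counts_ok = all(e >= 5 for e in expected)

# Test statistic: sum of (observed - expected)^2 / expected
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                     # number of categories - 1
p_value = chi_square_sf(chi2, df)

print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f}, df = {df}, P-value = {p_value:.3f}")
```

With these made-up counts the statistic works out to 3.7 on 5 degrees of freedom and the P-value is roughly 0.59, so this sample would give no convincing evidence against the claimed distribution.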

2. Analyzing Two-Way Tables: Homogeneity and Independence

Topics 8.4-8.6: Comparing Multiple Groups or Multiple Variables

Next, you'll expand the Chi-Square test to analyze data organized in two-way tables. You'll first learn how to calculate expected counts in this new format: for each cell, expected count = (row total × column total) / table total. Then, you'll apply the test to two distinct scenarios. The Chi-Square Test for Homogeneity is used when you have independent samples from two or more populations (or groups) and want to see if the distribution of a single categorical variable is the same across all of them. The Chi-Square Test for Independence is used when you have a single sample and record two different categorical variables on each individual to see if there is an association between them. While the calculations are identical, the data collection methods, hypotheses, and conclusions are different.
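As a rough illustration of the two-way table calculation, the sketch below computes expected counts as (row total × column total) / table total and then the χ² statistic. The table is invented for the example (100 freshmen and 100 seniors, three music genres), and the variable names are assumptions, not part of any course material. The arithmetic is the same whether the setting is homogeneity or independence; only the hypotheses and conclusion change.

```python
# Made-up two-way table: rows are grade levels, columns are music genres
#            pop  rock  hip-hop
table = [
    [40, 30, 30],   # freshmen (separate sample of 100)
    [30, 30, 40],   # seniors  (separate sample of 100)
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
grand_total = sum(row_totals)

# Expected count for each cell = (row total * column total) / table total
expected = [
    [r * c / grand_total for c in col_totals]
    for r in row_totals
]

# Large Counts condition: every EXPECTED count must be at least 5
large_counts_ok = all(e >= 5 for row in expected for e in row)

# Same chi-square formula as before, summed over every cell
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(table, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for a two-way table
df = (len(table) - 1) * (len(table[0]) - 1)

print(f"expected = {expected}")
print(f"chi-square = {chi2:.2f}, df = {df}")
```

The P-value would then be the area to the right of this statistic under the χ² distribution with `df` degrees of freedom, which is what a calculator's χ² test reports.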

3. Choosing the Right Tool: The Final Skill

Topic 8.7: Selecting the Correct Categorical Inference Procedure

This is where it all comes together. You've learned z-tests for one and two proportions, and now three different Chi-Square tests. In this final topic, you will practice analyzing a problem and selecting the appropriate statistical test. Distinguishing between a two-proportion z-test, a Chi-Square Goodness-of-Fit test, and a Chi-Square test for a two-way table is a critical skill for success on the AP Exam.

How to Succeed in This Unit

  • Conditions are Crucial and Different: You still need random sampling (or random assignment) and, when sampling without replacement, the 10% condition. What changes is the "Large Counts" condition: for all Chi-Square tests, every expected count must be at least 5. Do not check the observed (actual) counts. You must show your work by listing or calculating the expected counts to receive full credit.

  • Hypotheses Must Be in Words: Unlike tests for means or proportions, you will not use statistical symbols like p or μ in your hypotheses for Chi-Square tests. You must write them out in clear sentences. For example, "The company's claimed distribution of candy colors is correct" or "There is no association between gender and political affiliation."

  • Distinguish Homogeneity vs. Independence: These two tests feel identical, but their setup is different. The key is to ask: "How was the data collected?" If the researchers took separate, independent samples from different populations (e.g., a sample of 100 freshmen and a separate sample of 100 seniors), it's a test for Homogeneity. If they took one single sample and measured two variables on each person (e.g., a sample of 200 students, asking each their grade level and favorite music), it's a test for Independence.

  • Show Your Work for the Test Statistic: Even though your calculator will compute the Chi-Square statistic, you are required to show the formula (χ² = Σ (Observed - Expected)² / Expected) and substitute values for at least the first two terms. Also, be sure to correctly state the degrees of freedom (df = number of categories - 1 for GOF, or df = (rows - 1)(columns - 1) for two-way tables).
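The "show your work" habit and the two degrees-of-freedom rules above can be mirrored in a small sketch. The function names (`show_work`, `df_gof`, `df_two_way`) and the counts are hypothetical, chosen only to illustrate writing out the first two terms of the statistic before giving the total.

```python
def show_work(observed, expected, terms_shown=2):
    """Write out the first few terms of the chi-square sum, then the total."""
    pairs = list(zip(observed, expected))
    shown = [f"({o} - {e})^2 / {e}" for o, e in pairs[:terms_shown]]
    statistic = sum((o - e) ** 2 / e for o, e in pairs)
    # Only add an ellipsis when some terms are not written out
    tail = " + ..." if len(pairs) > terms_shown else ""
    return "chi-square = " + " + ".join(shown) + tail + f" = {statistic:.2f}"

def df_gof(num_categories):
    """Degrees of freedom for a Goodness-of-Fit test."""
    return num_categories - 1

def df_two_way(num_rows, num_cols):
    """Degrees of freedom for a two-way table (homogeneity or independence)."""
    return (num_rows - 1) * (num_cols - 1)

# Made-up counts for six categories (assumed values for illustration)
observed = [25, 18, 22, 15, 12, 8]
expected = [20, 20, 20, 20, 10, 10]

work = show_work(observed, expected)
print(work)
print("GOF df:", df_gof(len(observed)))
print("Two-way df for a 2 x 3 table:", df_two_way(2, 3))
```

On paper, the same idea applies: write the formula, substitute the first couple of observed and expected counts, then report the total and the degrees of freedom.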