PrepGo

Setting Up a Chi-Square Test for Homogeneity or Independence - AP Statistics Study Guide

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Learn with study guides reviewed by top AP teachers. This guide takes about 17 minutes to read.

Quick Summary

This guide will equip you to set up a chi-square test for either homogeneity or independence. You will learn to distinguish between these two tests based on the data collection method, correctly formulate the null and alternative hypotheses for each, and verify the necessary conditions for inference, including the crucial calculation of expected counts.

Key Concepts

Chi-square (χ²) tests for two-way tables are used to determine if there is a statistically significant relationship between two categorical variables. The core idea is to compare the observed counts (the actual data collected) in each cell of a two-way table to the expected counts, which are the counts we would anticipate if the null hypothesis were true. Topic 8.5 focuses on the critical "setup" phase: identifying the correct test, stating hypotheses, and checking conditions.

Distinguishing Between Homogeneity and Independence

The key to telling these two tests apart lies in the sampling method. While the calculations are identical, the research question and data collection method are different.

  • Chi-Square Test for Homogeneity:

    • Question: Are the distributions of a single categorical variable the same across several different populations or treatment groups?

    • Sampling Method: You have two or more independent random samples (or groups in a randomized experiment), one from each population/group. You then measure the same categorical variable for each sample.

    • Example: A researcher wants to know if the distribution of favorite music genres (Pop, Rock, Hip-Hop, Country) is the same for students from three different high schools. They take a separate random sample of 100 students from each of the three schools and ask their favorite genre. Here, the three schools are the different populations.

  • Chi-Square Test for Independence:

    • Question: Is there an association (or relationship) between two categorical variables within a single population?

    • Sampling Method: You have one single random sample from one population. You then classify each individual in the sample according to two different categorical variables.

    • Example: A researcher wants to know if there is an association between a student's grade level (Freshman, Sophomore, Junior, Senior) and their preferred method of news consumption (Social Media, TV, Print) at a single large high school. They take one random sample of 400 students from the school and record both variables for each student.

Stating Hypotheses

The wording of your hypotheses must match the type of test you are performing.

  • For a Test for Homogeneity:

    • H₀ (Null Hypothesis): The distribution of [Categorical Variable] is the same for all populations or groups.

    • Hₐ (Alternative Hypothesis): The distribution of [Categorical Variable] is not the same for all populations or groups.

  • For a Test for Independence:

    • H₀ (Null Hypothesis): There is no association between [Variable 1] and [Variable 2] in the population of interest. (Alternatively: [Variable 1] and [Variable 2] are independent.)

    • Hₐ (Alternative Hypothesis): There is an association between [Variable 1] and [Variable 2] in the population of interest. (Alternatively: [Variable 1] and [Variable 2] are not independent.)

Conditions for Inference

Before proceeding with calculations, you must verify three conditions. These are the same for both tests.

  1. Random: The data must come from well-designed random samples or a randomized experiment.

    • For Homogeneity: You must have independent random samples from each population.

    • For Independence: You must have one random sample from the population of interest.

  2. 10% Condition (Independence of observations): When sampling without replacement, each sample size (n) should be no more than 10% of its population size (N), that is, n ≤ 0.10N. This ensures that individual observations are approximately independent.

  3. Large Counts: All expected counts must be at least 5. It is NOT sufficient for the observed counts to be ≥ 5. You must calculate and report the expected counts to check this condition.
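
The two numeric conditions are mechanical checks, and a short Python sketch can make them concrete. The helper names `ten_percent_ok` and `large_counts_ok`, and all of the numbers below, are hypothetical and chosen only for illustration; the Random condition cannot be checked by arithmetic and must be justified from the study design.

```python
def ten_percent_ok(sample_size, population_size):
    """True when n <= 0.10 * N (integer arithmetic avoids rounding issues)."""
    return 10 * sample_size <= population_size


def large_counts_ok(expected_counts, minimum=5):
    """True when every EXPECTED count (not observed count) is at least `minimum`."""
    return all(count >= minimum for row in expected_counts for count in row)


# Hypothetical numbers, chosen only for illustration.
print(ten_percent_ok(200, 5000))   # 200 is at most 10% of 5000 (500)
print(ten_percent_ok(200, 1500))   # 200 exceeds 10% of 1500 (150)
print(large_counts_ok([[12.5, 7.5], [37.5, 22.5]]))   # every expected count >= 5
print(large_counts_ok([[4.2, 15.8], [16.8, 63.2]]))   # 4.2 is below 5
```

For a test for homogeneity, the 10% check would be repeated once per sample, each against its own population.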

Calculating Expected Counts

The expected count for any cell in a two-way table represents the number of observations we would expect in that cell if the null hypothesis (H₀) were true.

  • Formula:

    Expected Count = (Row Total × Column Total) / Grand Total

[Image: A generic 2x2 two-way table. The cells are labeled as observed counts. The margins are labeled "Row 1 Total," "Row 2 Total," "Column 1 Total," "Column 2 Total." The bottom-right corner is labeled "Grand Total."]
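
The formula follows from H₀: if the column variable is distributed the same way in every row, each row should split according to the overall column proportions, so Expected Count = Row Total × (Column Total / Grand Total). Here is a minimal Python sketch of that calculation, assuming the table is entered without its totals. The function name `expected_counts` and the 2x2 counts are hypothetical, made up only for illustration.

```python
def expected_counts(observed):
    """Return the expected counts for a two-way table of observed counts.

    `observed` is a list of rows WITHOUT the total row or total column.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected count = (row total * column total) / grand total, cell by cell.
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 2x2 table, made up for illustration only.
observed = [[30, 20],
            [10, 40]]

for row in expected_counts(observed):
    print(row)
```

Both rows receive the same expected counts in this example because both row totals equal 50.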

Key Vocabulary

  • Two-Way Table: A table that displays the relationship between two categorical variables. The rows represent one variable, and the columns represent the other.

  • Observed Counts: The actual frequencies or counts of individuals recorded in each cell of a two-way table from the sample data.

  • Expected Counts: The theoretical frequencies or counts that would be expected in each cell of a two-way table if the null hypothesis were true.

  • Chi-Square Test for Homogeneity: An inference procedure used to determine if the distribution of a single categorical variable is the same across two or more distinct populations or treatment groups.

  • Chi-Square Test for Independence: An inference procedure used to determine if there is a statistically significant association between two categorical variables within a single population.

  • Categorical Variable: A variable that places an individual into one of several groups or categories (e.g., eye color, political affiliation, grade level).

Calculator Tech (TI-84)

While the full test is run in the next lesson, your calculator is essential for checking the Large Counts condition efficiently. It can calculate the entire matrix of expected counts for you.

Steps to find Expected Counts:

  1. Enter the Observed Counts into a Matrix:

    • Press 2nd -> [MATRIX].

    • Arrow over to EDIT and select a matrix, for example, 1:[A].

    • Enter the dimensions of your table (rows x columns). Do NOT include the "Total" rows or columns.

    • Type in your observed counts, pressing ENTER after each one.

    • Press 2nd -> MODE [QUIT] to return to the home screen.

  2. Run the Test to Generate the Expected Matrix:

    • Press STAT -> TESTS.

    • Scroll down and select C:χ²-Test...

    • For Observed:, make sure [A] is selected. For Expected:, choose a different matrix where the calculator will store the results (e.g., [B]). The calculator will create this matrix automatically.

    • Select Calculate or Draw.

  3. View the Expected Counts:

    • The test results (χ² statistic, p-value) will be displayed. We will use these in the next lesson.

    • To see the expected counts, press 2nd -> [MATRIX].

    • Arrow over to EDIT and select the matrix you designated for the expected counts (e.g., 2:[B]).

    • You can now view all the calculated expected counts and verify that they are all ≥ 5.

How to Show Work on the FRQ

For any inference procedure, you must use the State-Plan-Do-Conclude (SPDC) framework. For Topic 8.5, we focus on the State and Plan steps, plus the initial calculation in the Do step.

State (Hypotheses and Significance Level)

  1. Define Parameters: Clearly define the populations and the categorical variable(s) in the context of the problem.

  2. State Hypotheses: Choose the correct test (Homogeneity or Independence) and write the hypotheses in words.

    • Homogeneity Template:

      • H₀: The distribution of [categorical variable] is the same for the populations of [population 1], [population 2], etc.

      • Hₐ: The distribution of [categorical variable] is not the same for the populations of [population 1], [population 2], etc.

    • Independence Template:

      • H₀: There is no association between [variable 1] and [variable 2] for the population of [population name].

      • Hₐ: There is an association between [variable 1] and [variable 2] for the population of [population name].

  3. Significance Level: State the alpha (α) level if given, or choose α = 0.05 if not.

Plan (Name the Test and Check Conditions)

  1. Name the Test: "We will perform a chi-square test for [homogeneity OR independence]."

  2. Check Conditions:

    • Random: Confirm that the data came from random samples or a randomized experiment, as described in the problem.

    • 10% Condition: If sampling without replacement, show that each sample size is less than 10% of its respective population size. (e.g., "100 high school seniors is likely less than 10% of all high school seniors in the district.")

    • Large Counts: State that all expected counts must be ≥ 5. You MUST show the calculation for at least one expected count by hand using the formula: Expected Count = (Row Total × Column Total) / Grand Total. Then, you can state that all other expected counts were found using a calculator and list them (often in a table) or simply state that all are ≥ 5.

Do (Calculations)

For Topic 8.5, the key "Do" step is calculating the expected counts, which you already did in the "Plan" step to check the Large Counts condition. The full test statistic and p-value are covered in the next topic.

Practice Problems

Problem 1:

A university administrator is interested in whether the distribution of primary transportation methods (Car, Public Transit, Walk/Bike) is the same for undergraduate and graduate students. They take a random sample of 200 undergraduate students and a separate random sample of 100 graduate students and record their primary transportation method. The results are below.

| | Car | Public Transit | Walk/Bike | Total |
|---|---|---|---|---|
| Undergraduate | 110 | 50 | 40 | 200 |
| Graduate | 40 | 45 | 15 | 100 |
| Total | 150 | 95 | 55 | 300 |

Set up an appropriate hypothesis test. State the hypotheses and check the conditions for inference.

Solution:

State:

We want to test if the distribution of transportation methods is the same for undergraduate and graduate students at this university. We will use a significance level of α = 0.05.

  • H₀: The distribution of primary transportation methods is the same for undergraduate and graduate students at this university.

  • Hₐ: The distribution of primary transportation methods is not the same for undergraduate and graduate students at this university.

Plan:

The data come from two separate random samples (undergraduates and graduates), and we are comparing the distribution of a single categorical variable (transportation method). Therefore, we will perform a chi-square test for homogeneity.

  • Random: The problem states that the data come from a random sample of 200 undergraduates and a separate random sample of 100 graduate students.

  • 10% Condition: It is reasonable to assume that 200 is less than 10% of all undergraduate students and 100 is less than 10% of all graduate students at this university.

  • Large Counts: We must check that all expected counts are ≥ 5. The expected counts are calculated as (Row Total × Column Total) / Grand Total.

    • Expected (Undergrad, Car): (200 * 150) / 300 = 100

    • Expected (Undergrad, Transit): (200 * 95) / 300 = 63.33

    • Expected (Undergrad, Walk/Bike): (200 * 55) / 300 = 36.67

    • Expected (Grad, Car): (100 * 150) / 300 = 50

    • Expected (Grad, Transit): (100 * 95) / 300 = 31.67

    • Expected (Grad, Walk/Bike): (100 * 55) / 300 = 18.33

    All expected counts (100, 63.33, 36.67, 50, 31.67, 18.33) are ≥ 5. The conditions for inference are met.
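
As a cross-check on the hand calculations for Problem 1, the following Python sketch recomputes each expected count and prints the substitution in the same form used in the solution. It is only an illustration; the variable names are our own, and on the exam this work is written by hand or found with the calculator.

```python
# Observed counts from Problem 1 (total row and total column omitted).
groups = ["Undergrad", "Grad"]
methods = ["Car", "Transit", "Walk/Bike"]
observed = [[110, 50, 40],
            [40, 45, 15]]

row_totals = [sum(row) for row in observed]         # [200, 100]
col_totals = [sum(col) for col in zip(*observed)]   # [150, 95, 55]
grand_total = sum(row_totals)                       # 300

expected = {}
for group, r in zip(groups, row_totals):
    for method, c in zip(methods, col_totals):
        expected[(group, method)] = r * c / grand_total
        print(f"Expected ({group}, {method}): ({r} * {c}) / {grand_total} "
              f"= {expected[(group, method)]:.2f}")

# Large Counts condition: every expected count must be at least 5.
print("All expected counts >= 5:", all(v >= 5 for v in expected.values()))
```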


Problem 2:

A market research firm surveyed a single random sample of 500 adults in a large city to investigate a possible association between age group and preferred coffee shop. The results are summarized in the table below.

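| Age Group | Local Cafe | Chain | No Preference | Total |
|---|---|---|---|---|
| 18-34 | 90 | 60 | 20 | 170 |
| 35-54 | 75 | 95 | 15 | 185 |
| 55+ | 55 | 70 | 20 | 145 |
| Total | 220 | 225 | 55 | 500 |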

Set up an appropriate hypothesis test. State the hypotheses and check the conditions for inference.

Solution:

State:

We want to test for an association between age group and preferred coffee shop for adults in this city. We will use a significance level of α = 0.05.

  • H₀: There is no association between age group and preferred coffee shop for adults in this city.

  • Hₐ: There is an association between age group and preferred coffee shop for adults in this city.

Plan:

The data come from a single random sample, and each individual is classified by two categorical variables (age group and coffee preference). Therefore, we will perform a chi-square test for independence.

  • Random: The problem states the data were collected from a single random sample of 500 adults.

  • 10% Condition: It is reasonable to assume that 500 adults is less than 10% of all adults in a large city.

  • Large Counts: We must check that all expected counts are ≥ 5.

    • We must show at least one calculation by hand:

      Expected (18-34, Local Cafe) = (170 * 220) / 500 = 74.8

    • Using a calculator for the rest, the expected counts are:

| Age Group | Local Cafe | Chain | No Preference |
|---|---|---|---|
| 18-34 | 74.8 | 76.5 | 18.7 |
| 35-54 | 81.4 | 83.25 | 20.35 |
| 55+ | 63.8 | 65.25 | 15.95 |

All expected counts are well above 5. The conditions for inference are met.
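
The expected counts for Problem 2 can be reproduced with a few lines of Python, which also makes the Large Counts check explicit by finding the smallest expected count. This is a sketch for checking work, with variable names of our own choosing; it is not a substitute for showing at least one calculation by hand.

```python
# Observed counts from Problem 2 (total row and total column omitted).
age_groups = ["18-34", "35-54", "55+"]
observed = [[90, 60, 20],
            [75, 95, 15],
            [55, 70, 20]]

row_totals = [sum(row) for row in observed]         # [170, 185, 145]
col_totals = [sum(col) for col in zip(*observed)]   # [220, 225, 55]
grand_total = sum(row_totals)                       # 500

# Expected count = (row total * column total) / grand total for every cell.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

for label, row in zip(age_groups, expected):
    print(label, [round(x, 2) for x in row])

# The Large Counts condition holds when the smallest expected count is >= 5.
smallest = min(min(row) for row in expected)
print("Smallest expected count:", round(smallest, 2))
print("Large Counts condition met:", smallest >= 5)
```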

Common Mistakes to Avoid

  1. Confusing Homogeneity and Independence: This is the most common error. Remember the key: Homogeneity involves multiple samples from different populations. Independence involves one sample with two variables measured. Always identify the sampling method first.

  2. Incorrect Hypotheses: Do not use symbols like μ or p in chi-square hypotheses. The hypotheses are always written in words about distributions or association. Make sure your wording matches the test type (e.g., "same distribution" for homogeneity, "no association" for independence).

  3. Checking Conditions on Observed Counts: The Large Counts condition applies to EXPECTED counts, not the observed data. An observed count can be 0, 1, or 2; as long as every expected count is at least 5, the condition is met.

  4. Not Showing an Expected Count Calculation: On the FRQ, you must show the formula and substitution for at least one expected count to receive full credit for the conditions check, even if you use your calculator to find the rest. Don't just list the expected counts without showing how you got one of them.