## Quick Summary
This guide covers the foundational process of calculating expected counts for two-way tables, a critical first step in performing a chi-square test for independence or homogeneity. You will learn the logic behind expected counts, master the formula for their calculation, and understand their role as a baseline for comparison against observed data. By the end of this lesson, you will be able to compute expected counts by hand and with a calculator, and know precisely how to present this work on the AP exam to earn full credit.
## Key Concepts
The core idea behind a chi-square test is to compare what we actually observed in our sample (the observed counts) with what we would expect to see if a null hypothesis were true (the expected counts). A large difference between these observed and expected counts provides evidence against the null hypothesis. Topic 8.4 focuses exclusively on how to find these expected counts.
The Logic of Expected Counts
Imagine we are testing for an association between a student's grade level and their preferred type of movie. The null hypothesis (H₀) would be that there is no association between grade level and movie preference.
If there truly is no association (i.e., the variables are independent), then the distribution of movie preferences should be the same for every grade level. For example, if 25% of all students in the sample prefer comedies, we would expect that 25% of freshmen, 25% of sophomores, 25% of juniors, and 25% of seniors prefer comedies. The expected counts are the numbers that reflect this "no association" scenario perfectly.
The Formula for Expected Counts
To calculate the expected count for any specific cell in a two-way table, we use the following formula:
Formula: `Expected Count = (Row Total × Column Total) / Grand Total`
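One way to see where the row-total-times-column-total rule comes from, using shorthand symbols that are not in the original text ($R$ for the row total, $C$ for the column total, $n$ for the grand total, $E$ for the expected count): if there is no association, every row should show the same proportion $C/n$ in a given column, so the expected count is that proportion applied to the row's total.

$$
E = R \times \frac{C}{n} = \frac{R \times C}{n}
$$

In the movie illustration above, 25% plays the role of $C/n$ and the size of each grade level plays the role of $R$.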
Let's break this down with an example.
Example Table: Observed Counts
A survey asked 200 high school students about their primary source of news.
[Image: A 2x3 two-way table showing the observed counts for news source (Social Media, TV/Cable, Online News) versus grade level (Underclassman, Upperclassman).]
| | Social Media | TV/Cable | Online News | Row Total |
|---|---|---|---|---|
| Underclassman | 60 | 15 | 15 | 90 |
| Upperclassman | 40 | 25 | 45 | 110 |
| Column Total | 100 | 40 | 60 | 200 (Grand Total) |
Let's calculate the expected count for the cell "Underclassman and Social Media".
1. Identify the Row Total: The total for the "Underclassman" row is 90.
2. Identify the Column Total: The total for the "Social Media" column is 100.
3. Identify the Grand Total: The total number of students surveyed is 200.
4. Apply the Formula: `Expected Count = (90 × 100) / 200 = 45`
Interpretation: If there were no association between grade level and news source, we would have expected exactly 45 underclassmen to list social media as their primary source. We observed 60, which is higher than expected.
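The single-cell calculation above can also be sketched in a few lines of Python. This is only an illustration, not part of the AP exam workflow; the function name `expected_count` and the variable names are made up for this example.

```python
def expected_count(row_total, column_total, grand_total):
    """Expected count for one cell if there is no association."""
    return row_total * column_total / grand_total

# Underclassman row total = 90, Social Media column total = 100, grand total = 200
underclass_social = expected_count(90, 100, 200)
observed = 60  # observed count for the same cell

print(underclass_social)             # 45.0
print(observed - underclass_social)  # 15.0, so observed is above expected
```

Note that the function takes only the three totals as inputs; the observed count for the cell is used afterwards, for the comparison.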
The Complete Table of Expected Counts
You must calculate an expected count for every cell in the table.
Underclassman & TV/Cable: (90 × 40) / 200 = 18
Underclassman & Online News: (90 × 60) / 200 = 27
Upperclassman & Social Media: (110 × 100) / 200 = 55
Upperclassman & TV/Cable: (110 × 40) / 200 = 22
Upperclassman & Online News: (110 × 60) / 200 = 33
Example Table: Expected Counts
| | Social Media | TV/Cable | Online News | Row Total |
|---|---|---|---|---|
| Underclassman | 45 | 18 | 27 | 90 |
| Upperclassman | 55 | 22 | 33 | 110 |
| Column Total | 100 | 40 | 60 | 200 (Grand Total) |
Important Note: The row and column totals for the expected counts table must be the same as in the original observed counts table. This is a great way to check your work.
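For readers who want to cross-check a whole table outside the calculator, here is a minimal Python sketch that builds every expected count from the observed counts. The function name `expected_table` and the list-of-rows layout are assumptions made for this example. Only the interior cells are entered, not the totals.

```python
def expected_table(observed):
    """Return expected counts for a two-way table given as a list of rows.

    `observed` holds only the interior cells; totals are computed here.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Apply (row total x column total) / grand total to every cell
    return [[r * c / grand_total for c in column_totals] for r in row_totals]

# Rows: Underclassman, Upperclassman
# Columns: Social Media, TV/Cable, Online News
news_observed = [[60, 15, 15],
                 [40, 25, 45]]
news_expected = expected_table(news_observed)

for row in news_expected:
    print(row)
# [45.0, 18.0, 27.0]
# [55.0, 22.0, 33.0]
```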
## Key Vocabulary
Two-Way Table: A table that displays the frequency distribution of two categorical variables. The rows represent the categories of one variable, and the columns represent the categories of the other.
Observed Counts: The actual frequencies collected from the sample data and recorded in each cell of a two-way table. This is your raw data.
Expected Counts: The theoretical frequencies we would anticipate in each cell of a two-way table if the null hypothesis (of no association or no difference in distributions) were true.
Cell: The intersection of a specific row and column in a two-way table. Each cell represents a unique combination of the two categorical variables.
Marginal Totals: The sums of the counts in each row and each column of a two-way table. These are found in the "margins" of the table (the Row Total and Column Total).
## Calculator Tech (TI-84)
The TI-84 can calculate the entire table of expected counts for you, which is a huge time-saver. The process involves entering the observed counts into a matrix and running a chi-square test, which automatically generates the expected counts.
**Step 1: Enter the Observed Counts into a Matrix**

1. Press `2nd` -> `x⁻¹` [MATRIX].
2. Arrow over to `EDIT` and select `1: [A]`.
3. Enter the dimensions of your table (rows x columns). For our example, it's a 2x3 table. Press `ENTER`.
4. Type in the observed counts for each cell, pressing `ENTER` after each one. Do NOT enter the totals.
5. Press `2nd` -> `MODE` [QUIT] to return to the home screen.

**Step 2: Run the Chi-Square Test to Generate Expected Counts**

1. Press `STAT` -> `TESTS`.
2. Scroll down and select `C: χ²-Test...`.
3. You will see a menu. `Observed:` should be `[A]` (or whichever matrix you used). `Expected:` should be `[B]`. The calculator will automatically store the calculated expected counts in this matrix.
4. Arrow down to `Calculate` and press `ENTER`.
5. The calculator will display the results of the chi-square test (χ², p-value, df). For this topic, we ignore these results and focus on what the calculator did behind the scenes.
**Step 3: View the Matrix of Expected Counts**

1. Press `2nd` -> `x⁻¹` [MATRIX].
2. Arrow over to `EDIT` and select `2: [B]`.
3. You will now see the complete matrix of expected counts that the calculator computed.
```
MATRIX[B] 2x3
[ 45 18 27 ]
[ 55 22 33 ]
```

## How to Show Work on the FRQ

Calculating expected counts is a component of the "Do" step in the State-Plan-Do-Conclude (SPDC) framework for a full Chi-Square Test for Independence or Homogeneity. To receive full credit, you cannot simply copy the table of expected counts from your calculator. **You must show the calculation for at least one cell.**

Here is the required template for the "Do" step:

**DO:**
The expected count for each cell is calculated using the formula:
`Expected Count = (Row Total × Column Total) / Grand Total`

For example, the expected count for [*describe the cell in context*] is:
`Expected Count = ([Row Total Value] × [Column Total Value]) / [Grand Total Value] = [Result]`

The complete table of expected counts is:
[Re-create the full table of expected counts here. You can copy the values from matrix [B] on your calculator after performing the calculation as shown above.]

**Example using the News Source data:**

**DO:**
The expected count for each cell is calculated using the formula:
`Expected Count = (Row Total × Column Total) / Grand Total`

For example, the expected count for Underclassmen who prefer Social Media is:
`Expected Count = (90 × 100) / 200 = 45`

The complete table of expected counts is:

| | Social Media | TV/Cable | Online News |
| :--- | :---: | :---: | :---: |
| **Underclassman** | 45 | 18 | 27 |
| **Upperclassman** | 55 | 22 | 33 |

## Practice Problems

**Problem 1:** A random sample of 300 adults was asked about their highest level of education and whether they have a pet. The results are summarized in the table below.

| | Has a Pet | No Pet | **Row Total** |
| :--- | :---: | :---: | :---: |
| **High School Diploma** | 82 | 58 | **140** |
| **College Degree** | 91 | 69 | **160** |
| **Column Total** | **173** | **127** | **300** |

(a) Calculate the expected count of adults with a College Degree who have a pet.
(b) Interpret this expected count in context.
**Solution:**

(a) To calculate the expected count for the "College Degree and Has a Pet" cell, we use the formula:
`Expected Count = (Row Total × Column Total) / Grand Total`

Row Total (College Degree) = 160
Column Total (Has a Pet) = 173
Grand Total = 300

`Expected Count = (160 × 173) / 300 = 27680 / 300 ≈ 92.27`

(b) If there were no association between an adult's highest level of education and whether they have a pet, we would expect approximately 92.27 adults with a college degree in a sample of 300 to have a pet.

---

**Problem 2:** A car dealership manager wants to know if there is an association between the type of vehicle sold and the gender of the buyer. She collects data from the last 250 sales.

| | Sedan | SUV | Truck | **Row Total** |
| :--- | :---: | :---: | :---: | :---: |
| **Male** | 45 | 60 | 40 | **145** |
| **Female** | 65 | 35 | 5 | **105** |
| **Column Total** | **110** | **95** | **45** | **250** |

Calculate the complete table of expected counts, assuming there is no association between vehicle type and buyer gender. Show your work for the expected count of Females who purchased an SUV.

**Solution:**

First, we show the sample calculation for one cell as required for an FRQ. The expected count for Females who purchased an SUV is:
`Expected Count = (Row Total for Female × Column Total for SUV) / Grand Total`
`Expected Count = (105 × 95) / 250 = 9975 / 250 = 39.9`
Next, we calculate the remaining expected counts:
Male & Sedan: (145 × 110) / 250 = 63.8
Male & SUV: (145 × 95) / 250 = 55.1
Male & Truck: (145 × 45) / 250 = 26.1
Female & Sedan: (105 × 110) / 250 = 46.2
Female & Truck: (105 × 45) / 250 = 18.9
The complete table of expected counts is:
| | Sedan | SUV | Truck |
|---|---|---|---|
| Male | 63.8 | 55.1 | 26.1 |
| Female | 46.2 | 39.9 | 18.9 |
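As a cross-check on the Problem 2 arithmetic, the sketch below starts from the marginal totals alone and prints one work line per cell. It is only an illustration: the dictionary layout and variable names are assumptions made for this example, and the printed lines mirror the hand calculations rather than replace them on an FRQ.

```python
# Marginal totals from the Problem 2 table (no interior cells are needed)
row_totals = {"Male": 145, "Female": 105}
column_totals = {"Sedan": 110, "SUV": 95, "Truck": 45}
grand_total = 250

expected = {}
for row_label, r in row_totals.items():
    for col_label, c in column_totals.items():
        value = r * c / grand_total
        expected[(row_label, col_label)] = value
        # One line of shown work per cell, rounded to one decimal place
        print(f"{row_label} & {col_label}: ({r} × {c}) / {grand_total} = {value:.1f}")
```

Because the loop reads only from the margins, it cannot accidentally pick up an observed count from inside the table.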
## Common Mistakes to Avoid
Using Observed Counts in the Formula: A very common error is to grab a number from inside the table (an observed count) instead of the row and column totals for the formula. Always use the numbers from the margins.
Not Showing the Formula on the FRQ: Relying solely on the calculator and just writing down the final table of expected counts will result in lost points. You must demonstrate you know the formula by showing the calculation for at least one cell.
Confusing Observed vs. Expected: Remember, observed is the real data you collected. Expected is the hypothetical data you calculate based on the null hypothesis being true. Don't mix them up when setting up your chi-square test statistic later.
Incorrectly Identifying Totals: Double-check that you are using the correct row total, column total, and the grand total. It's easy to mix them up in a larger table. Always trace the row and column for the cell you are calculating.
Minor Calculation Errors: Be careful with your arithmetic. A great way to check your work is to ensure the row and column totals of your new expected counts table match the totals from the original observed counts table. If they don't, you've made a calculation error.
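The totals check described in the last item can be sketched as a small Python helper. The name `margins_match` and the tolerance parameter are assumptions made for this example; if you round your expected counts, a larger tolerance than the default would be needed.

```python
def margins_match(observed, expected, tolerance=1e-9):
    """True when both tables have the same row and column totals."""
    def totals(table):
        rows = [sum(row) for row in table]
        cols = [sum(col) for col in zip(*table)]
        return rows + cols

    return all(abs(o - e) <= tolerance
               for o, e in zip(totals(observed), totals(expected)))

news_observed = [[60, 15, 15], [40, 25, 45]]
correct = [[45, 18, 27], [55, 22, 33]]
slipped = [[45, 18, 27], [55, 22, 32]]  # deliberate arithmetic slip in the last cell

print(margins_match(news_observed, correct))  # True
print(margins_match(news_observed, slipped))  # False
```

Matching totals do not prove every cell is right, since two errors can cancel, but mismatched totals always signal a mistake.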