PrepGo

Representing Two Categorical Variables - AP Statistics Study Guide

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026


Quick Summary

This guide will equip you to analyze the relationship between two categorical variables. You will learn to organize data in two-way tables, calculate and interpret joint, marginal, and conditional probabilities, and create powerful visual displays like segmented and side-by-side bar charts. Ultimately, you will be able to use this evidence to determine and clearly describe whether an association exists between the two variables.

Key Concepts

When we collect data on two different categorical variables for the same group of individuals, we need special tools to explore the potential relationship between them.

1. Two-Way Tables (Contingency Tables)

A two-way table is the primary tool for organizing two categorical variables. One variable's categories form the rows, and the other's form the columns.

  • Explanatory Variable: Often placed as the columns. This is the variable we think might influence or explain the other.

  • Response Variable: Often placed as the rows. This is the outcome variable we are interested in.

  • Cells: The intersections of rows and columns, containing a count (frequency) for that specific combination of categories.

  • Totals: The sums of each row and each column are called marginal totals. The grand total of all observations is in the bottom-right corner.

Example: A survey asked 200 high school students about their primary source of news and their class year.

                  Freshman  Sophomore  Junior  Senior  Total
Social Media            35         30      20      10     95
TV/Online News           5         10      15      25     55
Family/Friends          10         10      15      15     50
Total                   50         50      50      50    200

[Image: A clearly labeled two-way table showing the news source data.]
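
As a rough sketch, the table above could be stored in Python and its marginal totals checked as follows. The names `class_years`, `counts`, `row_totals`, and `col_totals` are illustrative choices, not part of the AP course materials.

```python
# Counts from the news-source survey.
# Rows are news sources; each list follows the order in class_years.
class_years = ["Freshman", "Sophomore", "Junior", "Senior"]
counts = {
    "Social Media":   [35, 30, 20, 10],
    "TV/Online News": [5, 10, 15, 25],
    "Family/Friends": [10, 10, 15, 15],
}

# Row (marginal) totals: one per news source.
row_totals = {source: sum(row) for source, row in counts.items()}

# Column (marginal) totals: one per class year.
col_totals = {
    year: sum(row[i] for row in counts.values())
    for i, year in enumerate(class_years)
}

# Grand total: every student counted once.
grand_total = sum(row_totals.values())

print(row_totals)
print(col_totals)
print(grand_total)
```

The row totals and the column totals should each add up to the same grand total, which is a quick way to catch a data-entry slip.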

2. Distributions from Two-Way Tables

From a single table, we can calculate three types of distributions.

  • Joint Relative Frequency: The proportion of the entire sample that falls into a specific cell.

    • Formula: (count in the cell) / (grand total)

    • Question: What proportion of all students are Juniors who get news from Social Media?

    • Calculation: 20/200 = 0.10, or 10%.

  • Marginal Relative Frequency: The proportion of the entire sample that falls into a specific row or column category. It describes the distribution of one variable alone, ignoring the other.

    • Formula: (row or column total) / (grand total)

    • Question: What proportion of all students are Seniors?

    • Calculation: 50/200 = 0.25, or 25%.

  • Conditional Relative Frequency: The proportion of individuals in a specific category of one variable who also fall into a specific category of the other variable. This is the most important calculation for finding associations. The "condition" becomes your denominator.

    • Formula: (count in the cell) / (total for the condition's row or column)

    • Question: What proportion of Seniors get their news from Social Media? (The condition is "being a Senior").

    • Calculation: 10/50 = 0.20, or 20%.

    • Compare: What proportion of Freshmen get their news from Social Media?

    • Calculation: 35/50 = 0.70, or 70%.
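
The three calculations above can be sketched in Python. This is a minimal illustration using the news-source counts; the helper names `joint`, `marginal_year`, and `conditional` are made up for this example.

```python
# News-source counts; each list follows the order in class_years.
class_years = ["Freshman", "Sophomore", "Junior", "Senior"]
counts = {
    "Social Media":   [35, 30, 20, 10],
    "TV/Online News": [5, 10, 15, 25],
    "Family/Friends": [10, 10, 15, 15],
}
grand_total = sum(sum(row) for row in counts.values())


def column_total(year):
    """Number of students in one class year (a marginal total)."""
    i = class_years.index(year)
    return sum(row[i] for row in counts.values())


def joint(source, year):
    """Cell count divided by the grand total."""
    return counts[source][class_years.index(year)] / grand_total


def marginal_year(year):
    """Column total divided by the grand total."""
    return column_total(year) / grand_total


def conditional(source, given_year):
    """Cell count divided by the total for the condition (the class year)."""
    return counts[source][class_years.index(given_year)] / column_total(given_year)


print(joint("Social Media", "Junior"))         # Juniors on Social Media, out of everyone
print(marginal_year("Senior"))                 # Seniors, out of everyone
print(conditional("Social Media", "Senior"))   # Social Media, among Seniors only
print(conditional("Social Media", "Freshman")) # Social Media, among Freshmen only
```

Only the denominator changes between the three functions, which is the point worth noticing: the numerator for a joint and a conditional frequency is the same cell count.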

3. Visualizing Two Categorical Variables

We use special bar charts to visualize the conditional distributions we calculated above.

  • Side-by-Side Bar Chart: Displays bars for each category of the response variable, clustered together for each category of the explanatory variable. It's excellent for direct height comparison.

    [Image: A side-by-side bar chart. The x-axis is "Class Year". For "Freshman", there are three bars side-by-side representing the % who chose Social Media, TV/News, and Family. This is repeated for Sophomore, Junior, and Senior.]

  • Segmented Bar Chart: Each category of the explanatory variable is represented by a single bar that totals 100%. Each bar is divided into segments corresponding to the conditional relative frequency of the response variable categories. This is excellent for seeing how the proportional breakdown changes across categories.

    [Image: A segmented bar chart. The x-axis is "Class Year". There are four bars, one for each class year, all reaching 100%. The "Freshman" bar is segmented to show 70% Social Media, 10% TV/News, and 20% Family. The other bars are segmented with their respective percentages.]
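
For readers who want to see where the segment sizes come from, here is a small text-only sketch in Python. It assumes the news-source counts from the example and draws each bar with letters rather than a real plotting library; the function name `conditional_distribution` and the one-letter-per-5% scale are arbitrary choices for illustration.

```python
# News-source counts; each list follows the order in class_years.
class_years = ["Freshman", "Sophomore", "Junior", "Senior"]
counts = {
    "Social Media":   [35, 30, 20, 10],
    "TV/Online News": [5, 10, 15, 25],
    "Family/Friends": [10, 10, 15, 15],
}


def conditional_distribution(year):
    """Percent of students in `year` who use each news source."""
    i = class_years.index(year)
    col_total = sum(row[i] for row in counts.values())
    return {source: 100 * row[i] / col_total for source, row in counts.items()}


# Draw one text bar per class year.
# Each letter stands for 5 percentage points:
# S = Social Media, T = TV/Online News, F = Family/Friends.
for year in class_years:
    dist = conditional_distribution(year)
    bar = "".join(source[0] * round(pct / 5) for source, pct in dist.items())
    print(f"{year:<10} {bar}")
```

Every bar has the same length because each conditional distribution adds to 100%, so any visible change in the letter mix from bar to bar reflects a change in proportions, not in group size.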

4. Describing Association

The core goal is to determine if there is a relationship, or association, between the two variables.

  • An association exists if knowing the value of one variable helps you predict the value of the other. We find this by comparing conditional distributions.

    • In our example: The conditional probability of getting news from Social Media is 70% for Freshmen but only 20% for Seniors. Because these percentages are different, knowing a student's class year helps us predict their likely news source. Therefore, there is an association between class year and news source.
  • Independence is the opposite of association. Two variables are independent if the conditional distributions of the response variable are the same (or nearly the same) for all categories of the explanatory variable.
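
One informal way to see the comparison described above is to line up the conditional percentages for each news source across class years and look at how far apart they are. The sketch below does that in Python. The `spread` measure (largest minus smallest percentage) is only an illustrative device assumed for this example, not an AP formula or a formal test.

```python
# News-source counts; each list follows the order in class_years.
class_years = ["Freshman", "Sophomore", "Junior", "Senior"]
counts = {
    "Social Media":   [35, 30, 20, 10],
    "TV/Online News": [5, 10, 15, 25],
    "Family/Friends": [10, 10, 15, 15],
}


def conditional_percent(source, year):
    """Percent of students in `year` who use `source`."""
    i = class_years.index(year)
    col_total = sum(row[i] for row in counts.values())
    return 100 * counts[source][i] / col_total


# For each news source, compare its conditional percentage across class years.
spread = {}
for source in counts:
    percents = [conditional_percent(source, year) for year in class_years]
    spread[source] = max(percents) - min(percents)
    print(source, percents, "spread:", spread[source])
```

If the two variables had no association in the sample, each list of percentages would be the same (or nearly the same) across class years and every spread would be close to zero. Here the Social Media percentages range widely, which matches the conclusion in the text.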

Key Vocabulary

  • Two-Way Table: A table that displays the frequency counts for two categorical variables simultaneously.

  • Marginal Distribution: The distribution of values of one of the categorical variables in a two-way table of counts, without regard to the values of the other variable. Each marginal relative frequency is calculated as (row or column total) / (grand total).

  • Joint Relative Frequency: The proportion of observations that have specific values for both categorical variables. Calculated as (count in the cell) / (grand total).

  • Conditional Distribution: The distribution of a response variable for a specific category of an explanatory variable. It describes the values of one variable among individuals who have a specific value of another variable. Each conditional relative frequency is calculated as (count in the cell) / (total for the condition's row or column).

  • Association: A relationship between two variables where the value of one variable can help predict the value of the other. This is evident when conditional distributions are different.

  • Independence: The state where there is no relationship between two variables. This is evident when conditional distributions are the same.

  • Segmented Bar Chart: A graph used to compare conditional distributions, where each category of the explanatory variable is represented by a single bar segmented into parts representing the proportions of the response variable.

Calculator Tech (TI-84)

While most calculations for this topic are simple proportions, you can use the matrix function on your TI-84 to store two-way tables. This keeps the data organized, and entering a table as a matrix is a required skill for chi-square tests in Unit 8.

To enter a two-way table into a matrix:

  1. Press 2nd -> x⁻¹ to open the MATRIX menu.

  2. Navigate to the EDIT menu at the top.

  3. Select a matrix, for example, 1: [A].

  4. Enter the dimensions of your table (Rows x Columns). For our news source example, it's a 3x4 table (don't include the totals).

  5. Enter the counts from the table cells, pressing ENTER after each one. The calculator will automatically move from left to right across each row.

  6. Press 2nd -> MODE (QUIT) to exit. The data is now stored in Matrix [A].

This doesn't perform calculations for you now, but it keeps your data organized for later use.
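
To make the entry order concrete, here is a small Python sketch of the same 3x4 matrix. It is only an analogy for what the calculator stores; `matrix_a` and `entry_order` are hypothetical names.

```python
# The news-source table without its totals: 3 rows (news sources) by 4 columns (class years).
matrix_a = [
    [35, 30, 20, 10],   # Social Media
    [5, 10, 15, 25],    # TV/Online News
    [10, 10, 15, 15],   # Family/Friends
]

n_rows = len(matrix_a)
n_cols = len(matrix_a[0])

# The order in which the counts are typed: left to right across each row, top row first.
entry_order = [value for row in matrix_a for value in row]

print(f"{n_rows}x{n_cols}")
print(entry_order)
```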

How to Show Work on the FRQ

On the AP Exam, you will be asked to determine if an association exists between two categorical variables based on a table or graph. To get full credit, your response must be a clear, concise argument that includes numerical evidence.

FRQ Template: Describing Association

  1. Claim: State clearly whether an association exists.

    • "Yes, there appears to be an association between [Variable 1 in context] and [Variable 2 in context]."
  2. Evidence & Comparison: Provide specific, calculated conditional probabilities as evidence. You must explicitly compare at least two of these values.

    • "To support this, we can compare the conditional distributions of [Response Variable] for different categories of [Explanatory Variable]. For example, the proportion of [Category A of Explanatory Var] that are [Category X of Response Var] is (calculation = X%), while the proportion of [Category B of Explanatory Var] that are [Category X of Response Var] is (calculation = Y%)."
  3. Conclusion in Context: Link the difference in percentages back to the definition of association.

    • "Because these proportions (X% and Y%) are different, it shows that knowing the [Explanatory Variable] helps predict the [Response Variable]. Therefore, an association exists."

Practice Problems

Problem 1:

A random sample of 300 adults was asked about their highest level of education and their primary way of consuming books. The results are in the table below.

              E-Reader  Physical Book  Audiobook  Total
High School         20             60         10     90
Bachelor's          40             70         20    130
Graduate            50             20         10     80
Total              110            150         40    300

(a) What proportion of adults with a Bachelor's degree prefer physical books?

(b) Create a segmented bar chart to display the relationship between education level and book consumption method.

(c) Based on your chart, is there an association between education level and book consumption method? Justify your answer.

Solution:

(a) This is a conditional relative frequency. The condition is "having a Bachelor's degree."

  • The number of Bachelor's degree adults who prefer physical books is 70.

  • The total number of adults with a Bachelor's degree is 130.

  • Proportion = 70/130 ≈ 0.538, or 53.8%

(b) First, calculate the conditional percentages for each education level (the rows).

  • High School: E-Reader (20/90 ≈ 22.2%), Physical (60/90 ≈ 66.7%), Audiobook (10/90 ≈ 11.1%)

  • Bachelor's: E-Reader (40/130 ≈ 30.8%), Physical (70/130 ≈ 53.8%), Audiobook (20/130 ≈ 15.4%)

  • Graduate: E-Reader (50/80 = 62.5%), Physical (20/80 = 25%), Audiobook (10/80 = 12.5%)

Now, create the chart:

[Image: A segmented bar chart with "Education Level" on the x-axis. There are three bars (High School, Bachelor's, Graduate), each 100% tall. The "High School" bar is segmented to show 66.7% Physical, 22.2% E-Reader, 11.1% Audiobook. The other two bars are segmented with their respective calculated percentages.]
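
The row percentages used for the chart can be double-checked with a short Python sketch. It assumes the counts from the Problem 1 table; `book_counts` and `row_percentages` are names chosen for this example.

```python
# Problem 1 counts; each list follows the order in methods.
methods = ["E-Reader", "Physical Book", "Audiobook"]
book_counts = {
    "High School": [20, 60, 10],
    "Bachelor's":  [40, 70, 20],
    "Graduate":    [50, 20, 10],
}

# Condition on education level: divide each cell by its own row total,
# then express the result as a percentage rounded to one decimal place.
row_percentages = {
    level: {
        method: round(100 * count / sum(row), 1)
        for method, count in zip(methods, row)
    }
    for level, row in book_counts.items()
}

for level, percentages in row_percentages.items():
    print(level, percentages)
```

Because each row is divided by its own total (90, 130, or 80), the three education levels can be compared fairly even though the groups are different sizes.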

(c) Using the FRQ template:

Claim: Yes, there is an association between education level and primary book consumption method for adults in this sample.

Evidence & Comparison: The conditional distributions of book consumption method are different across the education levels. For example, the proportion of adults with a Graduate degree who prefer E-Readers is 62.5% (50/80), which is much higher than the proportion of adults with a High School diploma who prefer E-Readers, at 22.2% (20/90).

Conclusion in Context: Because these percentages are notably different, knowing an adult's education level helps us predict their preferred method of book consumption.


Problem 2:

The side-by-side bar chart below shows the results of a survey that asked students at a large university whether they lived on-campus or off-campus, separated by their primary mode of transportation.

[Image: A side-by-side bar chart. The x-axis has three categories: "Car", "Bus", "Walk/Bike". For each category, there are two bars: "On-Campus" and "Off-Campus".

  • For "Car": On-Campus is 10%, Off-Campus is 90%.

  • For "Bus": On-Campus is 30%, Off-Campus is 70%.

  • For "Walk/Bike": On-Campus is 80%, Off-Campus is 20%.]

Is there an association between a student's primary mode of transportation and their housing status (on-campus vs. off-campus)? Provide statistical evidence to support your answer.

Solution:

Claim: Yes, there is a strong association between a student's primary mode of transportation and their housing status at this university.

Evidence & Comparison: The conditional distributions of housing status differ substantially across the transportation types. For students who primarily walk or bike, 80% live on-campus. This is far higher than for students who primarily drive a car, of whom only 10% live on-campus.

Conclusion in Context: Because the percentage of students living on-campus is very different depending on their mode of transportation (80% for walkers/bikers vs. 10% for drivers), knowing a student's transportation method allows us to make a much better prediction about their housing status. Therefore, an association exists.

Common Mistakes to Avoid

  • Denominator Error: The most common mistake is using the wrong denominator. For a conditional probability "P(A given B)", the denominator MUST be the total for condition B (a row or column total), NOT the grand total.

  • Comparing Raw Counts Instead of Proportions: Stating "More Bachelor's degree holders prefer physical books (70) than High School graduates (60)" is not a valid argument for association. The group sizes are different (130 vs. 90). You MUST compare percentages or proportions to account for different group sizes.

  • Failing to Make an Explicit Comparison: Simply listing percentages is not enough. Writing "22.2% of High School grads prefer E-Readers and 62.5% of Graduate degree holders prefer E-Readers" stops short of an argument. You must explicitly compare the values, for example by stating that 62.5% is much higher than 22.2%, to justify your claim of an association.

  • Citing Joint or Marginal Frequencies as Evidence: When asked about association, you must use conditional frequencies. Stating that "the largest group in the survey was Bachelor's degree holders who read physical books" (a joint frequency) does not provide evidence for or against an association.
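
The first two mistakes can be illustrated with the Problem 1 numbers. The Python sketch below is only a demonstration; the variable names are invented for this example.

```python
# Physical-book readers and group sizes from the Problem 1 table.
bachelors_physical, bachelors_total = 70, 130
high_school_physical, high_school_total = 60, 90
grand_total = 300

# Mistake: comparing raw counts. Bachelor's looks "higher" only because the group is larger.
more_by_count = bachelors_physical > high_school_physical

# Correct: compare conditional proportions, each with its own group total as the denominator.
bachelors_rate = bachelors_physical / bachelors_total
high_school_rate = high_school_physical / high_school_total

# Mistake: dividing by the grand total gives a joint relative frequency,
# not the conditional one the question asks for.
wrong_denominator = bachelors_physical / grand_total

print(more_by_count)
print(round(bachelors_rate, 3), round(high_school_rate, 3))
print(round(wrong_denominator, 3))
```

The raw counts and the proportions point in opposite directions here: Bachelor's degree holders have the larger count, but High School graduates have the larger proportion preferring physical books.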