PrepGo

Carrying Out a Test for the Difference of Two Population Means - AP Statistics Study Guide

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Learn with study guides reviewed by top AP teachers. This guide takes about 18 minutes to read.

Quick Summary

This guide will equip you to perform a significance test for the difference between two population means. You will learn to identify the correct inference procedure, verify the necessary conditions for a two-sample t-test, and calculate the test statistic and P-value. Ultimately, you will be able to draw a statistically sound conclusion about the difference between two means in the context of a real-world problem.

Key Concepts

A two-sample t-test for the difference of two population means (μ₁ - μ₂) is the appropriate procedure when we want to compare the means of a quantitative variable from two independent groups. For example, we might want to compare the average battery life of two different smartphone brands or the mean effectiveness of a new drug compared to a placebo.

The Four-Step Inference Process

We use the State-Plan-Do-Conclude framework for all significance tests.

1. State: Hypotheses and Significance Level

  • The goal is to assess evidence for a claim about the difference between two population means, μ₁ - μ₂.

  • The null hypothesis (H₀) always states that there is no difference between the population means. It is a statement of "no effect" or "no change."

    • H₀: μ₁ - μ₂ = 0 (or equivalently, H₀: μ₁ = μ₂)
  • The alternative hypothesis (Hₐ) is what we are trying to find evidence for. It can be one-sided or two-sided.

    • One-Sided (greater than): Hₐ: μ₁ - μ₂ > 0 (or Hₐ: μ₁ > μ₂)

    • One-Sided (less than): Hₐ: μ₁ - μ₂ < 0 (or Hₐ: μ₁ < μ₂)

    • Two-Sided: Hₐ: μ₁ - μ₂ ≠ 0 (or Hₐ: μ₁ ≠ μ₂)

  • You must also define your parameters (μ₁ and μ₂) in context and state the significance level (α), which is usually given in the problem (e.g., α = 0.05).

2. Plan: Name the Test and Check Conditions

  • Name the Test: You must explicitly state the name of the procedure: Two-sample t-test for a difference of means (μ₁ - μ₂).

  • Check the Conditions:

    • Random: The data must come from two independent random samples or from two groups in a randomized experiment. This is crucial for generalizing results to the populations.

    • 10% Condition (Independence within samples): When sampling without replacement, check that each sample size is no more than 10% of its population size (n₁ ≤ 0.10N₁ and n₂ ≤ 0.10N₂). This ensures that individual observations are reasonably independent.

    • Normal/Large Sample: The sampling distribution of the difference in sample means (x̄₁ - x̄₂) must be approximately Normal. This is met if any of the following are true:

      • Both population distributions are stated to be Normal.

      • Both sample sizes are large (n₁ ≥ 30 and n₂ ≥ 30), by the Central Limit Theorem.

      • If sample sizes are small (n < 30), you must examine graphs of the sample data (like a boxplot or histogram) and confirm there is no strong skewness or outliers in either group.
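The 10% and Normal/Large Sample checks above can be sketched in a few lines of Python. This is only an illustration: the function names and the `graphs_look_ok` flag are hypothetical, and the graph check itself still has to be done by eye.

```python
def ten_percent_ok(n, population_size):
    """10% condition: the sample is at most 10% of its population."""
    # Written as 10 * n <= N to avoid floating-point rounding at the boundary.
    return 10 * n <= population_size


def normal_ok(n1, n2, populations_normal=False, graphs_look_ok=False):
    """Normal/Large Sample condition for a two-sample t-test."""
    if populations_normal:
        return True                      # both populations stated to be Normal
    if n1 >= 30 and n2 >= 30:
        return True                      # Central Limit Theorem applies
    # At least one sample is small: rely on a human check of the graphs
    # (no strong skewness, no outliers).
    return graphs_look_ok


# Hypothetical usage: samples of 40 and 50 from populations of 5000 each.
print(ten_percent_ok(40, 5000), ten_percent_ok(50, 5000))
print(normal_ok(40, 50))
print(normal_ok(10, 10, graphs_look_ok=True))
```

The Random condition is not coded here because it depends on how the data were collected, not on any number you can compute.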

3. Do: Calculations

  • If conditions are met, you calculate the test statistic and the P-value.

  • General Formula for a Test Statistic: test statistic = (statistic - parameter) / (standard error of the statistic)

  • Formula for the Two-Sample t-Test Statistic:

    t = ( (x̄₁ - x̄₂) - (μ₁ - μ₂) ) / sqrt( s₁²/n₁ + s₂²/n₂ )

    • Where:

      • x̄₁ and x̄₂ are the sample means.

      • s₁ and s₂ are the sample standard deviations.

      • n₁ and n₂ are the sample sizes.

      • (μ₁ - μ₂) is the hypothesized difference from H₀, which is almost always 0.

  • Degrees of Freedom (df): The t-distribution requires degrees of freedom. You have two options:

    • Technology (Recommended): Use your calculator's built-in test, which computes df with a more precise (but more complicated) formula and usually returns a non-integer value.

    • Conservative Method (By Hand): Use the smaller of the two values: df = min(n₁ - 1, n₂ - 1). This is acceptable on the AP exam but will result in a slightly larger P-value.

  • P-value: The P-value is the probability of getting a test statistic as extreme or more extreme than the one observed, assuming the null hypothesis is true. You find this using the t-distribution with the calculated df.

    [Image: A t-distribution curve showing the shaded area in the tail(s) corresponding to the P-value for a one-sided and two-sided test.]
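The calculations in this step can be sketched in Python using only the standard library. This is a minimal illustration, not the calculator's actual implementation: the function names are hypothetical, the df formula is the standard Welch-Satterthwaite approximation for an unpooled test, and the P-value is found by numerically integrating the t density with Simpson's rule.

```python
import math


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2, hypothesized_diff=0.0):
    """Return (t, welch_df, conservative_df) for an unpooled two-sample t-test."""
    v1 = sd1 ** 2 / n1                   # s1^2 / n1
    v2 = sd2 ** 2 / n2                   # s2^2 / n2
    standard_error = math.sqrt(v1 + v2)
    t = ((mean1 - mean2) - hypothesized_diff) / standard_error
    # Welch-Satterthwaite df (assumed to match what technology reports).
    welch_df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    conservative_df = min(n1 - 1, n2 - 1)
    return t, welch_df, conservative_df


def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def upper_tail(t, df, steps=2000):
    """P(T >= t), using Simpson's rule on the density between 0 and |t|."""
    a = abs(t)
    h = a / steps                        # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                 # area under the curve from 0 to |t|
    return 0.5 - area if t >= 0 else 0.5 + area


def p_value(t, df, alternative):
    """alternative is '>', '<', or '!=' to match the direction of Ha."""
    if alternative == ">":
        return upper_tail(t, df)
    if alternative == "<":
        return 1.0 - upper_tail(t, df)
    return 2.0 * upper_tail(abs(t), df)  # two-sided: double one tail


# Hypothetical summary statistics, two-sided alternative.
t, welch_df, conservative_df = two_sample_t(52.0, 6.0, 36, 49.0, 8.0, 32)
print(round(t, 3), round(welch_df, 1), conservative_df)
print(round(p_value(t, welch_df, "!="), 4))
print(round(p_value(t, conservative_df, "!="), 4))
```

Running the last two lines side by side shows how the choice of df changes the P-value for the same t statistic.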

4. Conclude: Interpret the Results

  • Your conclusion has two parts:

    • Decision: Compare the P-value to your significance level (α).

      • If P-value ≤ α, you reject the null hypothesis (H₀).

      • If P-value > α, you fail to reject the null hypothesis (H₀).

    • Interpretation in Context: State your conclusion in the context of the problem. Explain what the decision means regarding the alternative hypothesis.

      • If you reject H₀, state: "We have convincing evidence that [alternative hypothesis in words]."

      • If you fail to reject H₀, state: "We do not have convincing evidence that [alternative hypothesis in words]."
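The two-part conclusion above follows a fixed rule, so it can be sketched as a small function. The name `conclude` and its arguments are hypothetical; the wording of the returned sentences mirrors the templates in this step.

```python
def conclude(p_value, alpha, alternative_in_words):
    """Return (decision, sentence) for a significance test."""
    if p_value <= alpha:
        return ("reject H0",
                f"We have convincing evidence that {alternative_in_words}.")
    return ("fail to reject H0",
            f"We do not have convincing evidence that {alternative_in_words}.")


# Hypothetical usage with a P-value of 0.154 at the 0.05 level.
decision, sentence = conclude(
    0.154, 0.05,
    "the true mean lifespans of bulbs from the two factories differ")
print(decision)
print(sentence)
```

Note that the function never returns "accept H0": a large P-value only means the evidence against the null hypothesis is not convincing.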

Key Vocabulary

  • Two-Sample t-test for Means: A statistical inference procedure used to determine if there is a significant difference between the means of two independent populations.

  • Independent Samples: Samples where the selection of individuals for one sample has no bearing on the selection of individuals for the other sample.

  • Standard Error of the Difference: An estimate of the standard deviation of the sampling distribution of the difference between two sample means (x̄₁ - x̄₂). The formula is sqrt( s₁²/n₁ + s₂²/n₂ ).

  • Degrees of Freedom (df): For a two-sample t-test, it's a value that determines the specific t-distribution to use. It is calculated using a complex formula by technology or approximated by the smaller of n₁-1 and n₂-1.

  • P-value: The probability of observing a difference in sample means as extreme or more extreme than the one actually observed, assuming the null hypothesis (that there is no difference in population means) is true.

Calculator Tech (TI-84)

The 2-SampTTest function is your primary tool for this topic.

STAT -> TESTS -> 4: 2-SampTTest...

You will have two input options: Data or Stats.

1. Using Stats (when you have summary statistics):

  • Inpt: Select Stats.

  • x̄1: Mean of the first sample.

  • Sx1: Standard deviation of the first sample.

  • n1: Sample size of the first sample.

  • x̄2: Mean of the second sample.

  • Sx2: Standard deviation of the second sample.

  • n2: Sample size of the second sample.

  • μ1: Choose the relationship from your alternative hypothesis (Hₐ): ≠μ2, <μ2, or >μ2.

  • Pooled: ALWAYS select No. The AP Statistics curriculum does not require or use pooled standard deviations for the two-sample t-test for means.

  • Calculate: Press ENTER.

2. Using Data (when you have raw data in lists):

  • Enter the data for the first group into list L1 and the second group into list L2.

  • Inpt: Select Data.

  • List1: L1 (or whichever list you used).

  • List2: L2 (or whichever list you used).

  • Freq1: 1 (unless you have a frequency list).

  • Freq2: 1 (unless you have a frequency list).

  • μ1: Choose the relationship from your alternative hypothesis (Hₐ): ≠μ2, <μ2, or >μ2.

  • Pooled: ALWAYS select No.

  • Calculate: Press ENTER.

The output screen will give you the t-statistic, the P-value, the degrees of freedom (df), and the sample statistics.
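If you want to cross-check what the raw-data input option is doing, the sketch below takes two lists, computes the sample statistics, and then the unpooled t statistic and df. It is an illustration in Python rather than a description of the calculator's internals; the function name and the data are hypothetical, and the df formula is assumed to be the Welch-Satterthwaite approximation.

```python
import math
import statistics


def two_sample_summary(list1, list2):
    """Summarize two raw data lists and compute the unpooled t statistic."""
    n1, n2 = len(list1), len(list2)
    mean1, mean2 = statistics.mean(list1), statistics.mean(list2)
    # statistics.stdev is the sample standard deviation (the calculator's Sx).
    sd1, sd2 = statistics.stdev(list1), statistics.stdev(list2)
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)     # H0: mu1 - mu2 = 0
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return {"t": t, "df": df,
            "mean1": mean1, "sd1": sd1, "n1": n1,
            "mean2": mean2, "sd2": sd2, "n2": n2}


# Hypothetical data standing in for lists L1 and L2.
L1 = [4.1, 5.0, 6.2, 3.9, 5.4]
L2 = [3.2, 4.4, 3.8, 4.9, 3.6]
for name, value in two_sample_summary(L1, L2).items():
    print(name, round(value, 3))
```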

How to Show Work on the FRQ

To earn full credit on a Free Response Question involving a two-sample t-test, you must use the State-Plan-Do-Conclude framework. Follow this template precisely.


FRQ Response Template: Two-Sample t-Test for μ₁ - μ₂

STATE:

  • H₀: μ₁ - μ₂ = 0

  • Hₐ: μ₁ - μ₂ [ >, <, or ≠ ] 0

  • Parameters:

    • μ₁ = the true mean [context of group 1].

    • μ₂ = the true mean [context of group 2].

  • Significance Level: α = [level stated in problem, e.g., 0.05].

PLAN:

  • Name of Procedure: We will perform a two-sample t-test for a difference of population means (μ₁ - μ₂).

  • Check Conditions:

    • Random: The data come from [state how: two independent random samples OR two groups in a randomized experiment].

    • 10% Condition: When sampling without replacement, we assume the population sizes are at least [10 * n₁] and [10 * n₂] respectively. (e.g., "It is reasonable to assume there are more than 10 * 35 = 350 widgets of Brand A and 10 * 40 = 400 widgets of Brand B.")

    • Normal/Large Sample:

      • (If n₁ ≥ 30 and n₂ ≥ 30): "Since both sample sizes (n₁ = __ and n₂ = __) are at least 30, the sampling distribution of (x̄₁ - x̄₂) is approximately Normal by the Central Limit Theorem."

      • (If n₁ < 30 or n₂ < 30): "Since the sample sizes are small, we must check graphs of the sample data. [Sketch or describe the boxplots/dotplots]. The plots show no strong skewness or outliers, so we can assume the sampling distribution is approximately Normal."

DO:

  • Test Statistic: t = ( (x̄₁ - x̄₂) - 0 ) / sqrt( s₁²/n₁ + s₂²/n₂ ) = [value]

  • P-value:

    • Using df = [value from calculator or conservative df], the P-value is [value].

    • It is essential to report the t-statistic, degrees of freedom, and P-value. You can get these directly from your calculator's output.

CONCLUDE:

  • Decision: Because the P-value of [value] is [ > or ≤ ] our significance level of α = [value], we [fail to reject / reject] the null hypothesis.

  • Interpretation in Context: We [do not have / have] convincing statistical evidence to conclude that [state the alternative hypothesis, Hₐ, in the context of the problem].


Practice Problems

Problem 1:

A company that manufactures LED light bulbs wants to know if there is a difference in the average lifespan of bulbs produced at two different factories, Factory A and Factory B. They take a random sample of 40 bulbs from Factory A and find a mean lifespan of 2,200 hours with a standard deviation of 150 hours. A second random sample of 50 bulbs from Factory B has a mean lifespan of 2,250 hours with a standard deviation of 180 hours. Do these data provide convincing evidence at the α = 0.05 level of a difference in the mean lifespans of bulbs from the two factories?

Solution:

STATE:

  • H₀: μ_A - μ_B = 0

  • Hₐ: μ_A - μ_B ≠ 0

  • Parameters:

    • μ_A = the true mean lifespan of all LED bulbs from Factory A.

    • μ_B = the true mean lifespan of all LED bulbs from Factory B.

  • Significance Level: α = 0.05.

PLAN:

  • Name of Procedure: We will perform a two-sample t-test for a difference of population means (μ_A - μ_B).

  • Check Conditions:

    • Random: The problem states the data come from independent random samples of bulbs, one from each factory.

    • 10% Condition: It is reasonable to assume that the total number of bulbs produced at Factory A is at least 10 * 40 = 400 and at Factory B is at least 10 * 50 = 500.

    • Normal/Large Sample: Since both sample sizes (n_A = 40 and n_B = 50) are at least 30, the sampling distribution of (x̄_A - x̄_B) is approximately Normal by the Central Limit Theorem.

DO:

  • Using the TI-84 with input:

    • x̄1=2200, Sx1=150, n1=40

    • x̄2=2250, Sx2=180, n2=50

    • μ1: ≠μ2, Pooled: No

  • Test Statistic: t = ( (2200 - 2250) - 0 ) / sqrt( 150²/40 + 180²/50 ) ≈ -1.44

  • P-value:

    • Using df ≈ 87.8 (from calculator), the P-value is 0.154.

CONCLUDE:

  • Decision: Because the P-value of 0.154 is greater than our significance level of α = 0.05, we fail to reject the null hypothesis.

  • Interpretation in Context: We do not have convincing statistical evidence to conclude that there is a difference in the true mean lifespans of LED bulbs produced at Factory A and Factory B.
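As a check on the DO step for Problem 1, the degrees of freedom can be reproduced by hand. The guide does not state the formula technology uses; the worked arithmetic below assumes the standard Welch-Satterthwaite approximation for an unpooled test, with Factory A as sample 1 and Factory B as sample 2.

```latex
\[
\frac{s_1^2}{n_1} = \frac{150^2}{40} = 562.5,
\qquad
\frac{s_2^2}{n_2} = \frac{180^2}{50} = 648
\]
\[
df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}
          {\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2
         + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}
   = \frac{(562.5 + 648)^2}{\frac{562.5^2}{39} + \frac{648^2}{49}}
   \approx \frac{1465310.25}{8112.98 + 8569.47}
   \approx 87.8
\]
```

The conservative by-hand alternative would be df = min(40 - 1, 50 - 1) = 39.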


Problem 2:

A physical therapist wants to determine if a new stretching regimen is more effective at increasing flexibility than a traditional regimen. 20 clients are randomly assigned to two groups of 10. Group 1 (New) uses the new regimen, and Group 2 (Traditional) uses the old one. After four weeks, their increase in flexibility is measured in inches. The results are:

  • Group 1 (New): x̄₁ = 4.8 in, s₁ = 1.2 in

  • Group 2 (Trad): x̄₂ = 3.9 in, s₂ = 1.4 in

Assume that dotplots of the data show no strong skewness or outliers. Is there convincing evidence that the new regimen produces a higher mean increase in flexibility? Use a significance level of α = 0.01.

Solution:

STATE:

  • H₀: μ₁ - μ₂ = 0

  • Hₐ: μ₁ - μ₂ > 0

  • Parameters:

    • μ₁ = the true mean increase in flexibility for all clients using the new regimen.

    • μ₂ = the true mean increase in flexibility for all clients using the traditional regimen.

  • Significance Level: α = 0.01.

PLAN:

  • Name of Procedure: We will perform a two-sample t-test for a difference of population means (μ₁ - μ₂).

  • Check Conditions:

    • Random: The 20 clients were randomly assigned to the two treatment groups (new and traditional regimens).

    • 10% Condition: This is a randomized experiment, not sampling without replacement. The 10% condition does not apply.

    • Normal/Large Sample: Since both sample sizes (n₁ = 10 and n₂ = 10) are small, we must rely on the problem statement. The problem states that dotplots of the sample data show no strong skewness or outliers, so we can proceed as if the sampling distribution is approximately Normal.

DO:

  • Using the TI-84 with input:

    • x̄1=4.8, Sx1=1.2, n1=10

    • x̄2=3.9, Sx2=1.4, n2=10

    • μ1: >μ2, Pooled: No

  • Test Statistic: t = ( (4.8 - 3.9) - 0 ) / sqrt( 1.2²/10 + 1.4²/10 ) ≈ 1.54

  • P-value:

    • Using df ≈ 17.6 (from calculator), the P-value is 0.070.

CONCLUDE:

  • Decision: Because the P-value of 0.070 is greater than our significance level of α = 0.01, we fail to reject the null hypothesis.

  • Interpretation in Context: We do not have convincing statistical evidence to conclude that the new stretching regimen produces a higher true mean increase in flexibility than the traditional regimen.
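Problem 2 can also be worked by hand with the conservative degrees of freedom. The sketch below uses the summary statistics from the problem and the usual t-table critical values for df = 9 (1.383 for an upper-tail area of 0.10 and 1.833 for 0.05).

```latex
\[
SE = \sqrt{\frac{1.2^2}{10} + \frac{1.4^2}{10}}
   = \sqrt{0.144 + 0.196}
   = \sqrt{0.34} \approx 0.583
\]
\[
t = \frac{(4.8 - 3.9) - 0}{0.583} \approx 1.54,
\qquad
df = \min(10 - 1,\; 10 - 1) = 9
\]
\[
1.383 < 1.54 < 1.833
\quad\Longrightarrow\quad
0.05 < P\text{-value} < 0.10
\]
```

Because any P-value in this range is greater than α = 0.01, the by-hand approach leads to the same decision as the calculator: fail to reject H₀.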

Common Mistakes to Avoid

  • Confusing Two-Sample vs. Paired Data: A two-sample test is for two independent groups (e.g., men vs. women, treatment vs. control). A paired t-test (Topic 7.7) is for two measurements on the same subject (e.g., before and after) or on matched pairs. Misidentifying the design leads to the wrong test.

  • Forgetting to Check Conditions for BOTH Groups: The Normal/Large Sample condition must be met for both samples. If one sample is large (n ≥ 30) but the other is small (n < 30), you must still check a graph of the small sample's data for skewness/outliers.

  • Using Sample Statistics in Hypotheses: Hypotheses are always about population parameters (μ₁, μ₂). Never write H₀: x̄₁ - x̄₂ = 0. This is a fatal error.

  • Pooling Data: For the AP exam, you should never pool the data for a two-sample t-test for means. Always select "Pooled: No" on the calculator. This is because the condition that the two populations have equal variances is difficult to verify.

  • Stating an Incorrect Conclusion: Do not "accept the null hypothesis." You can only fail to reject it. This is a subtle but critical distinction. Failing to find evidence against H₀ is not the same as proving H₀ is true.
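To illustrate the first mistake in this list, the sketch below runs both procedures on the same made-up before/after measurements. The data and function names are hypothetical, and the paired statistic is the usual one-sample t on the differences. The point is only that the two test statistics can differ dramatically, so choosing the wrong design can change the conclusion.

```python
import math
import statistics


def two_sample_t_stat(group1, group2):
    """Unpooled two-sample t statistic (treats the groups as independent)."""
    v1 = statistics.variance(group1) / len(group1)
    v2 = statistics.variance(group2) / len(group2)
    return (statistics.mean(group1) - statistics.mean(group2)) / math.sqrt(v1 + v2)


def paired_t_stat(first, second):
    """Paired t statistic: a one-sample t on the within-subject differences."""
    diffs = [a - b for a, b in zip(first, second)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Hypothetical data: the same five subjects measured before and after.
before = [10, 12, 15, 18, 20]
after = [11, 14, 16, 20, 22]

print(round(paired_t_stat(after, before), 2))      # correct for this design
print(round(two_sample_t_stat(after, before), 2))  # wrong procedure here
```

With paired data the subject-to-subject variation cancels out in the differences, which is why the paired statistic is much larger for these numbers.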