Quick Summary
This guide will equip you to compare two distinct populations or treatment groups by estimating the true difference between their proportions. You will learn to construct and interpret a two-sample z-interval for the difference of two proportions, a fundamental skill for making statistically sound comparisons. This involves verifying the necessary conditions, calculating the interval using the correct formula, and communicating your conclusions clearly and in context.
Key Concepts
The primary goal of this procedure is to estimate the true difference between two population proportions, p₁ - p₂. For example, we might want to estimate the difference in the proportion of men and women who support a certain policy, or the difference in the recovery rate for patients receiving a new drug versus a placebo.
The Confidence Interval Formula
The general formula for any confidence interval is:
Point Estimate ± (Critical Value) × (Standard Error)
For the difference of two proportions, this specific formula is:
(p̂₁ - p̂₂) ± z* ⋅ √[ (p̂₁(1-p̂₁)/n₁) + (p̂₂(1-p̂₂)/n₂) ]
Let's break down each component:
Point Estimate (p̂₁ - p̂₂): This is our best guess for the true difference between the population proportions. It's calculated directly from our sample data.
p̂₁ = x₁/n₁ is the sample proportion for group 1 (number of successes / sample size).
p̂₂ = x₂/n₂ is the sample proportion for group 2.
Critical Value (z*): This value determines the width of our interval and is based on the desired confidence level. It tells us how many standard errors we need to go out from the point estimate to capture the true difference with a certain level of confidence. You find this using the standard Normal distribution.
Common z* values:
90% confidence: z* = 1.645
95% confidence: z* = 1.96
99% confidence: z* = 2.576
[Image: A bell-shaped Normal curve with the central C% area shaded, and the z* and -z* values marking the boundaries of that area.]
Standard Error of the Difference (SE): This formula estimates the standard deviation of the sampling distribution of (p̂₁ - p̂₂). It measures the typical variation we expect in the difference of sample proportions from one set of samples to another.
SE = √[ (p̂₁(1-p̂₁)/n₁) + (p̂₂(1-p̂₂)/n₂) ]
Important: We use the individual sample proportions (p̂₁ and p̂₂) to calculate the standard error for a confidence interval. We do NOT pool the proportions together.
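The components above can be computed directly. Below is a minimal sketch in Python that uses only the standard library; the names `critical_value` and `unpooled_se` are illustrative helpers and do not come from any package. `statistics.NormalDist().inv_cdf` returns a standard Normal quantile, so z* is the quantile with (1 + C)/2 of the area below it. The standard error keeps p̂₁ and p̂₂ separate, as the note above requires.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(conf_level: float) -> float:
    """z* for a two-sided C% interval: the standard Normal point
    with (1 + C)/2 of the area below it."""
    return NormalDist().inv_cdf((1 + conf_level) / 2)


def unpooled_se(p1: float, n1: int, p2: float, n2: int) -> float:
    """Standard error of p̂₁ - p̂₂ using the separate (unpooled) sample proportions."""
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


# Reproduce the table of common z* values
for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%} confidence: z* = {critical_value(c):.3f}")

# Standard error for p̂₁ = 0.54 (n₁ = 250) and p̂₂ = 0.60 (n₂ = 300)
print(round(unpooled_se(0.54, 250, 0.60, 300), 4))
```

Computing z* from the confidence level, rather than memorizing it, also covers less common levels such as 98%.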
Conditions for Inference
Before we can trust our calculated interval, we must verify three conditions. These ensure that our methods are valid.
Random Condition: The data must come from two independent random samples or two groups in a randomized experiment.
"Independent samples" means the selection of individuals for one group has no impact on the selection for the other group.
This condition is crucial for the validity of the standard error formula.
10% Condition (Independence of Observations): When sampling without replacement, we must ensure our sample sizes are not too large relative to the population sizes. This preserves the independence of individual observations within each sample.
Check: n₁ ≤ (1/10)N₁ and n₂ ≤ (1/10)N₂, where N₁ and N₂ are the population sizes.
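If this condition is not met, the standard error formula overstates the true variability of (p̂₁ - p̂₂), because sampling a large share of a population without replacement makes the sample proportions less variable. If the data come from a randomized experiment, this condition is not necessary.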
Large Counts Condition (Normality): The sampling distribution of (p̂₁ - p̂₂) must be approximately Normal. We check this by ensuring there are enough "successes" and "failures" in each group.
Check: n₁p̂₁ ≥ 10, n₁(1-p̂₁) ≥ 10, n₂p̂₂ ≥ 10, and n₂(1-p̂₂) ≥ 10.
Note that these are the counts (number of individuals), not the proportions.
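The two numeric checks translate into short functions. The sketch below is in Python, and the function names are hypothetical. It uses the fact that n·p̂ is just the success count x and n·(1 - p̂) is the failure count n - x, so no proportions are needed. The 10% check needs a population size N, which a problem usually leaves unstated; the value 10,000 in the usage lines is an assumed, illustrative size. The Random condition depends on how the data were collected and cannot be checked by code.

```python
def large_counts_ok(x1: int, n1: int, x2: int, n2: int) -> bool:
    """Large Counts: successes and failures in both samples are all at least 10.
    n·p̂ equals the success count x, and n·(1 - p̂) equals the failure count n - x."""
    counts = [x1, n1 - x1, x2, n2 - x2]
    return all(count >= 10 for count in counts)


def ten_percent_ok(n: int, population_size: int) -> bool:
    """10% condition for one sample drawn without replacement: n <= N/10.
    Written as 10*n <= N to avoid floating-point division."""
    return 10 * n <= population_size


print(large_counts_ok(135, 250, 180, 300))  # True
print(large_counts_ok(10, 50, 22, 50))      # True: a count of exactly 10 meets "at least 10"
print(ten_percent_ok(250, 10_000))          # True for an assumed population of 10,000
```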
Key Vocabulary
Point Estimate: The statistic calculated from sample data used to estimate a population parameter. For this topic, the point estimate is (p̂₁ - p̂₂).
Standard Error of the Difference: An estimate of the standard deviation of the sampling distribution of (p̂₁ - p̂₂). It quantifies the sample-to-sample variability of the difference in sample proportions.
Margin of Error: The "plus or minus" part of the confidence interval that is added to and subtracted from the point estimate. It is calculated as (Critical Value) × (Standard Error).
Two-Sample z-Interval for a Difference of Proportions: The specific name of the inference procedure used to estimate the true difference between two population proportions.
Confidence Level (C%): The success rate of the method used to construct the interval. In the long run, about C% of all intervals constructed with this method will capture the true parameter.
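The "long-run success rate" meaning of the confidence level can be checked by simulation. This is a minimal sketch in Python that assumes true proportions of 0.54 and 0.60 (hypothetical values borrowed from Practice Problem 1 for illustration) and the unpooled 95% interval. It repeatedly draws pairs of independent random samples, builds an interval from each pair, and reports the fraction of intervals that capture the true difference. The helper names and the seed are illustrative choices.

```python
import random
from math import sqrt


def captures(x1, n1, x2, n2, z, true_diff):
    """Does the unpooled z-interval built from these counts contain true_diff?"""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se <= true_diff <= diff + z * se


def coverage_rate(p1, n1, p2, n2, z=1.96, reps=2000, seed=1):
    """Fraction of simulated intervals that capture the true difference p1 - p2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One pair of independent random samples: count the successes in each
        x1 = sum(rng.random() < p1 for _ in range(n1))
        x2 = sum(rng.random() < p2 for _ in range(n2))
        if captures(x1, n1, x2, n2, z, p1 - p2):
            hits += 1
    return hits / reps


# Hypothetical true proportions; the printed rate should land near 0.95
rate = coverage_rate(0.54, 250, 0.60, 300)
print(rate)
```

Each individual interval either captures the true difference or misses it. The 95% describes the proportion of intervals that capture it across many repetitions.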
Calculator Tech (TI-84)
You can and should use your calculator to compute the interval quickly and accurately, especially after showing the initial formula setup on an FRQ.
Function: 2-PropZInt
Keystrokes:
STAT -> TESTS -> B: 2-PropZInt...
Inputs:
x1: Number of successes in sample 1. Must be an integer.
n1: Sample size of sample 1.
x2: Number of successes in sample 2. Must be an integer.
n2: Sample size of sample 2.
C-Level: The confidence level as a decimal (e.g., 0.95 for 95%).
Calculate: Press ENTER.
The calculator will output the confidence interval and will also remind you of the sample proportions p̂₁ and p̂₂.
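For checking calculator output away from the TI-84, here is a minimal Python sketch of the same computation. The function `two_prop_z_int` is a hypothetical helper. It takes the same five inputs as 2-PropZInt, but it is not TI code. It rejects non-integer counts, which mirrors the warning about entering proportions. It uses the exact Normal quantile for z*, and it agrees with the worked answers in this guide to three decimals.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_int(x1: int, n1: int, x2: int, n2: int, c_level: float):
    """Unpooled two-sample z-interval for p1 - p2, built from raw counts.
    Returns (lower, upper, p̂1, p̂2)."""
    if not all(isinstance(v, int) for v in (x1, n1, x2, n2)):
        raise TypeError("x1, n1, x2, n2 must be integer counts, not proportions")
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf((1 + c_level) / 2)              # critical value z*
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)       # unpooled SE
    diff = p1 - p2                                           # point estimate
    return diff - z * se, diff + z * se, p1, p2


lower, upper, p1_hat, p2_hat = two_prop_z_int(135, 250, 180, 300, 0.95)
print(f"({lower:.3f}, {upper:.3f})  p̂1={p1_hat:.2f}  p̂2={p2_hat:.2f}")
# (-0.143, 0.023)  p̂1=0.54  p̂2=0.60
```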
How to Show Work on the FRQ
To earn full credit on an inference FRQ, you must use the four-step State-Plan-Do-Conclude (SPDC) process.
State
Parameter: Define the parameter of interest in context. For this topic, it is p₁ - p₂.
- Template: "We want to estimate the true difference in proportions, p₁ - p₂, where p₁ = the true proportion of [context for group 1] and p₂ = the true proportion of [context for group 2]."
Confidence Level: State the confidence level.
- Template: "We will construct a [C%] confidence interval."
Plan
Procedure Name: Name the inference procedure.
- Template: "The appropriate procedure is a two-sample z-interval for a difference of proportions."
Check Conditions: Check the Random, 10%, and Large Counts conditions in context.
Random: "The problem states the data come from two independent random samples of [group 1] and [group 2]." OR "The problem states this was a randomized experiment."
10%: "It is reasonable to assume that n₁ = [value] is less than 10% of all [population 1] and n₂ = [value] is less than 10% of all [population 2]."
Large Counts: "We check for normality:
n₁p̂₁ = [x₁] ≥ 10
n₁(1-p̂₁) = [n₁-x₁] ≥ 10
n₂p̂₂ = [x₂] ≥ 10
n₂(1-p̂₂) = [n₂-x₂] ≥ 10
Since all counts are at least 10, the sampling distribution of (p̂₁ - p̂₂) is approximately Normal."
Do
General Formula: Write the general formula for a confidence interval.
- Template: Point Estimate ± (Critical Value) × (Standard Error)
Specific Formula & Substitution: Write the specific formula for this procedure and plug in the values from the problem.
Final Answer: Give the final interval calculated by your TI-84.
- Template: "The resulting interval is ([lower bound], [upper bound])."
Conclude
Interpret the Interval: Interpret the interval in the context of the problem using the standard script.
- Template: "We are [C%] confident that the interval from [lower bound] to [upper bound] captures the true difference in the proportion of [context for group 1] and [context for group 2] (p₁ - p₂)."
Interpret "0": Based on the interval, make a conclusion about whether there is a significant difference between the two proportions.
If 0 is in the interval: "Because 0 is a plausible value in our interval, we do not have convincing evidence of a difference between the true proportions of [context for group 1] and [context for group 2]."
If 0 is NOT in the interval: "Because 0 is not a plausible value in our interval, we have convincing evidence of a difference between the true proportions. Specifically, we are [C%] confident that the proportion of [context for group 1] is [higher/lower] than the proportion of [context for group 2]." (Choose higher if the interval is all positive, lower if all negative).
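The "Interpret 0" rule is a three-way decision: 0 is inside the interval, the interval is all positive, or it is all negative. Below is a minimal Python sketch of that logic. `interpret_zero` is a hypothetical helper, and its messages paraphrase the templates above. It does not replace writing the conclusion in context.

```python
def interpret_zero(lower: float, upper: float, group1: str, group2: str) -> str:
    """Apply the 'Interpret 0' rule to an interval for p1 - p2."""
    if lower <= 0 <= upper:
        return (f"0 is plausible: no convincing evidence of a difference "
                f"between {group1} and {group2}.")
    # 0 is excluded, so the whole interval is on one side of 0
    direction = "higher" if lower > 0 else "lower"
    return (f"0 is not plausible: convincing evidence that the proportion for "
            f"{group1} is {direction} than for {group2}.")


print(interpret_zero(-0.143, 0.023, "men", "women"))
print(interpret_zero(0.092, 0.388, "placebo", "new repellent"))
```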
Practice Problems
Problem 1:
A polling agency wants to investigate whether there is a difference in the proportion of men and women in a certain city who support a new public transportation initiative. A random sample of 250 men found that 135 supported the initiative. An independent random sample of 300 women found that 180 supported it. Construct and interpret a 95% confidence interval for the difference in the proportion of all men and all women in the city who support the initiative.
Solution:
State: We want to estimate the true difference in proportions, p_M - p_W, where p_M is the true proportion of all men in the city who support the initiative, and p_W is the true proportion of all women in the city who support it. We will construct a 95% confidence interval.
Plan: The appropriate procedure is a two-sample z-interval for a difference of proportions.
Random: The problem states we have independent random samples of 250 men and 300 women.
10%: It is reasonable to assume that 250 is less than 10% of all men in the city, and 300 is less than 10% of all women in the city.
Large Counts: We must check the counts of successes and failures for both groups.
p̂_M = 135/250 = 0.54
p̂_W = 180/300 = 0.60
Men: n_M*p̂_M = 135 ≥ 10 and n_M(1-p̂_M) = 250 - 135 = 115 ≥ 10.
Women: n_W*p̂_W = 180 ≥ 10 and n_W(1-p̂_W) = 300 - 180 = 120 ≥ 10.
Since all counts are at least 10, the sampling distribution of (p̂_M - p̂_W) is approximately Normal.
Do:
Point Estimate ± (Critical Value) × (Standard Error)
(p̂_M - p̂_W) ± z* ⋅ √[ (p̂_M(1-p̂_M)/n_M) + (p̂_W(1-p̂_W)/n_W) ]
= (0.54 - 0.60) ± 1.96 ⋅ √[ (0.54(0.46)/250) + (0.60(0.40)/300) ]
= -0.06 ± 1.96 ⋅ √(0.0009936 + 0.0008)
= -0.06 ± 0.083
Using 2-PropZInt with x1=135, n1=250, x2=180, n2=300, C-Level=0.95, we get the interval (-0.143, 0.023).
Conclude:
We are 95% confident that the interval from -0.143 to 0.023 captures the true difference in the proportion of all men and all women in the city who support the initiative (Men - Women). Because 0 is a plausible value in our interval, we do not have convincing evidence of a difference between the true proportions of men and women in the city who support the initiative.
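Here is a short Python sketch that retraces the Do step of Problem 1 line by line. It shows where each intermediate number (0.0009936, 0.0008, 0.083) comes from. It uses z* = 1.96, as the written solution does, and the variable names are illustrative.

```python
from math import sqrt

x_m, n_m = 135, 250   # men who support the initiative, men sampled
x_w, n_w = 180, 300   # women who support the initiative, women sampled

p_m, p_w = x_m / n_m, x_w / n_w            # 0.54 and 0.60
var_m = p_m * (1 - p_m) / n_m               # 0.0009936
var_w = p_w * (1 - p_w) / n_w               # 0.0008
margin = 1.96 * sqrt(var_m + var_w)         # about 0.083
diff = p_m - p_w                            # -0.06 (Men - Women)

print(f"{diff:.2f} ± {margin:.3f} -> ({diff - margin:.3f}, {diff + margin:.3f})")
# -0.06 ± 0.083 -> (-0.143, 0.023)
```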
Problem 2:
To test the effectiveness of a new mosquito repellent, researchers conducted a randomized experiment. They recruited 100 volunteers. A randomly selected 50 volunteers were given the new repellent (Group A), and the other 50 were given a placebo (Group B). After two hours in a mosquito-infested area, 10 people in Group A reported mosquito bites, while 22 people in Group B reported bites. Construct and interpret a 90% confidence interval for the difference in the proportion of people who would get bitten (Placebo - New Repellent).
Solution:
State: We want to estimate the true difference in proportions, p_B - p_A, where p_B is the true proportion of all people who would get bitten using the placebo, and p_A is the true proportion of all people who would get bitten using the new repellent. We will construct a 90% confidence interval.
Plan: The appropriate procedure is a two-sample z-interval for a difference of proportions.
Random: The problem states this was a randomized experiment where volunteers were randomly assigned to the repellent and placebo groups.
10%: This condition is not necessary as the data comes from a randomized experiment, not sampling from a larger population.
Large Counts: We must check the counts of successes (bites) and failures (no bites).
p̂_A = 10/50 = 0.20
p̂_B = 22/50 = 0.44
Group A (Repellent): n_A*p̂_A = 10 ≥ 10 and n_A(1-p̂_A) = 40 ≥ 10.
Group B (Placebo): n_B*p̂_B = 22 ≥ 10 and n_B(1-p̂_B) = 28 ≥ 10.
Since all counts are at least 10, the sampling distribution of (p̂_B - p̂_A) is approximately Normal.
Do:
Point Estimate ± (Critical Value) × (Standard Error)
(p̂_B - p̂_A) ± z* ⋅ √[ (p̂_B(1-p̂_B)/n_B) + (p̂_A(1-p̂_A)/n_A) ]
= (0.44 - 0.20) ± 1.645 ⋅ √[ (0.44(0.56)/50) + (0.20(0.80)/50) ]
= 0.24 ± 1.645 ⋅ √(0.004928 + 0.0032)
= 0.24 ± 0.148
Using 2-PropZInt with x1=22, n1=50, x2=10, n2=50, C-Level=0.90, we get the interval (0.092, 0.388).
Conclude:
We are 90% confident that the interval from 0.092 to 0.388 captures the true difference in the proportion of people who would get bitten (Placebo - New Repellent). Because 0 is not a plausible value in our interval (the entire interval is positive), we have convincing evidence that the new repellent reduces the proportion of people who get bitten. We are 90% confident that the true proportion of people who would get bitten is between 9.2 and 38.8 percentage points higher with the placebo than with the new repellent.
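The same arithmetic for Problem 2 is shown below in Python. This sketch uses z* = 1.645 from the 90% row of the table and subtracts in the order Placebo - Repellent, matching the State step. It also confirms the Large Counts check, where the smallest count is exactly 10. The variable names are illustrative.

```python
from math import sqrt

x_b, n_b = 22, 50   # placebo group: bitten, group size
x_a, n_a = 10, 50   # repellent group: bitten, group size

p_b, p_a = x_b / n_b, x_a / n_a                   # 0.44 and 0.20
counts = [x_b, n_b - x_b, x_a, n_a - x_a]          # 22, 28, 10, 40
assert min(counts) >= 10                           # Large Counts holds (smallest count is 10)

se = sqrt(p_b * (1 - p_b) / n_b + p_a * (1 - p_a) / n_a)
margin = 1.645 * se                                # about 0.148
diff = p_b - p_a                                   # 0.24 (Placebo - Repellent)
print(f"({diff - margin:.3f}, {diff + margin:.3f})")
# (0.092, 0.388)
```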
Common Mistakes to Avoid
Pooling Proportions: NEVER pool the sample proportions (i.e., combine them into one big proportion) when creating a confidence interval. The standard error formula for a two-proportion confidence interval always uses the individual sample proportions, p̂₁ and p̂₂. Pooling is only used for the significance test for p₁ - p₂.
Incorrect Condition Checks: Forgetting to check the Large Counts condition for all four counts (successes and failures in both groups). Also, stating the 10% condition is met without comparing the sample size to a plausible population size.
Misinterpreting the Confidence Level: Do not say, "There is a 95% probability that the true difference is in our interval." This is incorrect. The true difference is a fixed value; it's either in the interval or it's not. The 95% refers to the long-run success rate of the method used to generate the interval.
Confusing Counts and Proportions: The calculator requires the number of successes (x₁ and x₂) as inputs, which must be integers. Do not enter the sample proportions (p̂₁ and p̂₂) into the x1 and x2 fields.
Ignoring the Order of Subtraction: The order in which you subtract the proportions determines how you word your conclusion. An all-negative interval for p_A - p_B and an all-positive interval for p_B - p_A describe the same result, but the endpoints change sign and the direction word (higher or lower) flips. Always clearly define the order in your State step and stick with it.
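Two of these mistakes can be seen numerically with the mosquito data from Problem 2. This Python sketch uses illustrative helper names. First, it compares the unpooled standard error, which belongs in the interval, with the pooled standard error that the significance test uses. The two values differ, so pooling would change the interval. Second, it shows that reversing the order of subtraction negates the interval and swaps its endpoints.

```python
from math import sqrt


def unpooled_se(x1, n1, x2, n2):
    """SE for a confidence interval: uses the separate sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def pooled_se(x1, n1, x2, n2):
    """SE used by the significance test for p1 - p2, which assumes p1 = p2."""
    p_c = (x1 + x2) / (n1 + n2)
    return sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))


def interval(x1, n1, x2, n2, z):
    """Unpooled z-interval for p1 - p2."""
    diff = x1 / n1 - x2 / n2
    me = z * unpooled_se(x1, n1, x2, n2)
    return diff - me, diff + me


# Unpooled (about 0.090) vs pooled (about 0.093): not interchangeable
print(unpooled_se(22, 50, 10, 50), pooled_se(22, 50, 10, 50))

# Same data, opposite order of subtraction
print(interval(22, 50, 10, 50, 1.645))  # Placebo - Repellent: all positive
print(interval(10, 50, 22, 50, 1.645))  # Repellent - Placebo: all negative
```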