Setting Up a Test for the Difference of Two Population Proportions | AP Stats Unit 6 Study Guide

Quick Summary

This guide covers the essential first steps for conducting a significance test for the difference between two population proportions. You will learn how to correctly formulate the null and alternative hypotheses to compare two distinct groups and how to rigorously verify the three necessary conditions—Random, 10% Condition, and Large Counts—that allow us to proceed with the test. Mastering this setup is the critical foundation for drawing valid conclusions about whether a true difference exists between two populations.

Key Concepts

The goal of a two-proportion z-test is to determine if there is statistically significant evidence of a difference between the proportions of "successes" in two distinct populations. Before we can calculate a test statistic or a p-value, we must properly set up the test by defining our hypotheses and verifying the conditions for inference.

1. Defining the Hypotheses

The first step in any significance test is to state the hypotheses. These are statements about the population parameters, not the sample statistics. For this test, our parameters are p₁ and p₂, the true proportions of successes in Population 1 and Population 2, respectively.

The Null Hypothesis (H₀): The "No Difference" Claim
- The null hypothesis always states that there is no difference between the two population proportions. It's the baseline assumption we try to find evidence against.
- It can be written in two equivalent ways:
  - H₀: p₁ - p₂ = 0 (The difference between the proportions is zero.)
  - H₀: p₁ = p₂ (The proportions are equal.)
- You must always use the parameters p₁ and p₂, never the sample statistics p̂₁ and p̂₂.
The Alternative Hypothesis (Hₐ): The Research Claim
- The alternative hypothesis is what we are trying to find evidence for. It's typically derived from the wording of the research question. There are three possibilities:
  1. Right-Tailed Test: We suspect Population 1's proportion is greater than Population 2's.
    - Hₐ: p₁ - p₂ > 0 (or Hₐ: p₁ > p₂)
  2. Left-Tailed Test: We suspect Population 1's proportion is less than Population 2's.
    - Hₐ: p₁ - p₂ < 0 (or Hₐ: p₁ < p₂)
  3. Two-Tailed Test: We suspect the proportions are simply different, with no preconceived direction.
    - Hₐ: p₁ - p₂ ≠ 0 (or Hₐ: p₁ ≠ p₂)

2. Verifying the Conditions for Inference

We can only use the two-proportion z-test if certain conditions are met. These conditions ensure that our calculations are valid and that the sampling distribution of the difference in sample proportions, p̂₁ - p̂₂, is approximately Normal.

Condition 1: Random
- The data must come from two independent sources. This can be satisfied in one of two ways:
  - Two Independent Random Samples: The data are from two separate simple random samples (SRSs), one from each population of interest.
  - Randomized Experiment: The subjects were randomly assigned to two treatment groups.
- This condition is crucial for generalizing our findings from the samples to the larger populations or for making cause-and-effect conclusions in an experiment.
Condition 2: 10% Condition (for Independence)
- This condition is only necessary when we are sampling without replacement from two distinct populations. It is not needed for randomized experiments.
- The purpose is to ensure that the individual selections within each sample are nearly independent.
- The rule is: The sample size must be no more than 10% of the population size for each group.
  - n₁ ≤ (1/10)N₁ and n₂ ≤ (1/10)N₂
  - Where n₁ and n₂ are the sample sizes, and N₁ and N₂ are the population sizes.
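The 10% rule is simple arithmetic, and a minimal Python sketch makes it explicit. The function name `ten_percent_condition` and the population sizes below are hypothetical, chosen only for illustration.

```python
def ten_percent_condition(n: int, population_size: int) -> bool:
    """Return True when the sample is no more than 10% of its population."""
    # n <= N / 10 is the same as 10 * n <= N, which avoids division.
    return 10 * n <= population_size


# Hypothetical sizes, for illustration only.
print(ten_percent_condition(150, 40_000))  # True: 1500 <= 40000
print(ten_percent_condition(150, 1_200))   # False: 1500 > 1200
```

In practice the population sizes are rarely known exactly, so the check is usually an argument that each population is plausibly at least 10 times its sample size.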
Condition 3: Large Counts (for Normality)
- This condition checks if the sampling distribution of p̂₁ - p̂₂ is approximately Normal.
- Crucial Point: Because our null hypothesis assumes p₁ = p₂, we should use a single, better estimate for this common proportion by combining, or pooling, our two samples.
- Step 1: Calculate the Pooled (Combined) Sample Proportion (p̂_c).
  - Let x₁ and x₂ be the number of successes in each sample.
  - Let n₁ and n₂ be the sample sizes.
  - Formula: p̂_c = (x₁ + x₂) / (n₁ + n₂)
- Step 2: Check the "Large Counts" using p̂_c.
  - We must check that the expected numbers of successes and failures are all at least 10 in both groups, using our pooled proportion.
  - You must check all four of these calculations:
    - n₁ * p̂_c ≥ 10
    - n₁ * (1 - p̂_c) ≥ 10
    - n₂ * p̂_c ≥ 10
    - n₂ * (1 - p̂_c) ≥ 10
- If all four values are 10 or more, the Normal condition is met.
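The two steps above can be sketched in Python as follows. The function names `pooled_proportion` and `large_counts` are illustrative, and the counts in the usage example are made up for demonstration; they do not come from any problem in this guide.

```python
def pooled_proportion(x1: int, n1: int, x2: int, n2: int) -> float:
    """Step 1: combine both samples into one estimate of the common proportion."""
    return (x1 + x2) / (n1 + n2)


def large_counts(x1: int, n1: int, x2: int, n2: int, threshold: float = 10):
    """Step 2: compute the four expected counts and check each against the threshold."""
    p_c = pooled_proportion(x1, n1, x2, n2)
    counts = {
        "n1 * p_c": n1 * p_c,
        "n1 * (1 - p_c)": n1 * (1 - p_c),
        "n2 * p_c": n2 * p_c,
        "n2 * (1 - p_c)": n2 * (1 - p_c),
    }
    condition_met = all(value >= threshold for value in counts.values())
    return counts, condition_met


# Hypothetical data: 30 successes out of 80, and 45 successes out of 120.
counts, ok = large_counts(30, 80, 45, 120)
for label, value in counts.items():
    print(f"{label} = {value}")
print("Large Counts met:", ok)
```

With these made-up numbers the pooled proportion is 75 / 200 = 0.375, so the expected counts are 30, 50, 45, and 75, all of which are at least 10.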

[Image: A bell-shaped curve labeled "Sampling Distribution of p̂₁ - p̂₂". The center of the curve is marked at 0, representing the assumption from the null hypothesis that p₁ - p₂ = 0.]

Key Vocabulary

Two-Proportion z-test: A significance test used to determine if there is a statistically significant difference between two population proportions.
Difference of Proportions (p₁ - p₂): The parameter of interest in a two-proportion z-test. It represents the true difference between the proportions of successes in two populations.
Null Hypothesis (H₀): The hypothesis of "no effect" or "no difference." For this test, it is always H₀: p₁ - p₂ = 0.
Alternative Hypothesis (Hₐ): The research hypothesis that there is a difference between the two population proportions (can be >, <, or ≠).
Pooled (Combined) Sample Proportion (p̂_c): An estimate of the common population proportion, calculated by combining the data from two samples. In a two-proportion z-test it is used to check the Large Counts condition and, later, to compute the standard error of the test statistic.
Independent Samples: Samples selected from two populations in such a way that the selection of individuals in one sample has no bearing on the selection of individuals in the other.

Calculator Tech (TI-84)

While this topic focuses on the setup, the TI-84's 2-PropZTest function is where you input these initial values. Knowing the inputs helps you understand the setup.

Press STAT.
Arrow over to the TESTS menu.
Select 6: 2-PropZTest...

You will see the following input screen:

x1: Number of successes in sample 1.
n1: Sample size of sample 1.
x2: Number of successes in sample 2.
n2: Sample size of sample 2.
p1: This line allows you to choose the alternative hypothesis.
- Select ≠p2 for a two-tailed test (Hₐ: p₁ ≠ p₂).
- Select <p2 for a left-tailed test (Hₐ: p₁ < p₂).
- Select >p2 for a right-tailed test (Hₐ: p₁ > p₂).
Calculate or Draw: Calculate will run the test and give you the z-statistic and p-value. Draw will show the shaded Normal curve.

For topic 6.10, the key is knowing how to correctly identify x₁, n₁, x₂, n₂, and the form of the alternative hypothesis from a word problem.
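For readers who want to see how the calculator inputs fit together, here is a minimal Python sketch. It assumes the standard pooled two-proportion z formula; the function names `normal_cdf` and `two_prop_z_test` and the `alternative` labels are illustrative and are not TI-84 terminology. The counts are the ones from Practice Problem 1 in this guide (18 of 100 and 32 of 100).

```python
import math


def normal_cdf(z: float) -> float:
    """Standard Normal cumulative probability, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z, p_value) for H0: p1 = p2 using the pooled standard error."""
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    p_c = (x1 + x2) / (n1 + n2)  # pooled proportion under H0
    se = math.sqrt(p_c * (1 - p_c) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    if alternative == "less":         # Ha: p1 < p2
        p_value = normal_cdf(z)
    elif alternative == "greater":    # Ha: p1 > p2
        p_value = 1 - normal_cdf(z)
    elif alternative == "two-sided":  # Ha: p1 != p2
        p_value = 2 * (1 - normal_cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value


# x1, n1, x2, n2 and the direction of Ha, exactly as entered on the calculator.
z, p_value = two_prop_z_test(18, 100, 32, 100, alternative="less")
print(round(z, 2))  # about -2.29
print(p_value)
```

The five arguments correspond one-to-one with the x1, n1, x2, n2, and p1 lines of the input screen, which is why identifying them correctly from the word problem matters.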

How to Show Work on the FRQ

To get full credit for setting up a significance test on the AP exam, you must clearly communicate your hypotheses and your verification of the conditions. Use the State and Plan steps of the four-step process.

State

Define Hypotheses: State the null (H₀) and alternative (Hₐ) hypotheses using population parameters (p₁ and p₂).
Define Parameters: Clearly define p₁ and p₂ in the context of the problem. This is a frequently missed step!
State Alpha Level: Identify the significance level (α), which is usually 0.05 unless stated otherwise.

FRQ Template (State):

"We want to perform a significance test of:

H₀: p₁ - p₂ = 0

Hₐ: p₁ - p₂ [choose >, <, or ≠] 0

Where:

p₁ = the true proportion of [describe population 1 in context].
p₂ = the true proportion of [describe population 2 in context].

We will use a significance level of α = 0.05."

Plan

Name the Procedure: Identify the test by its full name.
Check the Conditions: Check the Random, 10%, and Large Counts conditions, showing your work and explicitly connecting them to the problem.

FRQ Template (Plan):

"The appropriate inference procedure is a two-sample z-test for a difference of proportions. We must verify the following conditions:

Random: The data come from [two independent random samples of... OR two groups in a randomized experiment where treatments were randomly assigned], as stated in the problem.
10% Condition: (Use only if sampling without replacement). We assume the total number of [context of population 1] is at least 10 * n₁ = [10 * sample size 1] and the total number of [context of population 2] is at least 10 * n₂ = [10 * sample size 2]. This is reasonable.
Large Counts: We first calculate the pooled sample proportion:
p̂_c = (x₁ + x₂) / (n₁ + n₂) = ([successes 1] + [successes 2]) / ([size 1] + [size 2]) = [value].
Now we check the expected counts:
- n₁p̂_c = [size 1] * [p̂_c value] = [result] ≥ 10
- n₁(1-p̂_c) = [size 1] * (1 - [p̂_c value]) = [result] ≥ 10
- n₂p̂_c = [size 2] * [p̂_c value] = [result] ≥ 10
- n₂(1-p̂_c) = [size 2] * (1 - [p̂_c value]) = [result] ≥ 10
Since all four expected counts are at least 10, the sampling distribution of p̂₁ - p̂₂ is approximately Normal."

Practice Problems

Problem 1:

A pharmaceutical company has developed a new drug to reduce the side effects of a certain medical treatment. They conduct an experiment where 200 volunteers are randomly assigned to two groups. Group A (the treatment group) receives the new drug, and Group B (the control group) receives a placebo. Of the 100 people in Group A, 18 reported side effects. Of the 100 people in Group B, 32 reported side effects. Is there convincing evidence that the true proportion of patients who experience side effects is lower for those taking the new drug than for those taking the placebo? State the hypotheses and check the conditions for a significance test.

Solution:

State:

We want to perform a significance test of:

H₀: p_drug - p_placebo = 0

Hₐ: p_drug - p_placebo < 0

Where:

p_drug = the true proportion of all patients similar to those in the study who would experience side effects when taking the new drug.
p_placebo = the true proportion of all patients similar to those in the study who would experience side effects when taking the placebo.

We will use a significance level of α = 0.05.

Plan:

The appropriate inference procedure is a two-sample z-test for a difference of proportions. We must verify the following conditions:

Random: The 200 volunteers were randomly assigned to the two groups (new drug and placebo). This condition is met.
10% Condition: This was a randomized experiment, not sampling without replacement from a finite population, so the 10% condition is not required.
Large Counts: We first calculate the pooled sample proportion of patients reporting side effects.
Here, x_drug = 18, n_drug = 100, and x_placebo = 32, n_placebo = 100.
p̂_c = (x_drug + x_placebo) / (n_drug + n_placebo) = (18 + 32) / (100 + 100) = 50 / 200 = 0.25.
Now we check the expected counts:
- n_drug * p̂_c = 100 * 0.25 = 25 ≥ 10
- n_drug * (1-p̂_c) = 100 * (1 - 0.25) = 75 ≥ 10
- n_placebo * p̂_c = 100 * 0.25 = 25 ≥ 10
- n_placebo * (1-p̂_c) = 100 * (1 - 0.25) = 75 ≥ 10
Since all four expected counts are at least 10, the sampling distribution of p̂_drug - p̂_placebo is approximately Normal.

Problem 2:

A polling agency wants to investigate if there is a difference in the proportion of male and female voters in a large city who support a certain ballot initiative. They take a simple random sample of 150 male voters and find that 81 support the initiative. They take an independent simple random sample of 200 female voters and find that 124 support the initiative. State the appropriate hypotheses and check the conditions to perform a significance test.

Solution:

State:

We want to perform a significance test of:

H₀: p_male - p_female = 0

Hₐ: p_male - p_female ≠ 0

Where:

p_male = the true proportion of all male voters in the city who support the initiative.
p_female = the true proportion of all female voters in the city who support the initiative.

We will use a significance level of α = 0.05.

Plan:

The appropriate inference procedure is a two-sample z-test for a difference of proportions. We must verify the following conditions:

Random: The data come from two independent simple random samples—one of 150 male voters and one of 200 female voters. This condition is met.
10% Condition: We are sampling without replacement. It is reasonable to assume that there are at least 10 * 150 = 1500 male voters and at least 10 * 200 = 2000 female voters in a large city. This condition is met.
Large Counts: We first calculate the pooled sample proportion of voters who support the initiative.
Here, x_male = 81, n_male = 150, and x_female = 124, n_female = 200.
p̂_c = (x_male + x_female) / (n_male + n_female) = (81 + 124) / (150 + 200) = 205 / 350 ≈ 0.586.
Now we check the expected counts:
- n_male * p̂_c = 150 * 0.586 = 87.9 ≥ 10
- n_male * (1-p̂_c) = 150 * (1 - 0.586) = 62.1 ≥ 10
- n_female * p̂_c = 200 * 0.586 = 117.2 ≥ 10
- n_female * (1-p̂_c) = 200 * (1 - 0.586) = 82.8 ≥ 10
Since all four expected counts are at least 10, the sampling distribution of p̂_male - p̂_female is approximately Normal.
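As a quick check of the arithmetic in Problem 2, the short Python sketch below recomputes the pooled proportion and the four expected counts without rounding p̂_c first. The variable names are illustrative; the data are the counts given in the problem.

```python
# Data from Problem 2.
x_male, n_male = 81, 150
x_female, n_female = 124, 200

# Pooled proportion, kept at full precision instead of rounding to 0.586.
p_c = (x_male + x_female) / (n_male + n_female)

expected = {
    "n_male * p_c": n_male * p_c,
    "n_male * (1 - p_c)": n_male * (1 - p_c),
    "n_female * p_c": n_female * p_c,
    "n_female * (1 - p_c)": n_female * (1 - p_c),
}

for label, value in expected.items():
    print(f"{label} = {value:.1f}")

print("All at least 10:", all(value >= 10 for value in expected.values()))
```

The unrounded counts can differ from the hand calculation by about 0.1 because the hand work uses p̂_c rounded to 0.586. The difference does not change the conclusion, since every count is far above 10.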

Common Mistakes to Avoid

Using Sample Statistics in Hypotheses: Never write hypotheses using p̂₁ and p̂₂ (e.g., H₀: p̂₁ - p̂₂ = 0). Hypotheses are always about the unknown population parameters, p₁ and p₂.
Incorrectly Checking the Large Counts Condition: This is the most common error. For a significance test, you must use the pooled proportion (p̂_c) to check the expected counts. Do not check n₁p̂₁ ≥ 10, n₁(1-p̂₁) ≥ 10, etc. That method is for confidence intervals, not hypothesis tests.
Forgetting to Define Parameters: On an FRQ, you will lose points if you do not define p₁ and p₂ in the context of the problem. Simply writing "p₁ = proportion for group 1" is not specific enough.
Stating Conditions Without Checking Them: Do not just write "The Large Counts condition is met." You must show the calculation of the pooled proportion (p̂_c) and then show the four calculations (n₁p̂_c, etc.) and confirm they are all at least 10.
Confusing Experiments and Samples: Be clear about whether the data come from a randomized experiment or from two independent random samples. This determines whether you need to check the 10% condition.
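To see why the Large Counts mistake matters, the Python sketch below compares the pooled check (used for significance tests) with the unpooled check (used for confidence intervals) on made-up data. The function names and the counts are hypothetical, chosen so that the two checks disagree.

```python
def pooled_check(x1, n1, x2, n2, threshold=10):
    """Significance-test check: expected counts built from the pooled proportion."""
    p_c = (x1 + x2) / (n1 + n2)
    counts = [n1 * p_c, n1 * (1 - p_c), n2 * p_c, n2 * (1 - p_c)]
    return min(counts) >= threshold


def unpooled_check(x1, n1, x2, n2, threshold=10):
    """Confidence-interval check: n * p-hat and n * (1 - p-hat) are just the
    observed successes and failures in each sample."""
    counts = [x1, n1 - x1, x2, n2 - x2]
    return min(counts) >= threshold


# Hypothetical data: 5 of 50 successes in sample 1, 40 of 50 in sample 2.
print(pooled_check(5, 50, 40, 50))    # True: pooled counts are 22.5 and 27.5
print(unpooled_check(5, 50, 40, 50))  # False: only 5 successes in sample 1
```

Here the same data pass one check and fail the other, so using the wrong version can lead to the wrong decision about whether the test is appropriate.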
