Sampling Distributions for Differences in Sample Means | AP Stats Unit 5 Study Guide

Quick Summary

This guide shows you how to describe the sampling distribution of the difference between two independent sample means ($\bar{x}_{1} - \bar{x}_{2}$). You will learn to verify the conditions needed to model this distribution, calculate its mean and standard deviation, and use the model to find the probability of observing a particular difference between two sample means. This skill is foundational for comparing two groups, a central task in statistical inference.

Key Concepts

When we take independent random samples from two distinct populations, the statistic $\bar{x}_{1} - \bar{x}_{2}$ (the difference in sample means) becomes our variable of interest. Just like a single sample mean, this statistic has a sampling distribution with a predictable shape, center, and spread.

[Image: A diagram showing two separate population distributions (Population 1 with mean $μ_{1}$ and Population 2 with mean $μ_{2}$), with arrows indicating random samples being drawn from each. These samples lead to sample means $\bar{x}_{1}$ and $\bar{x}_{2}$. The differences ($\bar{x}_{1} - \bar{x}_{2}$) from many pairs of samples are then plotted to form a new, approximately Normal distribution labeled "Sampling Distribution of $\bar{x}_{1} - \bar{x}_{2}$".]

To fully describe this sampling distribution, we must address its shape, center, and spread by checking three key conditions.

1. Shape: The Normal/Large Sample Condition

The sampling distribution of the difference in sample means, $\bar{x}_{1} - \bar{x}_{2}$, will be approximately Normal if one of the following is true:

Both populations are stated to be Normally distributed. If we know the underlying populations are Normal, the sampling distribution of the difference will also be Normal, regardless of sample size.
The Central Limit Theorem (CLT) applies. If we don't know the shape of the populations, the sampling distribution will be approximately Normal as long as both sample sizes are sufficiently large ($n_{1} \geq 30$ and $n_{2} \geq 30$).
A combination of the two. It is also acceptable if one population is Normal (making its sampling distribution Normal) and the other sample is large ($n \geq 30$).
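The three cases above reduce to one rule: each group must either come from a Normal population or have a large sample. A minimal Python sketch of that check follows; the function name and arguments are made up for illustration, and the cutoff of 30 is the rule of thumb used in this guide.

```python
def shape_is_approx_normal(pop1_normal: bool, n1: int,
                           pop2_normal: bool, n2: int,
                           large_n: int = 30) -> bool:
    """Return True when EACH group is Normal or has a large sample (n >= large_n)."""
    group1_ok = pop1_normal or n1 >= large_n
    group2_ok = pop2_normal or n2 >= large_n
    return group1_ok and group2_ok


print(shape_is_approx_normal(True, 12, True, 15))    # both populations Normal
print(shape_is_approx_normal(False, 40, False, 35))  # CLT for both samples
print(shape_is_approx_normal(True, 12, False, 35))   # one Normal, one large sample
print(shape_is_approx_normal(False, 12, False, 35))  # group 1 meets neither case
```

Only the last call fails the condition, because group 1 is neither Normal nor large.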

2. Center: The Mean of the Sampling Distribution

The mean of the sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$ is the true difference between the population means, $μ_{1} - μ_{2}$.

Formula: $μ_{\bar{x}_{1} - \bar{x}_{2}} = μ_{1} - μ_{2}$
Interpretation: This means that the difference in sample means is an unbiased estimator of the difference in population means. On average, the difference we see in our samples will equal the true difference between the populations.
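One way to see unbiasedness is to simulate many pairs of samples and average the differences. The sketch below uses made-up Normal populations ($μ_{1} = 10$, $σ_{1} = 2$, $μ_{2} = 7$, $σ_{2} = 3$) and sample sizes of 30; these values are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_differences(mu1, sigma1, n1, mu2, sigma2, n2, reps=2000, seed=1):
    """Draw `reps` pairs of independent samples and return each xbar1 - xbar2."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        xbar1 = statistics.fmean(rng.gauss(mu1, sigma1) for _ in range(n1))
        xbar2 = statistics.fmean(rng.gauss(mu2, sigma2) for _ in range(n2))
        diffs.append(xbar1 - xbar2)
    return diffs


# Hypothetical populations: true difference in means is 10 - 7 = 3.
diffs = simulate_differences(10, 2, 30, 7, 3, 30)
print("average simulated difference:", round(statistics.fmean(diffs), 3))
```

The average of the simulated differences should land close to 3, the true difference in population means, even though individual differences vary from pair to pair.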

3. Spread: The Standard Deviation of the Sampling Distribution

The standard deviation of the sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$ measures the typical distance of the statistic ($\bar{x}_{1} - \bar{x}_{2}$) from its mean ($μ_{1} - μ_{2}$). To calculate this, two conditions must be met.

Condition 1: Independent Samples. The data must come from two independent random samples or from two groups in a randomized experiment. This ensures that the results from one group do not influence the results from the other.
Condition 2: The 10% Condition. When sampling without replacement, we must ensure our sample sizes are not too large relative to their populations. We must verify this for both samples:
- $n_{1} \leq \frac{1}{10} N_{1}$ (the first sample is no more than 10% of its population)
- $n_{2} \leq \frac{1}{10} N_{2}$ (the second sample is no more than 10% of its population)
- This condition allows us to treat the observations as independent, even though we are sampling without replacement.
Formula: If the conditions are met, the standard deviation is:
- $σ_{\bar{x}_{1} - \bar{x}_{2}} = \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}}$
- Crucial Note: Notice that we add the variances ($σ^{2}$) inside the square root. When we combine two independent random variables, their variances add. We never add their standard deviations.
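The 10% check and the standard deviation formula can be sketched in a few lines of Python. The function names and the numbers in the example ($σ_{1} = 6$, $n_{1} = 36$, $σ_{2} = 8$, $n_{2} = 64$, and the population sizes) are hypothetical, chosen so the arithmetic is easy to follow.

```python
import math


def ten_percent_ok(n: int, population_size: int) -> bool:
    """10% condition: the sample is no more than 10% of its population."""
    return 10 * n <= population_size


def sd_diff_means(sigma1: float, n1: int, sigma2: float, n2: int) -> float:
    """Standard deviation of xbar1 - xbar2: add the variances, then take the root."""
    return math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)


# Hypothetical values: 6^2/36 = 1 and 8^2/64 = 1, so the result is sqrt(2).
print(ten_percent_ok(36, 5000), ten_percent_ok(64, 5000))
print(round(sd_diff_means(6, 36, 8, 64), 4))
```

Each variance term here equals 1, so the standard deviation is $\sqrt{1 + 1} \approx 1.4142$, not $1 + 1 = 2$.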

Key Vocabulary

Sampling Distribution of a Difference: The distribution of all possible values of the statistic $\bar{x}_{1} - \bar{x}_{2}$ from all possible pairs of samples of size $n_{1}$ and $n_{2}$ drawn from two populations.
Unbiased Estimator: A statistic whose sampling distribution has a mean equal to the true value of the parameter it is intended to estimate. Here, $\bar{x}_{1} - \bar{x}_{2}$ is an unbiased estimator of $μ_{1} - μ_{2}$.
Central Limit Theorem (CLT): The principle that for a sufficiently large sample size ($n \geq 30$), the sampling distribution of a sample mean ($\bar{x}$) will be approximately Normal, regardless of the shape of the parent population. This principle extends to the difference of two sample means.
Independent Samples: Samples drawn from two populations in such a way that the selection of individuals in one sample has no bearing on the selection of individuals in the other sample.
10% Condition: A rule of thumb used when sampling without replacement to ensure that observations within a sample can be treated as independent. It requires the sample size to be no more than 10% of the population size.

Calculator Tech (TI-84)

No new major calculator functions are required for this topic. The primary calculation involves finding probabilities using the Normal model. You will use the normalcdf() function, which you should already be familiar with:

normalcdf(lower bound, upper bound, mean, standard deviation)
For this topic, the mean will be $μ_{1} - μ_{2}$ and the standard deviation will be $σ_{\bar{x}_{1} - \bar{x}_{2}}$.
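If you want to check calculator answers on a computer, a rough Python analogue of normalcdf can be built from the standard library's `statistics.NormalDist`. This is only a sketch that mirrors the calculator's argument order; the wrapper name `normalcdf` is our own choice, not a built-in Python function.

```python
from statistics import NormalDist


def normalcdf(lower: float, upper: float, mean: float = 0.0, sd: float = 1.0) -> float:
    """Area under the Normal(mean, sd) curve between lower and upper."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Use float("inf") where the calculator would use a very large bound.
print(round(normalcdf(-1.96, 1.96), 4))          # middle area of a standard Normal
print(round(normalcdf(float("-inf"), 0.0), 4))   # left half of a standard Normal
```

For this topic you would pass $μ_{1} - μ_{2}$ as `mean` and $σ_{\bar{x}_{1} - \bar{x}_{2}}$ as `sd`.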

How to Show Work on the FRQ

On the AP exam, you must be able to clearly describe the sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$ before you calculate any probabilities. This requires a methodical check of conditions and a clear statement of the distribution's parameters. Use the following template to structure your response.

Template for Describing the Sampling Distribution of $\bar{x}_{1} - \bar{x}_{2}$

Shape:
- "Because the population of [Group 1] is Normal and the population of [Group 2] is Normal..."
- "...OR, because both sample sizes are large ($n_{1} = [value] \geq 30$ and $n_{2} = [value] \geq 30$), the Central Limit Theorem applies."
- "Therefore, the sampling distribution of the difference in sample means, $\bar{x}_{1} - \bar{x}_{2}$, is approximately Normal."
Center:
- "The mean of the sampling distribution is $μ_{\bar{x}_{1} - \bar{x}_{2}} = μ_{1} - μ_{2} = [value] - [value] = [result]$."
Spread:
- "The problem states we have independent random samples."
- "We must check the 10% condition for both samples. It is reasonable to assume the total number of [Population 1 subjects] is at least $10 \times n_{1}$ and the total number of [Population 2 subjects] is at least $10 \times n_{2}$."
- "Therefore, the standard deviation of the sampling distribution is $σ_{\bar{x}_{1} - \bar{x}_{2}} = \sqrt{\frac{σ_{1}^{2}}{n_{1}} + \frac{σ_{2}^{2}}{n_{2}}} = \sqrt{\frac{([value])^{2}}{[value]} + \frac{([value])^{2}}{[value]}} = [result]$."
Summary Statement (Optional but Recommended):
- "In summary, the sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$ is approximately Normal with a mean of [result from Center] and a standard deviation of [result from Spread]."

Practice Problems

Problem 1:

The mean height of adult men in a certain country is 69 inches with a standard deviation of 2.5 inches. The mean height of adult women is 64 inches with a standard deviation of 2.3 inches. Both populations are approximately Normally distributed. If a random sample of 50 men and a random sample of 60 women are selected, what is the probability that the sample mean height of the men will be at least 6 inches greater than the sample mean height of the women?

Solution:

Let $\bar{x}_{M}$ be the sample mean height of men and $\bar{x}_{W}$ be the sample mean height of women. We want to find $P(\bar{x}_{M} - \bar{x}_{W} \geq 6)$. First, we must describe the sampling distribution of the difference, $\bar{x}_{M} - \bar{x}_{W}$.

Shape: Because both parent populations of men's and women's heights are stated to be approximately Normal, the sampling distribution of the difference in sample means, $\bar{x}_{M} - \bar{x}_{W}$, is also approximately Normal.
Center: The mean of the sampling distribution is $μ_{\bar{x}_{M} - \bar{x}_{W}} = μ_{M} - μ_{W} = 69 - 64 = 5$ inches.
Spread: The problem states we have independent random samples. We assume the number of adult men in the country is at least $10 \times 50 = 500$ and the number of adult women is at least $10 \times 60 = 600$, so the 10% condition is met. The standard deviation is $σ_{\bar{x}_{M} - \bar{x}_{W}} = \sqrt{\frac{σ_{M}^{2}}{n_{M}} + \frac{σ_{W}^{2}}{n_{W}}} = \sqrt{\frac{2.5^{2}}{50} + \frac{2.3^{2}}{60}} \approx \sqrt{0.125 + 0.0882} \approx 0.4617$ inches.

Now we can calculate the probability using this Normal distribution, $N(5, 0.4617)$. The z-score for a difference of 6 inches is $z = \frac{6 - 5}{0.4617} \approx 2.166$. The probability is $P(Z \geq 2.166) = \text{normalcdf}(6, \infty, 5, 0.4617) \approx 0.0152$. There is about a 1.52% chance that the sample mean height of the men will be at least 6 inches greater than the sample mean height of the women.
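The arithmetic in Problem 1 can be double-checked with a short Python script. This is only a verification sketch using the standard library; the variable names are our own.

```python
import math
from statistics import NormalDist

# Values given in Problem 1 (men: group M, women: group W).
mu_m, sigma_m, n_m = 69, 2.5, 50
mu_w, sigma_w, n_w = 64, 2.3, 60

mean_diff = mu_m - mu_w                                   # center
sd_diff = math.sqrt(sigma_m**2 / n_m + sigma_w**2 / n_w)  # spread
z = (6 - mean_diff) / sd_diff                             # z-score for a 6-inch difference
p = 1 - NormalDist(mean_diff, sd_diff).cdf(6)             # P(difference >= 6)

print("sd of difference:", round(sd_diff, 4))
print("z-score:", round(z, 3))
print("probability:", round(p, 4))
```

The printed standard deviation, z-score, and probability should agree with the hand calculation above up to rounding.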

Problem 2:

A company produces batteries at two different factories, A and B. The mean lifespan of batteries from Factory A is 150 hours with a standard deviation of 10 hours. The mean lifespan of batteries from Factory B is 145 hours with a standard deviation of 8 hours. An inspector takes a random sample of 40 batteries from Factory A and 35 batteries from Factory B. What is the probability that the sample mean lifespan from Factory A is less than the sample mean lifespan from Factory B?

Solution:

Let $\bar{x}_{A}$ be the sample mean lifespan from Factory A and $\bar{x}_{B}$ be the sample mean lifespan from Factory B. We want to find $P(\bar{x}_{A} < \bar{x}_{B})$, which is equivalent to finding $P(\bar{x}_{A} - \bar{x}_{B} < 0)$. First, we describe the sampling distribution of the difference, $\bar{x}_{A} - \bar{x}_{B}$.

Shape: The shapes of the populations are unknown. However, because both sample sizes are large ($n_{A} = 40 \geq 30$ and $n_{B} = 35 \geq 30$), the Central Limit Theorem applies. Therefore, the sampling distribution of the difference in sample means, $\bar{x}_{A} - \bar{x}_{B}$, is approximately Normal.
Center: The mean of the sampling distribution is $μ_{\bar{x}_{A} - \bar{x}_{B}} = μ_{A} - μ_{B} = 150 - 145 = 5$ hours.
Spread: The problem states we have independent random samples. It is reasonable to assume that the factories produce at least $10 \times 40 = 400$ and $10 \times 35 = 350$ batteries, respectively, so the 10% condition is met. The standard deviation is $σ_{\bar{x}_{A} - \bar{x}_{B}} = \sqrt{\frac{σ_{A}^{2}}{n_{A}} + \frac{σ_{B}^{2}}{n_{B}}} = \sqrt{\frac{10^{2}}{40} + \frac{8^{2}}{35}} \approx \sqrt{2.5 + 1.8286} \approx 2.0805$ hours.

Now we calculate the probability using the Normal distribution $N(5, 2.0805)$. The z-score for a difference of 0 is $z = \frac{0 - 5}{2.0805} \approx -2.403$. The probability is $P(Z < -2.403) = \text{normalcdf}(-\infty, 0, 5, 2.0805) \approx 0.0081$. There is a very small 0.81% chance that the sample mean lifespan from Factory A will be less than that from Factory B.
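Because Problem 2 relies on the CLT rather than on Normal populations, a simulation can show how well the Normal model holds up. The sketch below assumes, purely for illustration, that battery lifespans in each factory follow a right-skewed shifted exponential distribution with the stated mean and standard deviation. The problem does not give the population shapes, so this model is our own assumption.

```python
import random
import statistics


def skewed_sample_mean(rng, n, mean, sd):
    """Sample mean of n draws from an ASSUMED right-skewed population:
    (mean - sd) + Exponential with mean sd, which has the given mean and sd."""
    return statistics.fmean((mean - sd) + rng.expovariate(1 / sd) for _ in range(n))


rng = random.Random(2024)  # fixed seed so the run is repeatable
reps = 10_000
count = 0
for _ in range(reps):
    xbar_a = skewed_sample_mean(rng, 40, 150, 10)  # Factory A sample
    xbar_b = skewed_sample_mean(rng, 35, 145, 8)   # Factory B sample
    if xbar_a < xbar_b:
        count += 1

p_sim = count / reps
print("simulated P(xbar_A < xbar_B):", p_sim)
```

The simulated proportion should come out small and in the neighborhood of the Normal-model answer of about 0.0081, though not identical to it, since the Normal model is an approximation and the simulation has its own random variation.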

Common Mistakes to Avoid

Adding Standard Deviations: Never add standard deviations. You must add the variances ($σ^{2} / n$) and then take the square root of the sum. A common incorrect formula is $\frac{σ_{1}}{n_{1}} + \frac{σ_{2}}{n_{2}}$. This is wrong.
Forgetting to Check Conditions for BOTH Samples: The Normal/Large Sample condition ( $n \geq 30$ or Normal population) and the 10% condition must be verified for both groups independently. It is not sufficient for only one sample to meet the criteria.
Confusing the Sample with the Sampling Distribution: Do not state that "the sample is approximately Normal." The correct statement is that the "sampling distribution of $\bar{x}_{1} - \bar{x}_{2}$ is approximately Normal." The samples themselves might not be Normal at all, especially if the CLT is invoked.
Forgetting to Divide by n in the Variance Formula: A frequent error is to calculate $σ_{1}^{2} + σ_{2}^{2}$ . You must use the variances of the sample means, which are $σ_{1}^{2} / n_{1}$ and $σ_{2}^{2} / n_{2}$ . The $n$ in the denominator is critical.
Subtracting Variances: Even though we are looking at the distribution of a difference, the variances of independent random variables always add. The formula for the standard deviation has a $+$ sign under the radical, not a $-$ sign.
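To see how much these mistakes change the answer, the sketch below compares the correct standard deviation with several incorrect versions, using the values from Problem 1. The variable names are our own.

```python
import math

# Values from Problem 1.
sigma1, n1 = 2.5, 50
sigma2, n2 = 2.3, 60

correct = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)

# Incorrect versions, shown only for comparison.
added_sds = sigma1 / math.sqrt(n1) + sigma2 / math.sqrt(n2)    # added SDs of the sample means
no_root = sigma1 / n1 + sigma2 / n2                            # the incorrect formula quoted above
forgot_n = math.sqrt(sigma1**2 + sigma2**2)                    # forgot to divide by n
subtracted = math.sqrt(sigma1**2 / n1 - sigma2**2 / n2)        # subtracted the variances

print("correct:              ", round(correct, 4))
print("added SDs:            ", round(added_sds, 4))
print("sigma/n, no root:     ", round(no_root, 4))
print("forgot to divide by n:", round(forgot_n, 4))
print("subtracted variances: ", round(subtracted, 4))
```

Every incorrect version gives a noticeably different value from the correct 0.4617, which would change the z-score and the final probability.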
