Sampling Distributions for Sample Proportions | AP Stats Unit 5 Study Guide

Quick Summary

This guide covers the sampling distribution of a sample proportion ( $\hat{p}$ ). You will learn how to describe the shape, center, and spread of the distribution of sample proportions that arises when we take many random samples from a population. By understanding these characteristics and verifying specific conditions, you will be able to calculate the probability of obtaining a certain sample proportion, a foundational skill for statistical inference.

Key Concepts

The core idea is to understand what happens when we take many, many random samples of the same size from a population and calculate the sample proportion, $\hat{p}$, for each one. The distribution of all these $\hat{p}$ values is the sampling distribution of the sample proportion.
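
A small simulation can make this idea concrete. The sketch below is illustrative only: the population proportion 0.60, the sample size 100, and the function name `simulate_sample_proportions` are assumptions that do not come from this guide, and each draw is treated as independent, as if the population were very large.

```python
import random
import statistics

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return each sample's p-hat."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    p_hats = []
    for _ in range(num_samples):
        # Each individual is a "success" with probability p (assumed independent).
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats

# Assumed, illustrative values: p = 0.60, n = 100, 5000 repeated samples.
p_hats = simulate_sample_proportions(p=0.60, n=100, num_samples=5000)

print(round(statistics.mean(p_hats), 3))   # should land close to p
print(round(statistics.stdev(p_hats), 3))  # sample-to-sample variability of p-hat
```

A histogram of `p_hats` would approximate the sampling distribution described above.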

The Statistic vs. The Parameter:
- The population proportion, denoted by $p$ , is a parameter. It is a fixed, and usually unknown, value that describes the entire population (e.g., the true proportion of all U.S. adults who support a certain law).
- The sample proportion, denoted by $\hat{p}$ (read "p-hat"), is a statistic. It is calculated from a sample ( $\hat{p} = \frac{\text{count of successes in sample}}{\text{sample size}} = \frac{x}{n}$ ) and is used to estimate the parameter $p$. The value of $\hat{p}$ will vary from sample to sample.

Describing the Sampling Distribution of $\hat{p}$: To use this distribution to make predictions, we must describe its shape, center, and spread. This requires checking three important conditions.
[Image: A diagram showing a large population (e.g., a jar of beads with a proportion $p$ of red beads). Multiple samples of size $n$ are drawn. For each sample, $\hat{p}$ is calculated. A histogram of all the $\hat{p}$ values is shown, centered at $p$ and having an approximately Normal shape.]
1. Center (Mean):
- The mean of the sampling distribution of $\hat{p}$ is equal to the true population proportion, $p$.
- Formula: $\mu_{\hat{p}} = p$
- This tells us that $\hat{p}$ is an unbiased estimator of $p$. While any single $\hat{p}$ might be higher or lower than $p$, on average, the sample proportions will center on the true proportion.
2. Spread (Standard Deviation):
- The standard deviation of the sampling distribution of $\hat{p}$ measures how much the sample proportion typically varies from the population proportion in repeated sampling.
- Formula: $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}}$
- Condition: This formula is only accurate if the 10% Condition is met: the sample size $n$ must be no more than 10% of the population size $N$ ( $n \leq 0.10 N$ ). When we sample without replacement, the observations are not truly independent. This condition ensures that our sample is small enough relative to the population that we can treat the selections as practically independent, making the standard deviation calculation valid.
3. Shape (Normality):
- The shape of the sampling distribution of $\hat{p}$ becomes approximately Normal as the sample size $n$ increases.
- Condition: We can assume an approximately Normal shape if the Large Counts Condition is met: the expected number of successes and failures in the sample are both at least 10.
- Check: $n p \geq 10$ and $n (1 - p) \geq 10$ .
- Important: If the sample size is too small, the sampling distribution will be skewed. For example, if $p$ is very close to 0 or 1, we need a very large $n$ to satisfy the condition and achieve approximate Normality.

Summary of Conditions:
- Random Sample: The data must come from a well-designed random sample (e.g., an SRS) to minimize bias and ensure the sample is representative. This is a prerequisite for all inference.
- 10% Condition ( $n \leq 0.10 N$ ): Allows us to calculate the standard deviation of the sampling distribution. It ensures independence of observations.
- Large Counts Condition ( $n p \geq 10$ and $n (1 - p) \geq 10$ ): Allows us to use a Normal distribution to model the sampling distribution of $\hat{p}$ and calculate probabilities.
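
The center, spread, and the two numeric conditions can be bundled into one short helper. This is a minimal sketch, not a standard library routine: the function name `describe_p_hat`, the dictionary keys, and the example values (0.50, 100, 10,000) are all assumptions made for illustration. The Random condition depends on how the data were collected, so it cannot be checked in code.

```python
import math

def describe_p_hat(p, n, population_size=None):
    """Summarize the sampling distribution of p-hat and check its conditions."""
    tol = 1e-9  # guards against floating-point round-off right at the boundary
    # Large Counts Condition: expected successes and failures both at least 10.
    large_counts_ok = (n * p >= 10 - tol) and (n * (1 - p) >= 10 - tol)
    # 10% Condition: only checkable when the population size is known.
    ten_percent_ok = None if population_size is None else n <= 0.10 * population_size
    return {
        "mean": p,                          # mu of p-hat equals p
        "sd": math.sqrt(p * (1 - p) / n),   # sigma of p-hat
        "ten_percent_ok": ten_percent_ok,
        "large_counts_ok": large_counts_ok,
    }

# Assumed, illustrative values: p = 0.50, n = 100, population of 10,000.
summary = describe_p_hat(p=0.50, n=100, population_size=10_000)
print(summary)
```

Returning `None` for the 10% Condition when no population size is supplied keeps "not checked" distinct from "failed".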

Key Vocabulary

Parameter: A number that describes a characteristic of a population (e.g., p, the true proportion of voters who favor a candidate).
Statistic: A number that is calculated from a sample and is used to estimate a parameter (e.g., $\hat{p}$, the proportion of voters in a sample who favor a candidate).
Sampling Distribution: The distribution of values taken by a statistic in all possible samples of the same size from the same population.
Sample Proportion ( $\hat{p}$ ): The proportion of individuals in a sample that have a certain characteristic, calculated as $\hat{p} = x / n$.
Unbiased Estimator: A statistic whose sampling distribution has a mean that is equal to the true value of the parameter being estimated. $\hat{p}$ is an unbiased estimator of $p$.
10% Condition: The requirement that the sample size $n$ is no more than 10% of the population size $N$. This validates the use of the standard deviation formula for samples drawn without replacement.
Large Counts Condition: The requirement that the expected number of successes ( $np$ ) and failures ( $n (1 - p)$ ) are both at least 10. This validates the use of a Normal approximation for the sampling distribution of $\hat{p}$.

Calculator Tech (TI-84)

The primary calculator function for this topic is normalcdf(), used to find the area (probability) under a Normal curve.

Function: normalcdf(lower bound, upper bound, mean, standard deviation)
Location: [2nd] -> [VARS] (DISTR) -> 2: normalcdf(
Example: Suppose the sampling distribution of $\hat{p}$ is approximately Normal with a mean $\mu_{\hat{p}} = 0.60$ and a standard deviation $\sigma_{\hat{p}} = 0.05$. To find the probability of getting a sample proportion of 0.68 or higher, $P(\hat{p} \geq 0.68)$, you would enter:
normalcdf(0.68, 1E99, 0.60, 0.05)
(Note: 1E99 is used to represent positive infinity.)
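
For checking answers away from the calculator, the same area can be sketched in Python with the standard library's `statistics.NormalDist`. The wrapper name `normalcdf` below is an assumption chosen to mirror the TI-84 argument order; it is not a built-in Python function.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean=0.0, sd=1.0):
    """Area under a Normal curve between lower and upper (TI-84 argument order)."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Same inputs as the calculator example: P(p-hat >= 0.68) with mean 0.60, SD 0.05.
prob = normalcdf(0.68, 1e99, 0.60, 0.05)
print(round(prob, 4))  # about 0.0548, since z = (0.68 - 0.60) / 0.05 = 1.6
```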

How to Show Work on the FRQ

On Free Response Questions, you must communicate your reasoning clearly. For problems involving calculating a probability for a sample proportion, use the four-step State, Plan, Do, Conclude process.

Template for Calculating a Probability for $\hat{p}$

1. STATE:

Define the parameter $p$ in the context of the problem.
Define the statistic $\hat{p}$ in the context of the problem.
State the probability you are trying to find in symbols (e.g., $P(\hat{p} < 0.22)$ ).

2. PLAN:

Identify the procedure: "We will use a Normal model to find the probability."
Check the conditions for the sampling distribution of $\hat{p}$:
- Random: "The problem states that the $n$ individuals were randomly selected."
- 10% Condition: "The sample size $n$ = [value] is less than 10% of the total population of [context] (e.g., 50 < 0.10 * (all U.S. teenagers)). This allows us to calculate the standard deviation."
- Large Counts Condition: "We check $np = [\text{value}] \geq 10$ and $n (1 - p) = [\text{value}] \geq 10$. Since both are at least 10, the sampling distribution of $\hat{p}$ is approximately Normal."

3. DO:

Calculate the mean and standard deviation of the sampling distribution:
- $\mu_{\hat{p}} = p = [\text{value}]$
- $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{[\text{value}](1 - [\text{value}])}{[\text{value}]}} = [\text{result}]$
Calculate the z-score for the boundary value of $\hat{p}$:
- $z = \frac{\hat{p} - \mu_{\hat{p}}}{\sigma_{\hat{p}}} = \frac{[\text{value}] - [\text{value}]}{[\text{value}]} = [\text{result}]$
Find the probability. Sketching a Normal curve is highly recommended.
- Show the calculation using the z-score or calculator syntax: $P(Z < z\text{-score}) = [\text{probability}]$ or normalcdf(...) = [probability].

4. CONCLUDE:

Write a sentence that answers the original question in context.
- "There is a [probability] probability of obtaining a sample proportion of [context] of [value of $\hat{p}$] or [less/more], assuming the true proportion of [context] is $p$."

Practice Problems

Problem 1:

A large national polling organization reports that 38% of all U.S. high school students have a part-time job. The principal of a large local high school, which has 2,500 students, suspects the proportion is lower at their school. They take a simple random sample of 150 students and find that 45 of them have a part-time job. If the national report is true, what is the probability that a random sample of 150 students would result in a sample proportion of 0.30 or less?

Solution:

State: We want to find the probability of observing a sample proportion of 0.30 or less. Let $p$ = the true proportion of all U.S. high school students with a part-time job, so $p = 0.38$. Let $\hat{p}$ be the proportion of students in a sample of 150 who have a part-time job. We want to find $P(\hat{p} \leq 0.30)$.

Plan: We will use a Normal model to approximate the sampling distribution of $\hat{p}$. We must check the conditions.

Random: The problem states a simple random sample of 150 students was taken.
10% Condition: The sample size $n = 150$ is less than 10% of the 2,500 students at this specific high school (150 < 0.10 * 2500 = 250), and certainly less than 10% of all U.S. high school students. This allows us to calculate the standard deviation.
Large Counts Condition: We check $n p = 150 (0.38) = 57 \geq 10$ and $n (1 - p) = 150 (1 - 0.38) = 150 (0.62) = 93 \geq 10$. Since both expected counts are at least 10, the sampling distribution of $\hat{p}$ is approximately Normal.

Do:

The mean of the sampling distribution is $\mu_{\hat{p}} = p = 0.38$.

The standard deviation is $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.38(0.62)}{150}} = \sqrt{0.0015707} \approx 0.0396$.

We calculate the z-score for $\hat{p} = 0.30$: $z = \frac{0.30 - 0.38}{0.0396} = \frac{-0.08}{0.0396} \approx -2.02$.

Using a z-table or calculator, $P (Z \leq - 2.02) \approx 0.0217$ .

Conclude:

Assuming the true proportion of high school students with a part-time job is 38%, there is approximately a 0.0217 probability of obtaining a sample proportion of 0.30 or less in a random sample of 150 students.
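
The arithmetic in Problem 1 can be double-checked with a few lines of Python. This is only a sketch using the standard library; the variable names are assumptions. Because it carries full precision instead of rounding the standard deviation and z-score along the way, the final probability may differ from the z-table answer in the last decimal place.

```python
import math
from statistics import NormalDist

p, n = 0.38, 150      # claimed proportion and sample size from Problem 1
p_hat = 45 / n        # observed sample proportion, 45 out of 150

sd = math.sqrt(p * (1 - p) / n)   # standard deviation of p-hat
z = (p_hat - p) / sd              # z-score for the boundary value
prob = NormalDist().cdf(z)        # left-tail area, P(Z <= z)

print(round(sd, 4), round(z, 2), round(prob, 4))
```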

Problem 2:

According to a candy manufacturer, 20% of the candies in its bags are green. You purchase a large bag of candies and take a random sample of 200. What is the probability that the proportion of green candies in your sample is within 3 percentage points of the manufacturer's claim?

Solution:

State: We want to find the probability that the sample proportion $\hat{p}$ is between 0.17 and 0.23. Let $p$ = the true proportion of green candies, so $p = 0.20$. Let $\hat{p}$ be the proportion of green candies in a sample of 200. We want to find $P(0.17 \leq \hat{p} \leq 0.23)$.

Plan: We will use a Normal model to approximate the sampling distribution of $\hat{p}$. We must check the conditions.

Random: The problem states a random sample of 200 candies was taken.
10% Condition: The sample size $n = 200$ is surely less than 10% of all candies produced by the manufacturer. This allows us to calculate the standard deviation.
Large Counts Condition: We check $n p = 200 (0.20) = 40 \geq 10$ and $n (1 - p) = 200 (0.80) = 160 \geq 10$. Since both expected counts are at least 10, the sampling distribution of $\hat{p}$ is approximately Normal.

Do:

The mean of the sampling distribution is $\mu_{\hat{p}} = p = 0.20$.

The standard deviation is $\sigma_{\hat{p}} = \sqrt{\frac{p(1 - p)}{n}} = \sqrt{\frac{0.20(0.80)}{200}} = \sqrt{0.0008} \approx 0.0283$.

We need to find the probability between $\hat{p} = 0.17$ and $\hat{p} = 0.23$. We can use normalcdf() or calculate two z-scores.

$z_{\text{lower}} = \frac{0.17 - 0.20}{0.0283} \approx -1.06$

$z_{\text{upper}} = \frac{0.23 - 0.20}{0.0283} \approx 1.06$

Using a calculator: normalcdf(0.17, 0.23, 0.20, 0.0283) $\approx$ 0.7109.
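
As a cross-check on Problem 2, the "between" probability can be computed as a difference of two cumulative areas. The sketch below uses Python's standard library and works directly on the $\hat{p}$ scale rather than converting to z-scores; the variable names are assumptions. It keeps the unrounded standard deviation, so expect roughly 0.71 with small differences in the later decimals compared with a hand calculation.

```python
import math
from statistics import NormalDist

p, n = 0.20, 200                      # manufacturer's claim and sample size
sd = math.sqrt(p * (1 - p) / n)       # standard deviation of p-hat

# Normal model for the sampling distribution of p-hat.
sampling_dist = NormalDist(mu=p, sigma=sd)

# P(0.17 <= p-hat <= 0.23) = area below 0.23 minus area below 0.17.
prob_within = sampling_dist.cdf(0.23) - sampling_dist.cdf(0.17)

print(round(sd, 4), round(prob_within, 3))
```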