Quick Summary
This lesson introduces the foundational concepts of statistical inference by distinguishing between populations and samples. You will learn to identify the key numerical summaries for each—parameters for populations and statistics for samples—and understand why the value of a statistic naturally changes from one sample to another, a concept known as sampling variability. This understanding is the first step toward using sample data to make reliable conclusions about a larger population.
Key Concepts
This topic lays the groundwork for the entire unit on sampling distributions. Mastering the distinction between a parameter and a statistic is one of the most critical skills in AP Statistics.
Population and Sample
A population is the entire group of individuals we want information about. For example, all eligible voters in the United States, all teenagers in a specific high school, or all ball bearings produced by a factory in a given day.
A sample is a subset of individuals from the population from which we actually collect data. We use data from a sample to draw conclusions about the entire population. For example, a survey of 1,200 eligible voters, a questionnaire given to 100 students at the high school, or a quality test on 50 randomly selected ball bearings.
Parameter and Statistic
A parameter is a number that describes a characteristic of a population.
Parameters are typically represented by Greek letters.
Common parameters are the population mean (μ), population proportion (p), and population standard deviation (σ).
In reality, the value of a parameter is usually unknown because we can rarely collect data from the entire population. It is a fixed value that we want to estimate.
A statistic is a number that describes a characteristic of a sample.
Statistics are typically represented by Roman letters or symbols with a "hat".
Common statistics are the sample mean (x̄, read "x-bar"), sample proportion (p̂, read "p-hat"), and sample standard deviation (s).
The value of a statistic is always known because it is calculated directly from our sample data. We use a statistic to estimate an unknown parameter.
[Image: A large circle labeled "Population" with the Greek letter μ inside. An arrow points from it to a smaller circle labeled "Sample" with the Roman letter x̄ inside. A caption reads: "We use the statistic x̄ from the sample to estimate the parameter μ of the population."]
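As a rough illustration of the difference, the Python sketch below builds a made-up population of 2,500 homework times, computes the parameter μ from every value, and then computes the statistic x̄ from one random sample of 40. The data, the sizes, and the variable names are assumptions chosen for illustration; in a real study the full population would not be available, which is exactly why μ is usually unknown.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly homework hours for 2,500 students (made-up data).
population = [random.uniform(0, 15) for _ in range(2500)]

# Parameter: describes the whole population. Fixed, and usually unknown in practice.
mu = statistics.mean(population)

# Statistic: describes one random sample of 40 students. Always computable.
sample = random.sample(population, 40)
x_bar = statistics.mean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x-bar = {x_bar:.2f}")
```

The two printed numbers will usually be close but not equal: x̄ is an estimate of μ, not a copy of it.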
Sampling Variability
This is the core concept that answers the question, "Why is my sample not like yours?"
Sampling variability is the natural, expected variation in the values of a statistic from one random sample to another.
If you and a friend both take a random sample of 50 students from your school and calculate the average GPA, you will almost certainly get slightly different values for x̄.
This is not an error. It happens because different random samples will contain different individuals, leading to different calculated statistics. This variability is the foundation of statistical inference; we will learn to quantify it to understand how close our estimate is likely to be to the true parameter.
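The GPA scenario above can be imitated with a short Python sketch. It assumes a hypothetical school of 1,200 students with made-up GPAs, then draws two separate simple random samples of 50 and compares the two values of x̄. The function name sample_mean and all the numbers are illustrative assumptions.

```python
import random
import statistics

def sample_mean(population, n, rng):
    """Return x-bar for one simple random sample of size n."""
    return statistics.mean(rng.sample(population, n))

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical GPAs for a school of 1,200 students (made-up data).
gpas = [rng.uniform(1.5, 4.0) for _ in range(1200)]

yours = sample_mean(gpas, 50, rng)     # your random sample of 50
friends = sample_mean(gpas, 50, rng)   # your friend's separate random sample of 50

print(f"your x-bar     = {yours:.3f}")
print(f"friend's x-bar = {friends:.3f}")
```

Both samples come from the same population by the same method, yet the two sample means will typically differ in the second or third decimal place. That gap is sampling variability, not a mistake.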
The Sampling Distribution
This is a crucial, and often challenging, idea that we will build upon throughout this unit.
Imagine we could take every possible random sample of the same size (n) from a population.
For each sample, we calculate the same statistic (e.g., the sample proportion, p̂).
The sampling distribution of that statistic is the distribution of all these calculated statistic values.
It is not a distribution of individual data points. It is a theoretical distribution composed entirely of statistics.
The sampling distribution allows us to see how a statistic behaves, telling us what values it is likely to take and how much those values tend to vary. This is the key to determining if a result from our single sample is surprising or expected.
[Image: A diagram showing a large population (e.g., a jar of red and blue marbles). Multiple arrows point from the population to small sample groups of marbles. For each sample, a p̂ value is calculated. These p̂ values are then plotted on a dotplot or histogram, which is labeled "The Sampling Distribution of p̂."]
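Taking every possible sample is not feasible, but a simulation can approximate the idea. The Python sketch below assumes a very large jar in which 40% of the marbles are red (so each draw is treated as an independent success with probability p), takes 2,000 samples of 50 marbles, and records p̂ for each. The values of p, n, and the number of samples, along with the function name simulate_p_hats, are assumptions for illustration.

```python
import random
import statistics

def simulate_p_hats(p, n, num_samples, seed=0):
    """Draw num_samples samples of size n and return the p-hat from each one."""
    rng = random.Random(seed)
    p_hats = []
    for _ in range(num_samples):
        # Count successes (red marbles) in one sample of size n.
        successes = sum(rng.random() < p for _ in range(n))
        p_hats.append(successes / n)
    return p_hats

# Assumed values: 40% red marbles, samples of 50, repeated 2,000 times.
p_hats = simulate_p_hats(p=0.40, n=50, num_samples=2000)

print(f"mean of the p-hats = {statistics.mean(p_hats):.3f}")
print(f"SD of the p-hats   = {statistics.stdev(p_hats):.3f}")
```

The list p_hats contains only statistics, one per sample, and no individual marbles. Because it uses 2,000 samples rather than every possible sample, a dotplot of it is an approximation of the sampling distribution, not the sampling distribution itself.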
Key Vocabulary
Population: The entire group of individuals about which we want to draw conclusions.
Sample: A subset of the population from which we actually collect data.
Parameter: A number that describes a characteristic of a population (e.g., the true mean μ, the true proportion p). It is a fixed, unknown value.
Statistic: A number that describes a characteristic of a sample (e.g., the sample mean x̄, the sample proportion p̂). It is a known value used to estimate a parameter.
Sampling Variability: The natural tendency of a statistic's value to vary from one random sample to another. This is not a mistake; it is an expected property of sampling.
Sampling Distribution: The theoretical probability distribution of a statistic, obtained by considering all possible samples of a given size from a population.
Calculator Tech (TI-84)
No major calculator functions are required for this topic. The concepts are foundational and do not involve calculation.
How to Show Work on the FRQ
While Topic 5.1 is conceptual, it builds the foundation for describing sampling distributions, a common task on Free Response Questions. When asked to describe a sampling distribution, you must always address its Shape, Center, and Spread. Here is the template you will use throughout Unit 5.
How to Describe the Sampling Distribution of a Sample Proportion (p̂):
Shape: "The sampling distribution of p̂ is Approximately Normal because the Large Counts Condition is met. We check this by showing that np ≥ 10 and n(1-p) ≥ 10."
Center: "The mean of the sampling distribution of p̂ is μ_p̂ = p." (State the value of p from the problem).
Spread: "The standard deviation of the sampling distribution of p̂ is σ_p̂ = √[p(1-p)/n]." (Check the 10% condition before stating the value: "We can use this formula because the sample size n is less than 10% of the population size.")
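To see the three parts of this template as calculations, here is a minimal Python sketch. The function name describe_p_hat and the example values (p = 0.30, n = 100, population size 5,000) are assumptions for illustration and do not come from any particular exam problem.

```python
import math

def describe_p_hat(p, n, population_size):
    """Shape, center, and spread checks for the sampling distribution of p-hat."""
    # Shape: Large Counts Condition, np >= 10 and n(1 - p) >= 10.
    large_counts_met = n * p >= 10 and n * (1 - p) >= 10
    # Spread formula is justified when n is less than 10% of the population.
    ten_percent_met = n < 0.10 * population_size
    return {
        "large_counts_met": large_counts_met,
        "mean": p,                             # center: mu_p-hat = p
        "sd": math.sqrt(p * (1 - p) / n),      # spread: sqrt(p(1 - p)/n)
        "ten_percent_met": ten_percent_met,
    }

# Assumed example values, not from the lesson.
result = describe_p_hat(p=0.30, n=100, population_size=5000)
for name, value in result.items():
    print(name, "=", value)
```

With these assumed values, np = 30 and n(1-p) = 70, so the Large Counts Condition is met, and the standard deviation works out to about 0.046. On the FRQ you would still write each check out in words and numbers.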
How to Describe the Sampling Distribution of a Sample Mean (x̄):
Shape: State that the sampling distribution of x̄ is Normal or approximately Normal, and justify it with ONE of the following:
"The sampling distribution of x̄ is Normal because the population distribution was stated to be Normal."
"The sampling distribution of x̄ is approximately Normal because the sample size is large (n ≥ 30), so the Central Limit Theorem applies."
Center: "The mean of the sampling distribution of x̄ is μ_x̄ = μ." (State the value of μ from the problem).
Spread: "The standard deviation of the sampling distribution of x̄ is σ_x̄ = σ/√n." (Check the 10% condition before stating the value: "We can use this formula because the sample size n is less than 10% of the population size.")
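The same template for x̄ can be sketched in Python. The function name describe_x_bar and the example values (μ = 8, σ = 3, n = 36, population size 2,500) are assumptions for illustration only.

```python
import math

def describe_x_bar(mu, sigma, n, population_size, population_is_normal=False):
    """Shape, center, and spread checks for the sampling distribution of x-bar."""
    # Shape: Normal population, or a large sample (n >= 30) so the CLT applies.
    shape_justified = population_is_normal or n >= 30
    # Spread formula is justified when n is less than 10% of the population.
    ten_percent_met = n < 0.10 * population_size
    return {
        "shape_justified": shape_justified,
        "mean": mu,                      # center: mu_x-bar = mu
        "sd": sigma / math.sqrt(n),      # spread: sigma / sqrt(n)
        "ten_percent_met": ten_percent_met,
    }

# Assumed example values, not from the lesson.
result = describe_x_bar(mu=8, sigma=3, n=36, population_size=2500)
for name, value in result.items():
    print(name, "=", value)
```

Here n = 36 is at least 30, so the Central Limit Theorem justifies the shape, and the standard deviation is 3/√36 = 0.5. A sample of 20 from a population not stated to be Normal would fail the shape check.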
Practice Problems
Problem 1: A national polling organization is interested in the proportion of U.S. adults who believe that the country is heading in the right direction. They conduct a telephone poll of 1,025 randomly selected adults. The poll finds that 287 of the respondents believe the country is heading in the right direction. Identify the population, sample, parameter, and statistic in this context.
Solution:
The population is all U.S. adults, as this is the entire group the polling organization is interested in. The sample is the 1,025 adults who were actually selected and responded to the poll. The parameter of interest is p, the true proportion of all U.S. adults who believe the country is heading in the right direction. This value is unknown. The statistic is p̂, the sample proportion of respondents who believe the country is heading in the right direction. This value is known and is calculated from the sample data as p̂ = 287/1025 = 0.28.
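The arithmetic for the statistic can be checked in a few lines of Python. The counts come from the problem; the function name sample_proportion is a hypothetical helper.

```python
def sample_proportion(successes, n):
    """Return p-hat, the proportion of successes in a sample of size n."""
    return successes / n

# From Problem 1: 287 of the 1,025 respondents said "right direction".
p_hat = sample_proportion(287, 1025)
print(f"p-hat = {p_hat:.2f}")  # prints: p-hat = 0.28
```

Note that nothing in this calculation produces p. The parameter stays unknown; only the statistic can be computed from the poll.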
Problem 2: A large high school has 2,500 students. The school administration wants to estimate the true mean number of hours students spend on homework per week. Two AP Statistics students, Maria and David, are assigned to investigate. Maria selects a simple random sample of 40 students and finds a sample mean of 7.5 hours. David takes a separate simple random sample of 40 students and finds a sample mean of 8.1 hours. Explain why it is not surprising that their two sample means are different. Use appropriate statistical terminology.
Solution:
It is not surprising that Maria and David obtained different sample means because of sampling variability. This is the natural and expected variation in the value of a statistic (in this case, the sample mean x̄) from one random sample to another. Because Maria's and David's samples were chosen randomly and separately, they almost certainly consist of different groups of 40 students. The students in Maria's sample happened to have a slightly lower average homework time than the students in David's sample. This difference is not an indication of an error in their methods but is a fundamental property of random sampling. Both 7.5 hours and 8.1 hours are sample statistics (x̄) being used to estimate the same unknown population parameter (μ), the true mean homework time for all 2,500 students.
Common Mistakes to Avoid
Confusing Parameter and Statistic: This is the most common error. Remember the alliteration: Parameter describes a Population, and Statistic describes a Sample. A parameter is a fixed, unknown number (like the true proportion of voters), while a statistic is a known number calculated from your sample data (like the proportion of voters in your poll).
Using Incorrect Notation: Notation is not optional; it communicates meaning. Using p when you mean p̂, or μ when you mean x̄, will be marked incorrect on the AP exam. Use the population symbols (μ, σ, p) for parameters and the sample symbols (x̄, s, p̂) for statistics. Most parameters are Greek letters, but the population proportion p is not.
Believing Sampling Variability is an Error: Students often think that if their sample statistic isn't exactly equal to the population parameter, they did something wrong. Sampling variability is a natural feature of random sampling, not a mistake. The entire field of statistical inference is built on understanding and quantifying this variability.
Confusing the Distribution of a Sample with a Sampling Distribution: The "distribution of a sample" is a graph of the data you actually collected from one sample (e.g., a histogram of the 40 homework times in Maria's sample). The "sampling distribution" is a theoretical probability model showing the values a statistic (like the sample mean) would take if you took every possible sample of that size. You never see a sampling distribution in practice; you use statistical theory to understand what it would look like.
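One way to make this last distinction concrete is to compare the spread of the two distributions in a simulation. The Python sketch below assumes a made-up population of 2,500 homework times, looks at the 40 individual values in one sample, and then looks at the sample means from 1,000 samples of size 40 (an approximation of the sampling distribution, since it does not use every possible sample). All values and names are illustrative assumptions.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is reproducible

# Hypothetical population: weekly homework hours for 2,500 students (made-up data).
population = [rng.gauss(8, 2) for _ in range(2500)]

# Distribution of a sample: the 40 individual data values in ONE sample.
one_sample = rng.sample(population, 40)
sd_within_sample = statistics.stdev(one_sample)

# Approximate sampling distribution: one x-bar from each of 1,000 samples of size 40.
sample_means = [statistics.mean(rng.sample(population, 40)) for _ in range(1000)]
sd_of_means = statistics.stdev(sample_means)

print(f"SD of individual values in one sample = {sd_within_sample:.2f}")
print(f"SD of the 1,000 sample means          = {sd_of_means:.2f}")
```

The sample means vary far less than the individual homework times do, which is one quick way to tell the two kinds of distribution apart: one is made of data values, the other is made of statistics.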