Introducing Statistics: Should | AP Stats Unit 7 Study Guide

Quick Summary

This lesson introduces the foundational concepts of statistical inference. You will learn to distinguish between a parameter, which describes a whole population, and a statistic, which is calculated from a sample. We will explore why statistics vary from sample to sample—a concept called sampling variability—and introduce the idea of a sampling distribution, which is the theoretical distribution of a statistic from all possible samples.

Key Concepts

1. Parameters vs. Statistics

The primary goal of inference is to use information from a sample to draw conclusions about a larger population. To do this, we must be precise about what we are measuring in the sample versus what we want to know about the population.

Population: The entire group of individuals or objects we are interested in studying.
Sample: A subset of the population from which we actually collect data.
Parameter: A number that describes a characteristic of the population.
- Parameters are typically unknown because we can't measure the entire population.
- A parameter is a fixed value. There is only one true mean height of all adult males in the U.S., for example.
- Common parameters and their notation:
  - Population Mean: μ (mu)
  - Population Proportion: p
  - Population Standard Deviation: σ (sigma)
Statistic: A number that is calculated from sample data.
- We use statistics to estimate unknown parameters.
- A statistic is a random variable; its value changes from sample to sample. If you take two different samples from the same population, you will likely get two different sample means.
- Common statistics and their notation:
  - Sample Mean: x̄ ("x-bar")
  - Sample Proportion: p̂ ("p-hat")
  - Sample Standard Deviation: s

Mnemonic:

Parameter goes with Population.
Statistic goes with Sample.

[Image: A large circle labeled "Population (Parameter μ is unknown)". An arrow points from it to a smaller circle labeled "Sample (Statistic x̄ is calculated)". The caption reads: "We use the sample statistic (x̄) to estimate the population parameter (μ)."]

Example: A national pollster wants to know the proportion of U.S. adults who approve of the current president. They take a random sample of 1,500 adults and find that 630 of them approve.

Population: All U.S. adults.
Sample: The 1,500 adults surveyed.
Parameter: The true proportion, p, of all U.S. adults who approve of the president. This value is unknown.
Statistic: The sample proportion, p̂ = 630 / 1500 = 0.42. This is our estimate of the true proportion, p.
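
The arithmetic behind the statistic can be checked in a few lines of Python. This is only an illustrative sketch; the variable names are our own and are not AP notation.

```python
# Poll results from the example above
approvals = 630      # adults in the sample who approve
sample_size = 1500   # adults surveyed

# The statistic: the sample proportion, p-hat
p_hat = approvals / sample_size
print(f"p-hat = {p_hat:.2f}")
```

The parameter p never appears in the code, because it is unknown. Only the statistic can be computed from the data.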

2. Sampling Variability (aka Sampling Error)

If the pollster in the example above took a different random sample of 1,500 adults, would they get exactly 630 approvals again? Almost certainly not. They might get 615, or 640, or some other number. This means their calculated sample proportion, p̂, would be different.

Sampling Variability is the natural, expected variation in the values of a statistic from one random sample to another.
It is not an "error" in the sense of a mistake. It is the unavoidable result of using a sample to learn about a population.
The amount of sampling variability depends on two main factors:
1. Sample Size (n): Larger samples tend to produce statistics with less variability. A statistic from a sample of 1000 people will be a more consistent estimate of the parameter than a statistic from a sample of 10 people.
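2. The method of sampling: Random selection is what makes sampling variability predictable, and a well-designed random sampling method (for example, stratified random sampling) can reduce it further.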
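
The effect of sample size can be seen in a small simulation. The sketch below assumes a hypothetical population in which the true proportion is 0.42 (a value chosen only for illustration, since a real p is unknown) and compares how much p̂ varies for samples of size 10 versus size 1000. The function name simulate_p_hats is our own.

```python
import random
import statistics

def simulate_p_hats(true_p, n, reps, rng):
    """Return `reps` simulated sample proportions, each from a random sample of size n."""
    p_hats = []
    for _ in range(reps):
        # Each individual "approves" with probability true_p
        successes = sum(1 for _ in range(n) if rng.random() < true_p)
        p_hats.append(successes / n)
    return p_hats

rng = random.Random(1)   # fixed seed so the run is repeatable
TRUE_P = 0.42            # assumed value, for illustration only

small = simulate_p_hats(TRUE_P, n=10, reps=500, rng=rng)
large = simulate_p_hats(TRUE_P, n=1000, reps=500, rng=rng)

sd_small = statistics.stdev(small)
sd_large = statistics.stdev(large)
print(f"SD of p-hat values, n = 10:   {sd_small:.3f}")
print(f"SD of p-hat values, n = 1000: {sd_large:.3f}")
```

Under these assumptions, the p̂ values from the larger samples should cluster much more tightly around 0.42 than those from the smaller samples.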

3. The Sampling Distribution

Since a statistic like x̄ or p̂ is a random variable, it has a distribution. This special distribution is the key to all of statistical inference.

The Sampling Distribution of a statistic is the distribution of values taken by the statistic in all possible samples of the same size (n) from the same population.

This is a theoretical concept. We don't actually take all possible samples. However, statistical theory tells us what this distribution should look like.

Conceptualizing a Sampling Distribution:

Imagine a large population (e.g., all the pennies in a jar). The parameter of interest is the true mean year of all the pennies, μ.
Take a random sample of size n=20. Calculate the sample mean year, x̄.
Plot that one value of x̄ on a dotplot.
Return the pennies, mix them up, and repeat the process. Take another random sample of n=20 and calculate its mean, x̄. Plot this new value.
Repeat this process thousands and thousands of times.
The resulting dotplot of all the x̄ values approximates the sampling distribution of the sample mean.
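
The penny-jar process above can be sketched as a simulation. The jar here is hypothetical: 5,000 pennies with made-up mint years between 1970 and 2023, not real data. Because we built the jar ourselves, we can compute μ directly, which is never possible in a real study.

```python
import random
import statistics

rng = random.Random(7)   # fixed seed so the run is repeatable

# Hypothetical jar: 5,000 pennies with made-up mint years (not real data)
jar = [rng.randint(1970, 2023) for _ in range(5000)]
mu = statistics.mean(jar)   # the parameter; known here only because we built the jar

x_bars = []
for _ in range(5000):
    sample = rng.sample(jar, 20)             # random sample of n = 20 pennies
    x_bars.append(statistics.mean(sample))   # record x-bar; the jar itself is unchanged

print(f"population mean (mu):   {mu:.2f}")
print(f"mean of the x-bars:     {statistics.mean(x_bars):.2f}")
print(f"SD of years in the jar: {statistics.pstdev(jar):.2f}")
print(f"SD of the x-bars:       {statistics.stdev(x_bars):.2f}")
```

The list x_bars plays the role of the dotplot. Its values should center near μ and vary much less than the individual penny years do.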

[Image: Three panels. Panel 1 shows a normal curve labeled "Population Distribution (μ)". Panel 2 shows several small histograms labeled "Sample 1 (x̄₁)", "Sample 2 (x̄₂)", etc. Panel 3 shows a dotplot of the many x̄ values, which are forming a new, narrower normal curve. This curve is labeled "Sampling Distribution of x̄".]

Like any other distribution, a sampling distribution can be described by its:

Shape: Is it symmetric, skewed, bell-shaped?
Center: Where is the distribution centered? (We will learn this is typically the value of the population parameter).
Spread (Variability): How much do the statistics vary from one another? (This is measured by the standard deviation of the statistic. When that standard deviation is estimated from sample data, it is called the standard error.)

Understanding the properties of this theoretical distribution allows us to say how close our single sample statistic is likely to be to the unknown population parameter we are trying to estimate.
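
To make "how close" concrete, the sketch below returns to the polling example and assumes, purely for illustration, that the true proportion is p = 0.42. It simulates many polls of 1,500 adults and counts how often p̂ lands within 0.03 of p. The 0.03 cutoff is an arbitrary choice for this example.

```python
import random

rng = random.Random(3)   # fixed seed so the run is repeatable
TRUE_P = 0.42            # assumed for illustration; in a real poll p is unknown
N = 1500                 # sample size from the polling example
REPS = 1000              # number of simulated polls
MARGIN = 0.03            # arbitrary "close enough" cutoff for this sketch

close = 0
for _ in range(REPS):
    # Simulate one poll: each adult approves with probability TRUE_P
    approvals = sum(1 for _ in range(N) if rng.random() < TRUE_P)
    p_hat = approvals / N
    if abs(p_hat - TRUE_P) <= MARGIN:
        close += 1

fraction_close = close / REPS
print(f"Fraction of simulated polls with p-hat within {MARGIN} of p: {fraction_close:.3f}")
```

In practice we cannot run this simulation, because p is unknown. The theory of sampling distributions lets us make the same kind of statement from a single sample.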

Key Vocabulary

Population: The entire collection of individuals or instances about which we want to draw conclusions.
Sample: A subset of a population, selected for study in some prescribed manner.
Parameter: A numerical value that describes a characteristic of a population (e.g., μ, p). It is a fixed, but typically unknown, number.
Statistic: A numerical value that is calculated from a sample (e.g., x̄, p̂). It is used to estimate a parameter, and its value varies from sample to sample.
Sampling Variability: The natural tendency of a statistic to have different values for different random samples. This is not a mistake, but a property of sampling.
Sampling Distribution: The probability distribution of a statistic, showing all possible values the statistic could take and how often they would occur, from all possible samples of a given size.

Calculator Tech (TI-84)

No major calculator functions are required for this topic. The concepts are foundational and conceptual.

How to Show Work on the FRQ

Questions on this topic are conceptual and focus on clear communication and correct use of terminology and notation.

Template for Identifying Parameters and Statistics

When asked to identify a parameter or statistic in a given scenario, use this template for a complete answer.

Prompt: "Identify the parameter and statistic of interest."

Response Structure:

"The parameter of interest is the [true mean/proportion] of [describe the population in context]. The notation for this parameter is [μ for mean, p for proportion].

The statistic is the [sample mean/proportion] of [describe the sample in context]. The notation for this statistic is [x̄ for mean, p̂ for proportion], and its value is [state the calculated value, if given]."

Template for Describing a Sampling Distribution

When asked to describe what a sampling distribution is in a specific context, use this template.

Prompt: "Describe the sampling distribution of the sample proportion."

Response Structure:

"The sampling distribution of the [name the statistic, e.g., sample proportion, p̂] describes the distribution of all possible values of the [statistic] that could be obtained from all possible random samples of size n = [state the sample size] taken from the population of [describe the population in context]."

Practice Problems

Problem 1:

A large university reports that 22% of its undergraduate students are majoring in a STEM (Science, Technology, Engineering, or Math) field. The student newspaper takes a simple random sample of 150 undergraduate students and finds that 40 of them are STEM majors.

(a) Identify the population and the sample.

(b) Identify the parameter and the statistic. Use appropriate notation.

Solution:

(a)

Population: All undergraduate students at this large university.
Sample: The 150 undergraduate students selected by the student newspaper.

(b)

The parameter of interest is the true proportion of all undergraduate students at this university who are STEM majors. The notation for this parameter is p, and its value is given as p = 0.22.
The statistic is the sample proportion of the 150 surveyed students who are STEM majors. The notation for this statistic is p̂, and its value is p̂ = 40 / 150 ≈ 0.267.

Problem 2:

The U.S. Census Bureau reports that the median household income in a particular state is $68,400. A sociologist is studying poverty and takes a random sample of 400 households from the state to estimate the mean household income.

(a) Is $68,400 a parameter or a statistic? Explain.

(b) The sociologist calculates the mean income for her sample of 400 households. In the context of this study, describe the sampling distribution of the sample mean household income.