Constructing a Confidence Interval for a Population Mean | AP Stats Unit 7 Study Guide

Quick Summary

This guide will enable you to construct and interpret a confidence interval for an unknown population mean (μ). You will learn to identify the correct inference procedure (the one-sample t-interval), verify the necessary conditions for its use, and perform the calculations using both the formula and a graphing calculator. By the end, you will be able to communicate your statistical findings clearly and accurately, just as required on the AP exam.

Key Concepts

When we want to estimate an unknown population mean (μ), we take a sample and calculate the sample mean (x̄). While x̄ is our best single guess (a point estimate), it's almost certainly not exactly equal to μ. A confidence interval provides a more useful estimate: a range of plausible values for the true population mean.

The Problem: The Population Standard Deviation (σ) is Unknown

In nearly all practical situations, if we don't know the population mean (μ), we also don't know the population standard deviation (σ). This is a problem. If we knew σ, we could use a z-distribution to build our interval. But since we don't, we must estimate it using the sample standard deviation (sₓ).

Using sₓ instead of σ introduces more variability into our calculations. To account for this extra uncertainty, we use a different distribution called the t-distribution.

The t-Distribution

The t-distribution is a family of distributions that, like the Normal distribution, are symmetric, single-peaked, and bell-shaped. However, t-distributions have more area in the tails, reflecting the greater uncertainty from using sₓ.

Degrees of Freedom (df): The specific t-distribution we use is defined by its degrees of freedom, which for a one-sample mean procedure is df = n - 1.
Shape: As the degrees of freedom increase (i.e., as the sample size n gets larger), the t-distribution gets closer and closer to the standard Normal (z) distribution. This is because a larger sample size gives us a better estimate of σ, reducing the extra uncertainty.

[Image: A graph showing a standard Normal (z) curve, a t-distribution with low df (e.g., df=2), and a t-distribution with higher df (e.g., df=30). The t-curves should be slightly shorter and wider, with fatter tails than the z-curve, and the df=30 curve should be very close to the z-curve.]
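To see the t-distribution approach the z-distribution numerically, here is a minimal Python sketch that compares the 95% critical value t* with z* ≈ 1.960 as df grows. The helper names (t_pdf, t_cdf, t_star) are illustrative and not part of the AP curriculum. The critical value is found by numerically integrating the t density and bisecting, which is only one of several ways to do it; on the exam you would use a t-table or invT.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=1000):
    """P(T <= x) for x >= 0: 0.5 plus the Simpson's-rule area from 0 to x."""
    h = x / steps
    total = t_pdf(0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_star(conf_level, df):
    """Critical value leaving (1 - conf_level) / 2 in each tail, by bisection."""
    target = 1 - (1 - conf_level) / 2
    lo, hi = 0.0, 100.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_star = NormalDist().inv_cdf(0.975)  # about 1.960
for df in (2, 10, 30, 100):
    print(f"df = {df:>3}: t* = {t_star(0.95, df):.3f}   (z* = {z_star:.3f})")
```

The printed t* values shrink toward z* as df increases, which is the numerical counterpart of the t-curves narrowing toward the z-curve.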

The Formula for a One-Sample t-Interval for a Population Mean

The general formula for a confidence interval is:

Point Estimate ± (Critical Value) × (Standard Error)

For a population mean, this becomes:

x̄ ± t*(sₓ / √n)

x̄ is the sample mean, our point estimate for μ.
sₓ is the sample standard deviation.
n is the sample size.
sₓ / √n is the standard error of the sample mean. It estimates the typical distance between a sample mean (x̄) and the population mean (μ).
t* is the critical value from the t-distribution with df = n - 1. It is determined by the confidence level. For example, for a 95% confidence interval, t* is the value that leaves 2.5% of the area in each tail of the t-distribution.
The entire term t*(sₓ / √n) is the margin of error.
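The pieces of the formula can be sketched in a few lines of Python. The function name t_interval and the numbers in the example (x̄ = 50, sₓ = 8, n = 16) are hypothetical, chosen only for illustration. The critical value t* = 2.131 is the standard t-table entry for 95% confidence with df = 15.

```python
import math

def t_interval(xbar, s, n, t_star):
    """One-sample t-interval from summary statistics and a given critical value."""
    standard_error = s / math.sqrt(n)          # s_x / sqrt(n)
    margin_of_error = t_star * standard_error  # t* times the standard error
    return xbar - margin_of_error, xbar + margin_of_error

# Hypothetical summary statistics: n = 16, so df = 15 and t* = 2.131 at 95%.
lower, upper = t_interval(xbar=50, s=8, n=16, t_star=2.131)
print(f"({lower:.3f}, {upper:.3f})")
```

Here the standard error is 8 / √16 = 2, so the margin of error is 2.131 × 2 = 4.262.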

Conditions for Inference

Before you can construct a valid t-interval, you must check three conditions:

Random: The data must come from a well-designed random sample or a randomized experiment. This condition is crucial for ensuring the sample is representative of the population and for reducing bias.
10% Condition (Independence): When sampling without replacement, the sample size n should be no more than 10% of the population size N (i.e., n ≤ 0.10N). This ensures that individual observations are reasonably independent.
Normal/Large Sample: The sampling distribution of the sample mean (x̄) needs to be approximately Normal. This can be satisfied in one of three ways (in order of preference):
- The population distribution is stated to be Normal.
- The sample size is large (n ≥ 30). This invokes the Central Limit Theorem (CLT), which states that for large n, the sampling distribution of x̄ will be approximately Normal, regardless of the population's shape.
- If the sample size is small (n < 30) and the population shape is unknown, you must graph the sample data (e.g., a boxplot, dotplot, or histogram) and confirm there is no strong skewness or outliers.
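The numerical parts of the checklist above can be sketched as a small Python helper. The function name check_conditions and its parameters are hypothetical. The Random condition is deliberately left out, because it depends on how the data were collected and cannot be verified from n and N alone.

```python
def check_conditions(n, population_size=None, population_normal=False):
    """Return notes on the 10% condition and the Normal/Large Sample condition."""
    notes = []
    if population_size is None:
        # With N unknown, you argue that the population is at least 10 * n.
        notes.append(f"10%: assume the population has at least {10 * n} individuals.")
    elif n <= 0.10 * population_size:
        notes.append("10%: met (n is at most 10% of N).")
    else:
        notes.append("10%: NOT met (n exceeds 10% of N).")

    if population_normal:
        notes.append("Normal: met (population stated to be Normal).")
    elif n >= 30:
        notes.append("Normal: met by the Central Limit Theorem (n >= 30).")
    else:
        notes.append("Normal: graph the sample; look for strong skewness or outliers.")
    return notes

for note in check_conditions(n=12, population_size=800):
    print(note)
```

For a small sample the helper can only remind you to graph the data; judging skewness and outliers still requires looking at the plot.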

Interpreting the Results

There are two key interpretations you must know:

Interpreting the Confidence Interval: "We are [C]% confident that the interval from [lower bound] to [upper bound] captures the true mean [parameter in context]."
Interpreting the Confidence Level: "If we were to take many random samples of the same size from this population and construct a [C]% confidence interval for each, about [C]% of these intervals would capture the true mean [parameter in context]."
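The meaning of the confidence level can be illustrated with a short simulation. This is a sketch under stated assumptions: the population is a hypothetical Normal population with μ = 100 and σ = 15, each sample has n = 10 observations, and t* = 2.262 is the t-table value for 95% confidence with df = 9. The function name coverage_rate is illustrative.

```python
import math
import random
import statistics

def coverage_rate(mu, sigma, n, t_star, trials, seed=0):
    """Fraction of simulated t-intervals that capture the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar = statistics.fmean(sample)
        s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
        margin = t_star * s / math.sqrt(n)
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

rate = coverage_rate(mu=100, sigma=15, n=10, t_star=2.262, trials=5000)
print(f"Proportion of intervals capturing mu: {rate:.3f}")
```

The proportion should land close to 0.95. Any single interval either captures μ or it does not; the 95% describes the long-run success rate of the method.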

Key Vocabulary

Confidence Interval: An interval of plausible values for an unknown population parameter, calculated from sample data.
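t-distribution: A family of symmetric, bell-shaped probability distributions with heavier tails than the standard Normal distribution. It is used for inference about a population mean when the population standard deviation is unknown and must be estimated by sₓ. Each t-distribution is defined by its degrees of freedom.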
Degrees of Freedom (df): For a one-sample t-interval, it is calculated as n - 1. It determines the specific t-distribution curve to be used.
Standard Error of the Mean: An estimate of the standard deviation of the sampling distribution of the sample mean (x̄). It is calculated as sₓ / √n.
Critical Value (t*): The multiplier in the margin of error formula that is determined by the confidence level and degrees of freedom. It marks the boundary for the middle C% of the area under the t-distribution curve.
Margin of Error: The value that is added to and subtracted from the point estimate to create the confidence interval. It represents the maximum likely estimation error.

Calculator Tech (TI-84)

You will use the TInterval function to calculate the confidence interval.

Path: STAT -> TESTS -> 8: TInterval

You have two input options: Stats or Data.

1. Using Stats (when you have summary statistics):

Select Stats and press ENTER.
x̄: Enter the sample mean.
Sx: Enter the sample standard deviation.
n: Enter the sample size.
C-Level: Enter the confidence level as a decimal (e.g., 0.95 for 95%).
Highlight Calculate and press ENTER.
The calculator will output the confidence interval (lower, upper), x̄, Sx, and n.

2. Using Data (when you have a raw list of data):

First, enter your data into a list (e.g., L1) by pressing STAT -> 1: Edit....
Go to the TInterval function: STAT -> TESTS -> 8: TInterval.
Select Data and press ENTER.
List: Specify the list where your data is stored (e.g., L1).
Freq: Leave this as 1 unless you have a frequency list.
C-Level: Enter the confidence level as a decimal.
Highlight Calculate and press ENTER.
The calculator will calculate x̄ and Sx from your data and provide the confidence interval.
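If you want to check a calculator result away from the TI-84, the Data workflow can be mirrored in Python: compute x̄ and Sx from the list, then build the interval. This is a rough analogue, not a description of the calculator's internals. The function name t_interval_from_data and the five data values are hypothetical. The critical value t* = 2.776 is the t-table entry for 95% confidence with df = 4.

```python
import math
import statistics

def t_interval_from_data(data, t_star):
    """Compute x-bar, Sx, and the t-interval from a raw list of data."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # divides by n - 1, matching Sx on the calculator
    margin = t_star * s / math.sqrt(n)
    return xbar, s, (xbar - margin, xbar + margin)

# Hypothetical data: n = 5, so df = 4 and t* = 2.776 at 95% confidence.
values = [4, 6, 5, 7, 8]
xbar, s, (lower, upper) = t_interval_from_data(values, t_star=2.776)
print(f"x-bar = {xbar:.3f}, Sx = {s:.3f}, interval = ({lower:.3f}, {upper:.3f})")
```

Because the t-table value is rounded to three decimals, the endpoints can differ from the calculator's output in the last displayed digit.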

How to Show Work on the FRQ

For any inference procedure on the AP exam, use the four-step State-Plan-Do-Conclude (SPDC) process to earn full credit.

State: Define the parameter you are estimating and the confidence level.

Template: "We want to estimate the true mean [describe the parameter in context], μ, at a C% confidence level."

Plan: Name the procedure and check the conditions.

Template:
- Procedure: One-sample t-interval for a population mean.
- Conditions:
  - 1. Random: The data come from a [random sample / randomized experiment] of [context].
  - 2. 10% Condition: The sample size n = [value] is less than 10% of the total population of [context]. (It is reasonable to assume there are at least [10 × n] [items in context]).
  - 3. Normal/Large Sample: [Choose one justification]:
    - "The population is stated to be Normal."
    - "The sample size is large (n = [value] ≥ 30), so by the Central Limit Theorem, the sampling distribution of x̄ is approximately Normal."
    - "The sample size is small (n = [value] < 30), so we will examine a graph of the sample data. [Sketch or describe the graph, e.g., a boxplot]. The graph shows no strong skewness or outliers, so it is reasonable to assume the underlying population is approximately Normal."

Do: Perform the calculations. Show the formula, substitute the values, and state the final interval.

Template:
- Formula: x̄ ± t*(sₓ / √n)
- Values: x̄ = [value], sₓ = [value], n = [value], df = n - 1 = [value]
- Critical Value: For a C% confidence interval with [df] degrees of freedom, t* = [value]. (You can find this using a t-table or invT on the calculator).
- Calculation: [value] ± [t* value] * ([sₓ value] / √[n value])
  [value] ± [margin of error value]
- Interval: ([lower bound], [upper bound])

Conclude: Interpret the interval in the context of the problem.

Template: "We are C% confident that the interval from [lower bound] to [upper bound] captures the true mean [describe the parameter in context]."

Practice Problems

Problem 1:

A coffee machine is designed to dispense an average of 12 ounces of coffee per cup. To test the machine, a quality control manager randomly selects 50 cups of coffee and finds the sample mean amount is 11.85 ounces with a sample standard deviation of 0.40 ounces. Construct and interpret a 95% confidence interval for the true mean amount of coffee dispensed by the machine.

Solution:

State: We want to estimate μ, the true mean amount of coffee dispensed by the machine, with 95% confidence.

Plan:

Procedure: One-sample t-interval for a population mean.
Conditions:
- 1. Random: The problem states the 50 cups were "randomly selected."
- 2. 10% Condition: The sample size is n = 50. It is reasonable to assume the machine dispenses more than 10 × 50 = 500 cups of coffee.
- 3. Normal/Large Sample: The sample size is large (n = 50 ≥ 30), so by the Central Limit Theorem, the sampling distribution of the sample mean is approximately Normal.

Do:

Formula: x̄ ± t*(sₓ / √n)
Values: x̄ = 11.85, sₓ = 0.40, n = 50, df = 50 - 1 = 49.
Critical Value: For a 95% confidence interval with df = 49, t* ≈ 2.010. (Using a calculator's invT(0.975, 49) or a t-table).
Calculation: 11.85 ± 2.010 * (0.40 / √50)
11.85 ± 2.010 * (0.0566)
11.85 ± 0.1137
Interval: (11.736, 11.964)
(Calculator check: Using TInterval with these stats gives (11.736, 11.964).)

Conclude: We are 95% confident that the interval from 11.736 ounces to 11.964 ounces captures the true mean amount of coffee dispensed by the machine.

Problem 2:

A high school counselor wants to estimate the mean number of hours students at her school spend on homework per week. She takes a random sample of 12 students and records their self-reported hours:

10, 15, 8, 12, 14, 7, 9, 11, 13, 16, 10, 12

Construct and interpret a 90% confidence interval for the true mean number of hours spent on homework.