Skills Focus: Selecting, | AP Stats Unit 7 Study Guide

Quick Summary

This guide provides a comprehensive framework for mastering statistical inference. You will learn how to analyze a problem, select the appropriate significance test or confidence interval from a range of options, and meticulously implement the four-step inference process. By the end of this lesson, you will be able to confidently choose the correct procedure for means or proportions, for one or two samples, and clearly communicate your statistical conclusions in context to earn full credit on the AP exam.

Key Concepts

Selecting the correct inference procedure is the most critical skill in this unit. It requires a systematic approach to reading and interpreting the problem. Use the following decision-making framework to guide your choice.

The Three Key Questions to Select Your Procedure:

Inference Type: Test or Interval?
- Is the goal to estimate a population parameter? You are likely being asked for a range of plausible values. Look for keywords like "construct a 95% confidence interval," "estimate the true proportion," or "what is the margin of error?"
- Procedure: Confidence Interval.
- Is the goal to assess evidence for a claim? You are likely being asked to see if a value has changed, is different from, or is greater/less than some hypothesized value. Look for keywords like "is there convincing evidence," "test the claim," "is the new method better," or a significance level (α).
- Procedure: Significance Test.
Parameter Type: Mean (μ) or Proportion (p)?
- Are you working with categorical data? The data will consist of counts of "successes" and "failures" in a category. The parameter of interest will be a proportion (p). Look for percentages, fractions, or phrases like "proportion of students" or "percentage of defective items."
- Procedures: z-procedures (1-PropZTest, 2-PropZInt, etc.).
- Are you working with quantitative data? The data will be numerical measurements or counts for which it makes sense to calculate an average. The parameter of interest will be a mean (μ). Look for phrases like "average weight," "mean score," or "average difference."
- Procedures: t-procedures (T-Test, 2-SampTTest, etc.). Note: We use t-procedures for means unless the population standard deviation (σ) is known, which is extremely rare in practice.
Number of Samples/Groups: One or Two? Paired?
- One Sample: You have data from a single group, and you are comparing it to a known value or estimating a single parameter.
  - Example: Estimating the proportion of all students at a school who have a part-time job.
- Two Independent Samples: You have data from two separate, unrelated groups, and you are comparing them to each other.
  - Example: Comparing the average GPA of male students to the average GPA of female students.
- Paired Data (a special type of one-sample test): You have two data points for each subject or experimental unit, or the subjects are matched in some way. You are interested in the mean difference.
  - Example: Measuring the blood pressure of 50 patients before and after taking a new medication. The data are the differences in blood pressure for each patient. This is analyzed as a one-sample t-test on the differences.
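
The three questions above can be sketched as a small lookup function. This is only an illustration of the decision logic: the function name `select_procedure` and its argument labels (`"estimate"`, `"test"`, `"categorical"`, and so on) are hypothetical, not part of any calculator or library.

```python
def select_procedure(goal: str, data_type: str, design: str) -> str:
    """Name an inference procedure from the three key questions.

    goal:      "estimate" (confidence interval) or "test" (significance test)
    data_type: "categorical" (proportion) or "quantitative" (mean)
    design:    "one", "two", or "paired"
    """
    if goal not in ("estimate", "test"):
        raise ValueError("goal must be 'estimate' or 'test'")
    if data_type not in ("categorical", "quantitative"):
        raise ValueError("data_type must be 'categorical' or 'quantitative'")
    if design not in ("one", "two", "paired"):
        raise ValueError("design must be 'one', 'two', or 'paired'")

    # Question 1: test or interval?
    kind = "interval" if goal == "estimate" else "test"

    # Question 2: proportion (z-procedures) or mean (t-procedures)?
    if data_type == "categorical":
        # Question 3: how many samples?
        if design == "one":
            return f"one-sample z-{kind} for a proportion"
        if design == "two":
            return f"two-sample z-{kind} for a difference in proportions"
        raise ValueError("paired proportions are not covered in this guide")

    if design == "one":
        return f"one-sample t-{kind} for a mean"
    if design == "two":
        return f"two-sample t-{kind} for a difference in means"
    # Paired data: a one-sample t-procedure on the differences
    return f"paired t-{kind} for a mean difference"


print(select_procedure("estimate", "categorical", "one"))
print(select_procedure("test", "quantitative", "paired"))
```

The first call matches the part-time job example (estimating a single proportion), and the second matches the blood pressure example (a test on paired differences).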

Inference Procedure Flowchart:

[Image: A flowchart diagram showing the decision process. The first branch asks "Test or Interval?". The next level asks "Mean or Proportion?". The final level asks "One Sample, Two Samples, or Paired?". Each final branch leads to a specific procedure name, like "One-Sample t-test for a mean (μ)" or "Two-Sample z-interval for a difference in proportions (p1 - p2)".]

Summary of Common Inference Procedures & Conditions:

| Procedure Name | Parameter(s) | Conditions |
|---|---|---|
| One-Sample z-Interval for a Proportion | p | Random: Data from a random sample. 10% Condition: n ≤ 0.10N. Large Counts: np̂ ≥ 10 and n(1-p̂) ≥ 10. |
| One-Sample z-Test for a Proportion | p | Random: Data from a random sample. 10% Condition: n ≤ 0.10N. Large Counts: np₀ ≥ 10 and n(1-p₀) ≥ 10. (Use p₀ from H₀) |
| One-Sample t-Interval for a Mean | μ | Random: Data from a random sample. 10% Condition: n ≤ 0.10N. Normal/Large Sample: n ≥ 30 OR population is stated as Normal OR a graph of sample data shows no strong skew or outliers. |
| One-Sample t-Test for a Mean | μ | Random: Data from a random sample. 10% Condition: n ≤ 0.10N. Normal/Large Sample: n ≥ 30 OR population is stated as Normal OR a graph of sample data shows no strong skew or outliers. |
| Two-Sample z-Interval for Diff. of Proportions | p₁ - p₂ | Random: Two independent random samples. 10% Condition: n₁ ≤ 0.10N₁ and n₂ ≤ 0.10N₂. Large Counts: n₁p̂₁ ≥ 10, n₁(1-p̂₁) ≥ 10, n₂p̂₂ ≥ 10, and n₂(1-p̂₂) ≥ 10. |
| Two-Sample z-Test for Diff. of Proportions | p₁ - p₂ | Random: Two independent random samples. 10% Condition: n₁ ≤ 0.10N₁ and n₂ ≤ 0.10N₂. Large Counts: Use pooled proportion p̂_c. n₁p̂_c ≥ 10, n₁(1-p̂_c) ≥ 10, n₂p̂_c ≥ 10, and n₂(1-p̂_c) ≥ 10. |
| Two-Sample t-Interval for Diff. of Means | μ₁ - μ₂ | Random: Two independent random samples. 10% Condition: n₁ ≤ 0.10N₁ and n₂ ≤ 0.10N₂. Normal/Large Sample: n₁ ≥ 30 and n₂ ≥ 30 OR both populations are Normal OR graphs of both samples show no strong skew/outliers. |
| Two-Sample t-Test for Diff. of Means | μ₁ - μ₂ | Random: Two independent random samples. 10% Condition: n₁ ≤ 0.10N₁ and n₂ ≤ 0.10N₂. Normal/Large Sample: n₁ ≥ 30 and n₂ ≥ 30 OR both populations are Normal OR graphs of both samples show no strong skew/outliers. |
| t-Test for Paired Data (Mean Difference) | μ_diff | Random: Data from a random sample. 10% Condition: n ≤ 0.10N. Paired Data: Data are paired. Normal/Large Sample: n_diff ≥ 30 OR population of differences is Normal OR a graph of the sample differences shows no strong skew/outliers. |
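
The numeric conditions in the table (10% and Large Counts) are simple inequalities, so they can be sketched as helper functions. The names below (`ten_percent_ok`, `large_counts_ok`, `pooled_proportion`) and the sample values are hypothetical and only meant to show which proportion gets plugged in for each procedure.

```python
def ten_percent_ok(n: int, population_size: int) -> bool:
    """10% condition: the sample is at most 10% of the population."""
    return n <= 0.10 * population_size


def large_counts_ok(n: int, p: float, minimum: float = 10) -> bool:
    """Large Counts: expected successes and failures are both at least 10.

    Pass p-hat for an interval, p0 (from H0) for a one-sample test,
    and the pooled proportion for a two-sample test.
    """
    return n * p >= minimum and n * (1 - p) >= minimum


def pooled_proportion(x1: int, n1: int, x2: int, n2: int) -> float:
    """Combined proportion of successes across both samples."""
    return (x1 + x2) / (n1 + n2)


# Hypothetical one-sample test of H0: p = 0.03 with n = 200:
# expected successes = 200 * 0.03 = 6, which is below 10.
print(large_counts_ok(200, 0.03))

# Hypothetical interval with p-hat = 0.64 and n = 200:
# 128 successes and 72 failures, both at least 10.
print(large_counts_ok(200, 0.64))

# Hypothetical two-sample test: check each sample against the pooled proportion.
p_c = pooled_proportion(30, 100, 50, 100)
print(large_counts_ok(100, p_c) and large_counts_ok(100, p_c))
```

The Random and Normal/Large Sample (graph-based) conditions still have to be justified from the problem statement; they cannot be reduced to a formula like this.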

Key Vocabulary

Parameter of Interest: A numerical value that describes a characteristic of a population (e.g., the true mean GPA, μ, of all students at a school). We use inference to estimate this value or test a claim about it.
Statistic: A numerical value calculated from a sample (e.g., the sample mean GPA, x̄, of 50 students). It is used as an estimate for the population parameter.
Significance Test: A formal procedure that uses sample data to assess the evidence against a claim (the null hypothesis) about a population parameter.
Confidence Interval: An interval of plausible values for a population parameter, calculated from sample data. The confidence level gives the long-run success rate of the method.
Paired Data: Data consisting of two observations on the same individual or on two matched individuals. The analysis focuses on the difference between the two observations in each pair.
Type I Error: The error of rejecting the null hypothesis (H₀) when it is actually true. The probability of a Type I error is the significance level, α.
Type II Error: The error of failing to reject the null hypothesis (H₀) when it is actually false. The probability of a Type II error is denoted by β.

Calculator Tech (TI-84)

Your calculator is essential for the "Do" step, but only after you have correctly identified the procedure in the "Plan" step. All inference procedures are found under the STAT -> TESTS menu.

How to Find the Right Procedure:

Press STAT, then use the right arrow to select the TESTS menu.
Scroll through the list to find the procedure you named in your "Plan" step.

Common Procedures (STAT -> TESTS):

Means (t-procedures):
- 2: T-Test... (One-sample t-test for μ)
- 4: 2-SampTTest... (Two-sample t-test for μ₁ - μ₂)
- 8: TInterval... (One-sample t-interval for μ)
- 0: 2-SampTInt... (Two-sample t-interval for μ₁ - μ₂)
- Note: For a paired t-test, you first calculate the differences (e.g., L3 = L1 - L2) and then run a 2: T-Test... on the list of differences (L3).
Proportions (z-procedures):
- 5: 1-PropZTest... (One-sample z-test for p)
- 6: 2-PropZTest... (Two-sample z-test for p₁ - p₂)
- A: 1-PropZInt... (One-sample z-interval for p)
- B: 2-PropZInt... (Two-sample z-interval for p₁ - p₂)

When you select a procedure, the calculator will prompt you for inputs like the hypothesized mean/proportion (μ₀ or p₀), sample statistics (x̄, sₓ, n, or x, n), and the alternative hypothesis (≠, <, >). Always double-check that you have entered these values correctly from the problem.
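
The paired-data note above (compute L3 = L1 - L2, then run a one-sample T-Test on L3) can be mirrored in a few lines of Python. The blood pressure readings here are made-up values used only to show the arithmetic, and the variable names are assumptions for this sketch.

```python
import math
import statistics

# Hypothetical before/after readings for six patients (illustration only)
before = [142, 138, 150, 145, 160, 155]   # plays the role of L1
after = [135, 136, 144, 140, 152, 150]    # plays the role of L2

# L3 = L1 - L2: one difference per patient
differences = [b - a for b, a in zip(before, after)]

n = len(differences)
mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)    # sample standard deviation

# One-sample t statistic on the differences, testing H0: mu_diff = 0
std_error = sd_diff / math.sqrt(n)
t_stat = (mean_diff - 0) / std_error
df = n - 1

print(f"differences = {differences}")
print(f"mean = {mean_diff:.2f}, sd = {sd_diff:.3f}, t = {t_stat:.3f}, df = {df}")
```

With a sample this small, the Normal/Large Sample condition would still need to be checked with a graph of the differences before trusting the result.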

How to Show Work on the FRQ

To earn full credit on an inference FRQ, you must communicate your reasoning clearly. The four-step State-Plan-Do-Conclude process is the gold standard.

STATE (1 point): Define parameters and state hypotheses or confidence level.

For a Significance Test:
- "Let [parameter symbol, e.g., μ₁, p_diff] = the true [parameter in context]."
- "We will test the following hypotheses at an α = [significance level] level:"
- "H₀: [null hypothesis in symbols, e.g., μ₁ - μ₂ = 0]"
- "Hₐ: [alternative hypothesis in symbols, e.g., μ₁ - μ₂ > 0]"
- Make sure to define both parameters if you have two groups.
For a Confidence Interval:
- "We want to estimate [parameter in context] with a [C%] confidence interval."
- "The parameter of interest is [parameter symbol] = the true [parameter in context]."

PLAN (1 point): Name the procedure and check conditions.

Name: "The appropriate inference procedure is a [full name of procedure, e.g., two-sample t-test for a difference in means]."
Check Conditions: Check each condition by name, showing work and referencing the problem.
- Random: "The data come from [two independent random samples of... / a random sample of...], as stated in the problem."
- 10% Condition (for sampling without replacement): "The sample size(s) of n = [value] is/are less than 10% of the population size(s) of all [population description]." (e.g., 50 < 0.10 * (all smartphones)). This is to ensure independence.
- Large Counts/Normality:
  - For Proportions: "The Large Counts condition is met because n₁p̂₁ = ... ≥ 10, n₁(1-p̂₁) = ... ≥ 10, etc." (Show calculations for all relevant counts).
  - For Means: "The Normal/Large Sample condition is met because the sample size n = [value] is ≥ 30, which means the sampling distribution of the sample mean is approximately Normal by the Central Limit Theorem." OR "The problem states the population is Normally distributed." OR "A graph of the sample data (sketch or describe it) shows no strong skew or outliers, so it is plausible that the population (or, for paired data, the population of differences) is approximately Normal."

DO (1 point): Calculations.

Provide the key calculated values. You do not need to show the massive formula substitutions if you use the calculator, but you must show the initial setup.
General Formula: Write the general formula for the statistic (e.g., $\text{test statistic} = \dfrac{\text{statistic} - \text{parameter}}{\text{standard error of statistic}}$).
Calculations:
- For a Test: Report the calculated test statistic (e.g., t = 2.45), the degrees of freedom (df), and the p-value.
- For an Interval: Report the calculated interval in the form (lower bound, upper bound). It's good practice to also show the formula: $\text{statistic} \pm (\text{critical value}) \times (\text{standard error})$.
Example (Test): $t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ -> $t = 2.14$, $df = 45.8$, $\text{p-value} = 0.018$
Example (Interval): $(\hat{p}_1 - \hat{p}_2) \pm z^* \sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$ -> $(-0.08, 0.24)$
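
As a rough sketch of the interval formula (statistic ± critical value × standard error), the two-sample z-interval for a difference in proportions can be computed with Python's standard library. The counts below are hypothetical and are not the data behind the example interval above; the function name `two_prop_z_interval` is also an assumption for this illustration.

```python
import math
from statistics import NormalDist


def two_prop_z_interval(x1, n1, x2, n2, confidence=0.95):
    """Two-sample z-interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    # Critical value z* leaves (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    std_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_star * std_error
    diff = p1 - p2
    return diff - margin, diff + margin


# Hypothetical counts: 60 of 150 successes in group 1, 45 of 150 in group 2
low, high = two_prop_z_interval(60, 150, 45, 150)
print(f"({low:.4f}, {high:.4f})")
```

This should agree with the calculator's 2-PropZInt for the same inputs, up to rounding.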

CONCLUDE (1 point): Interpret your results in context.

For a Significance Test:
- "Because our p-value of [value] is [less than / greater than] our significance level α = [value], we [reject / fail to reject] the null hypothesis."
- "We [have / do not have] convincing statistical evidence that [state the alternative hypothesis in words and context]."
For a Confidence Interval:
- "We are [C%] confident that the interval from [lower bound] to [upper bound] captures the true [parameter in context]."
- If asked to make a decision based on the interval: "Because the value [e.g., 0] is [in / not in] our confidence interval, we [do not have / have] convincing evidence of a true difference between..."
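
The decision rule behind the CONCLUDE templates can be sketched as follows. The function names and the exact wording of the returned sentences are hypothetical; on the exam, the conclusion must still be written out in the context of the problem.

```python
def conclude_test(p_value: float, alpha: float, alternative_in_context: str) -> str:
    """Fill in the significance-test conclusion template."""
    if p_value < alpha:
        comparison, decision, evidence = "less than", "reject", "have"
    else:
        comparison, decision, evidence = "not less than", "fail to reject", "do not have"
    return (
        f"Because our p-value of {p_value} is {comparison} our significance "
        f"level of {alpha}, we {decision} the null hypothesis. We {evidence} "
        f"convincing statistical evidence that {alternative_in_context}."
    )


def value_is_plausible(low: float, high: float, value: float = 0.0) -> bool:
    """True when the hypothesized value lies inside the confidence interval."""
    return low <= value <= high


print(conclude_test(0.018, 0.05, "the true mean for group 1 exceeds that of group 2"))
# An interval of (-0.08, 0.24) contains 0, so "no difference" remains plausible
print(value_is_plausible(-0.08, 0.24))
```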

Practice Problems

Problem 1:

A high school administrator wants to determine if there is a significant difference in the average number of hours students spend on homework per week between junior and senior students. The administrator selects a random sample of 40 juniors and a random sample of 45 seniors. For the juniors, the sample mean was 12.5 hours with a standard deviation of 3.2 hours. For the seniors, the sample mean was 11.3 hours with a standard deviation of 2.8 hours. Is there convincing evidence at the α = 0.05 level of a difference in the mean homework hours between juniors and seniors at this school?

Solution:

STATE:

Let μ_J = the true mean number of hours spent on homework per week by all juniors at the school.

Let μ_S = the true mean number of hours spent on homework per week by all seniors at the school.

We will test the following hypotheses at an α = 0.05 significance level:

H₀: μ_J - μ_S = 0 (There is no difference in mean homework hours.)

Hₐ: μ_J - μ_S ≠ 0 (There is a difference in mean homework hours.)

PLAN:

The appropriate inference procedure is a two-sample t-test for a difference in means.

Conditions:

Random: The data come from independent random samples of 40 juniors and 45 seniors, as stated in the problem.
10% Condition: It is reasonable to assume there are more than 400 juniors (10 * 40) and 450 seniors (10 * 45) at the high school, so the samples are less than 10% of their respective populations.
Normal/Large Sample: The sample sizes are n_J = 40 and n_S = 45. Both are ≥ 30, so the Central Limit Theorem applies and the sampling distribution of the difference in sample means is approximately Normal.

DO:

Sample statistics: x̄_J = 12.5, s_J = 3.2, n_J = 40; x̄_S = 11.3, s_S = 2.8, n_S = 45.

Test Statistic:

t = ( (x̄_J - x̄_S) - (μ_J - μ_S) ) / sqrt(s_J^2/n_J + s_S^2/n_S)

t = ( (12.5 - 11.3) - 0 ) / sqrt(3.2^2/40 + 2.8^2/45)

Using a calculator (2-SampTTest):

t ≈ 1.830

df ≈ 78.1 (using calculator's formula)

p-value ≈ 0.071

CONCLUDE:

Because our p-value of 0.071 is greater than our significance level α = 0.05, we fail to reject the null hypothesis. We do not have convincing statistical evidence to conclude that there is a difference in the true mean number of hours spent on homework per week between juniors and seniors at this school.
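
The DO step for this problem can be double-checked without a calculator. The sketch below uses only Python's standard library: it computes the unpooled two-sample t statistic, the Welch-Satterthwaite degrees of freedom (the formula the calculator uses when Pooled is set to No), and a two-sided p-value obtained by numerically integrating the t density. The function names are assumptions for this illustration.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and Welch-Satterthwaite df."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_p(t, df, steps=2000):
    """P(|T| > |t|), using Simpson's rule for the area between 0 and |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    return 1 - 2 * central_half


# Juniors: mean 12.5, sd 3.2, n 40; seniors: mean 11.3, sd 2.8, n 45
t_stat, df = welch_t(12.5, 3.2, 40, 11.3, 2.8, 45)
p_value = two_sided_p(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df:.1f}, p-value = {p_value:.3f}")
```

Because the alternative hypothesis is two-sided, the p-value counts both tails; a one-sided alternative would use half of this value.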

Problem 2:

A city council is considering a new recycling initiative. A polling agency wants to estimate the proportion of residents who support the initiative. They take a simple random sample of 200 city residents and find that 128 of them support the initiative. Construct and interpret a 95% confidence interval for the true proportion of all city residents who support the initiative.

Solution:

STATE:

We want to estimate the true proportion of all city residents who support the new recycling initiative with a 95% confidence interval.

The parameter of interest is p = the true proportion of all residents in the city who support the initiative.

PLAN:

The appropriate inference procedure is a one-sample z-interval for a proportion.

Conditions:

Random: The data come from a simple random sample of 200 city residents, as stated.
10% Condition: It is reasonable to assume the city has more than 2,000 residents (10 * 200), so the sample of n = 200 is less than 10% of the population.
Large Counts: The number of successes (128) and failures (200 - 128 = 72) are both at least 10.

DO:

Sample proportion: p̂ = 128 / 200 = 0.64

For 95% confidence, the critical value is z* = 1.96.

Confidence Interval Formula: p̂ ± z* * sqrt(p̂(1-p̂)/n)

0.64 ± 1.96 * sqrt(0.64(0.36)/200)

0.64 ± 1.96 * (0.0339)

0.64 ± 0.0665

Interval: (0.5735, 0.7065)
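
The interval arithmetic above can be reproduced with a short standard-library sketch. The function name `one_prop_z_interval` is an assumption for this illustration; it uses the exact critical value from the Normal distribution rather than the rounded 1.96, which does not change the result at four decimal places.

```python
import math
from statistics import NormalDist


def one_prop_z_interval(successes, n, confidence=0.95):
    """One-sample z-interval for a proportion."""
    p_hat = successes / n
    # Critical value z* leaves (1 - confidence) / 2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 128 of 200 sampled residents support the initiative
low, high = one_prop_z_interval(128, 200)
print(f"({low:.4f}, {high:.4f})")  # (0.5735, 0.7065)
```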