Quick Summary
This guide introduces the framework for determining if a linear relationship observed in a sample's scatterplot is statistically significant. You will learn how to formulate the null and alternative hypotheses for the true slope of a population regression line and identify the appropriate inference procedure, a t-test for slope. Most critically, you will master the five essential conditions (LINER) that must be verified before proceeding with inference, using graphical displays like scatterplots and residual plots to justify your reasoning.
Key Concepts
This topic lays the groundwork for formal inference on the slope of a least-squares regression line. We move from describing a linear relationship in a sample to asking if there is convincing evidence that a linear relationship exists in the entire population.
1. The Big Idea: Sample Slope vs. Population Slope
When we fit a least-squares regression line to a sample of bivariate data, we get an equation of the form ŷ = a + bx. The slope of this line is b, the sample slope. It is a statistic calculated from our sample data.
However, we are interested in the true linear relationship for the entire population. We model this with the equation μ_y = α + βx, where μ_y is the mean response at a given value of x. The slope of this line is β (the Greek letter beta), the population slope. It is a parameter we can never know for certain, so we must use our sample statistic b to make inferences about it.
The central question is: Does our sample slope b provide enough evidence to conclude that the population slope β is not zero? If β = 0, there is no true linear relationship between the variables.
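To see why a nonzero sample slope is not, by itself, convincing evidence, the following minimal Python sketch simulates repeated samples from a hypothetical population in which β = 0. The population model (intercept 50, error standard deviation 5, x-values 1 through 10), the seed, and the helper name `sample_slope` are all assumptions made for illustration; none of them come from this guide.

```python
import random

def sample_slope(xs, ys):
    """Least-squares slope b = Sxy / Sxx for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx

random.seed(1)  # arbitrary fixed seed so the run is repeatable

xs = list(range(1, 11))  # hypothetical x-values 1..10
slopes = []
for _ in range(1000):
    # Hypothetical population with true slope beta = 0:
    # y = 50 + 0*x + Normal(0, 5) error
    ys = [50 + 0 * x + random.gauss(0, 5) for x in xs]
    slopes.append(sample_slope(xs, ys))

mean_b = sum(slopes) / len(slopes)
print("mean of sample slopes:", round(mean_b, 3))
print("smallest and largest b:", round(min(slopes), 2), round(max(slopes), 2))
```

Individual sample slopes land on both sides of 0 even though the population slope is exactly 0, which is why a formal test is needed before concluding that β differs from 0.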
2. Hypotheses for a Test of the Slope
The hypotheses for a significance test for the slope of a regression model always compare the population slope β to a hypothesized value of 0.
Null Hypothesis (H₀): The null hypothesis always states that there is no linear relationship between the two variables in the population.
Formula: H₀: β = 0
In Words: "There is no linear relationship between [explanatory variable in context] and [response variable in context] for the population of [population in context]."
Alternative Hypothesis (Hₐ): The alternative hypothesis states that there is a linear relationship. The specific form depends on the question being asked.
Two-Sided: Used when we are looking for any linear relationship (positive or negative).
Formula: Hₐ: β ≠ 0
In Words: "There is a linear relationship between [explanatory variable] and [response variable]."
One-Sided (Positive): Used when we are specifically looking for a positive linear relationship.
Formula: Hₐ: β > 0
In Words: "There is a positive linear relationship between [explanatory variable] and [response variable]."
One-Sided (Negative): Used when we are specifically looking for a negative linear relationship.
Formula: Hₐ: β < 0
In Words: "There is a negative linear relationship between [explanatory variable] and [response variable]."
3. The Appropriate Test
The correct statistical test to determine if the slope of a regression model is significant is a t-test for the slope.
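As a preview of the calculation this test relies on, the standard t statistic for a slope measures how far the sample slope is from the hypothesized value 0, in units of its standard error. This guide does not derive the formula. The symbols SE_b (the standard error of the sample slope) and n (the sample size) are notation introduced here for the sketch.

```latex
t = \frac{b - 0}{SE_b}, \qquad \text{df} = n - 2
```

The State and Plan steps covered in this guide come before this calculation.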
4. Conditions for Inference (LINER)
Before we can perform a t-test for the slope, we must verify five conditions. The acronym LINER is a helpful way to remember them.
(L) Linear: The true relationship between the explanatory variable (x) and the response variable (y) must be linear.
How to Check:
Examine the scatterplot of the original data (y vs. x). It should show a roughly linear form.
Examine the residual plot (residuals vs. x). There should be no leftover curved pattern. The points should be randomly scattered above and below the horizontal line at 0.
[Image: A good residual plot with random scatter vs. a bad residual plot with a clear U-shaped pattern.]
(I) Independent: Individual observations must be independent of each other. When sampling without replacement, we must check the 10% condition.
- How to Check: The sample size, n, must be no more than 10% of the population size, N (i.e., n ≤ 0.10N). You must state this and show the numbers in context (e.g., "It is reasonable to assume there are more than 10 * 25 = 250 cars of this model in the population.").
(N) Normal: For any given value of x, the distribution of y-values (and thus, the distribution of residuals) must be approximately Normal.
How to Check: Examine a graph of the residuals.
A histogram of the residuals should be roughly symmetric and unimodal, with no strong skew or outliers.
A Normal probability plot of the residuals should be roughly linear.
[Image: A histogram of residuals that is roughly symmetric and unimodal.]
(E) Equal Variance (Homoscedasticity): The standard deviation of the residuals must be the same for all values of x.
How to Check: Examine the residual plot. The amount of vertical scatter of the points around the horizontal line at 0 should be roughly the same from left to right.
What to Avoid: A "fanning" or "cone" shape, where the residuals become more (or less) spread out as x increases. This indicates unequal variance (heteroscedasticity).
[Image: A good residual plot with consistent vertical spread vs. a bad residual plot showing a clear fanning-out pattern.]
(R) Random: The data must come from a well-designed random sample or a randomized experiment.
- How to Check: The problem statement must explicitly state that the data were collected using a random sampling method or a randomized experiment.
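Two of the checks above involve simple arithmetic that can be sketched in Python. The block below is only a rough numerical companion to the graphs, not a replacement for them and not an AP-required step. The function names `ten_percent_ok` and `spread_by_half`, and all of the data values, are hypothetical.

```python
import statistics

def ten_percent_ok(n, population_size):
    """10% condition: n is at most 10% of N, written as 10 * n <= N."""
    return 10 * n <= population_size

def spread_by_half(xs, residuals):
    """Standard deviation of the residuals for the lower-x and upper-x halves."""
    pairs = sorted(zip(xs, residuals))
    mid = len(pairs) // 2
    left = [r for _, r in pairs[:mid]]
    right = [r for _, r in pairs[mid:]]
    return statistics.stdev(left), statistics.stdev(right)

# Hypothetical values for illustration only
print(ten_percent_ok(25, 250))  # n = 25 with a population of 250

xs = [1, 2, 3, 4, 5, 6, 7, 8]
residuals = [0.4, -0.5, 0.3, -0.2, 0.5, -0.4, 0.2, -0.3]
left_sd, right_sd = spread_by_half(xs, residuals)
print(round(left_sd, 2), round(right_sd, 2))
```

Similar values for the two halves are consistent with equal variance, while a much larger value on one side suggests fanning. The residual plot remains the evidence to cite in a written response.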
Key Vocabulary
Population Slope (β): The true, unknown slope of the population regression line, which describes the linear relationship between two variables for an entire population.
Sample Slope (b): The slope of the least-squares regression line calculated from sample data. It serves as the point estimate for the population slope, β.
Residual: The difference between an observed y-value and the y-value predicted by the regression line (residual = y - ŷ). Residuals are crucial for checking the conditions for inference.
Residual Plot: A scatterplot of the residuals against the explanatory variable (x) or the predicted values (ŷ). It is the primary tool for checking the Linear and Equal Variance conditions.
Homoscedasticity: The formal term for the "Equal Variance" condition. It means that the variability of the response variable is the same across all values of the explanatory variable.
Calculator Tech (TI-84)
To check the LINER conditions, you often need to create a scatterplot of the original data and a residual plot.
Scenario: You have explanatory variable data in L1 and response variable data in L2.
Step 1: Enter Data
1. Press STAT -> 1:Edit....
2. Enter your x-values into L1 and your y-values into L2.
Step 2: Create a Scatterplot of the Original Data
1. Press 2nd -> Y= [STAT PLOT].
2. Select 1:Plot1... and press ENTER.
3. Turn the plot On.
4. Set Type: to the first option (scatterplot).
5. Set Xlist: L1 and Ylist: L2.
6. Press ZOOM -> 9:ZoomStat to see the graph. This helps check the Linear condition.
Step 3: Calculate the Regression and Store Residuals
1. Press STAT -> CALC -> 8:LinReg(a+bx).
2. Set Xlist: L1, Ylist: L2. Leave FreqList blank.
3. (Optional but good practice) To store the regression equation in Y1, press VARS -> Y-VARS -> 1:Function... -> 1:Y1.
4. Select Calculate and press ENTER.
The calculator automatically computes the residuals from this regression and stores them in a list called RESID.
Step 4: Create a Residual Plot
1. Press 2nd -> Y= [STAT PLOT].
2. Select 1:Plot1... (or another plot).
3. Turn the plot On.
4. Set Type: to scatterplot.
5. Set Xlist: L1.
6. Set Ylist: RESID. To get RESID, press 2nd -> STAT [LIST] and scroll down to 7:RESID.
7. Press ZOOM -> 9:ZoomStat. You will now see the residual plot. This helps check the Linear and Equal Variance conditions.
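For readers without a TI-84, the regression and residual steps above can be approximated in Python. This is a minimal sketch, not the calculator's own routine. The function name `lin_reg` and the data in `L1` and `L2` are hypothetical stand-ins for the calculator lists.

```python
def lin_reg(l1, l2):
    """Least-squares fit y-hat = a + b*x; returns (a, b, resid)."""
    n = len(l1)
    x_bar = sum(l1) / n
    y_bar = sum(l2) / n
    sxx = sum((x - x_bar) ** 2 for x in l1)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(l1, l2))
    b = sxy / sxx              # sample slope
    a = y_bar - b * x_bar      # intercept
    resid = [y - (a + b * x) for x, y in zip(l1, l2)]  # residual = y - y-hat
    return a, b, resid

# Hypothetical data standing in for calculator lists L1 and L2
L1 = [1, 2, 3, 4, 5]
L2 = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, RESID = lin_reg(L1, L2)
print("a =", round(a, 3), "b =", round(b, 3))
for x, r in zip(L1, RESID):
    print(x, round(r, 3))  # the (x, residual) pairs that a residual plot displays
```

Plotting the printed pairs, by hand or with any graphing tool, gives the same residual plot described in Step 4.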
How to Show Work on the FRQ
For a full significance test for a slope, you use the State-Plan-Do-Conclude model. Topic 9.1 focuses on the State and Plan steps.
State (Hypotheses and Parameter)
Parameter: Define β in the context of the problem.
- Template: "Let β be the true slope of the population regression line relating [response variable] to [explanatory variable]."
Hypotheses: State the null and alternative hypotheses using symbols and context.
Template:
H₀: β = 0. There is no linear relationship between [response variable] and [explanatory variable] for the population of [population in context].
Hₐ: β ≠ 0 (or β > 0 or β < 0). There is a [positive/negative/any] linear relationship between [response variable] and [explanatory variable] for the population of [population in context].
Plan (Name the Test and Check Conditions)
Name the Test: Identify the procedure you will use.
- Template: "The appropriate inference procedure is a t-test for the slope of a regression line."
Check Conditions (LINER): Check all five conditions, providing evidence for each.
Template:
Linear: "The scatterplot of [y-variable] vs. [x-variable] appears roughly linear. Furthermore, the residual plot shows a random scatter of points with no leftover pattern, confirming the linear relationship is appropriate."
Independent: "The problem states the data were randomly sampled. Since we are sampling without replacement, we must check the 10% condition. The sample size of n = [sample size] is likely less than 10% of all [population in context]. It is reasonable to assume individual observations are independent."
Normal: "A histogram of the residuals is roughly symmetric and unimodal (or, a Normal probability plot of the residuals is roughly linear), so we can assume the residuals are approximately Normally distributed."
Equal Variance: "The residual plot shows a consistent amount of vertical scatter for all x-values. There is no fanning pattern, so the equal variance condition is met."
Random: "The problem states that the data were collected from a random sample (or randomized experiment)."
Practice Problems
Problem 1:
A real estate agent wants to investigate the relationship between the size of a house (in square feet) and its selling price (in thousands of dollars). She takes a random sample of 30 recently sold houses in a large suburb and records the size and selling price for each. A scatterplot of the data, a residual plot, and a histogram of the residuals are provided below.
[Image: A scatterplot of Price vs. Size, showing a moderately strong, positive, linear association. A residual plot of Residuals vs. Size, showing random scatter with no pattern and consistent vertical spread. A histogram of the residuals, showing a roughly symmetric, unimodal distribution.]
State the hypotheses and check the conditions for performing a significance test for the slope of the regression line.
Solution:
State:
Parameter: Let β be the true slope of the population regression line relating selling price (in thousands of dollars) to the size (in square feet) for houses in this suburb.
Hypotheses: We want to see if there is a linear relationship, so we will use a two-sided test.
H₀: β = 0. There is no linear relationship between the size of a house and its selling price for the population of houses in this suburb.
Hₐ: β ≠ 0. There is a linear relationship between the size of a house and its selling price for the population of houses in this suburb.
Plan:
The appropriate inference procedure is a t-test for the slope of a regression line. We must check the LINER conditions.
Linear: The provided scatterplot shows a moderately strong, positive, linear relationship between size and selling price. The residual plot shows a random scatter of points with no leftover curved pattern. The linear model is appropriate.
Independent: The agent took a random sample of 30 houses. It is reasonable to assume there are more than 10 * 30 = 300 houses in this large suburb, so the 10% condition is met. We assume the selling price of one house is independent of another.
Normal: The provided histogram of the residuals is roughly symmetric and unimodal, indicating that the residuals are approximately Normally distributed.
Equal Variance: The residual plot shows a consistent amount of vertical scatter around the line at 0 for all house sizes. There is no fanning, so the equal variance condition is met.
Random: The problem states that the data were collected from a random sample of 30 recently sold houses.
All conditions for inference have been met.
Problem 2:
A student investigates whether there is a negative linear relationship between the age of a smartphone (in months) and its battery life (in hours). She randomly selects 8 of her friends and records the age of their phone and its maximum battery life. The data are below:
| Age (months) | 1 | 3 | 6 | 9 | 12 | 18 | 24 | 30 |
|---|---|---|---|---|---|---|---|---|
| Battery (hours) | 15.1 | 14.5 | 13.2 | 12.8 | 11.0 | 9.5 | 8.1 | 6.5 |
State the appropriate hypotheses and check the necessary conditions for a significance test.
Solution:
State:
Parameter: Let β be the true slope of the population regression line relating maximum battery life (in hours) to the age of a smartphone (in months).
Hypotheses: The student is investigating a negative linear relationship, so we will use a one-sided test.
H₀: β = 0. There is no linear relationship between the age of a smartphone and its battery life.
Hₐ: β < 0. There is a negative linear relationship between the age of a smartphone and its battery life.
Plan:
The appropriate inference procedure is a t-test for the slope. We must check the LINER conditions. (Note: We would use a calculator to create the required plots from the data.)
Linear: A scatterplot of the data (battery life vs. age) shows a strong, negative, and roughly linear pattern. A residual plot (created on a calculator after running a linear regression) shows no obvious leftover pattern. The linear model is appropriate.
Independent: The student randomly selected 8 friends. We must assume these 8 friends' phones are a representative random sample of a larger population of phones. To meet the 10% condition, we must assume there are more than 10 * 8 = 80 such phones in the population of interest.
Normal: With such a small sample size (n=8), it is difficult to assess Normality from a histogram of residuals. We would create the histogram on a calculator and note that it shows no strong skew or outliers.
Equal Variance: The residual plot shows a similar amount of vertical scatter across all x-values (ages). The equal variance condition appears to be met.
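Random: The problem states the student randomly selected 8 friends. Because the sample was drawn only from her friends, any conclusion applies at most to the population of her friends' smartphones.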
The conditions for inference are met, though we proceed with caution due to the very small sample size.
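The solution above refers to plots that would be made on a calculator. As a rough stand-in, this Python sketch fits the least-squares line to the data in the table and lists each residual with a crude text bar, so you can inspect the sign pattern and the spread yourself. The variable names and the text-plot format are assumptions made for illustration.

```python
age = [1, 3, 6, 9, 12, 18, 24, 30]                       # months (from the table)
battery = [15.1, 14.5, 13.2, 12.8, 11.0, 9.5, 8.1, 6.5]  # hours (from the table)

n = len(age)
x_bar = sum(age) / n
y_bar = sum(battery) / n
sxx = sum((x - x_bar) ** 2 for x in age)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, battery))
b = sxy / sxx              # sample slope
a = y_bar - b * x_bar      # intercept
residuals = [y - (a + b * x) for x, y in zip(age, battery)]

print("sample slope b =", round(b, 4))
for x, r in zip(age, residuals):
    # crude text residual plot: one '*' per 0.1 hour, sign shown by +/-
    bar = "*" * round(abs(r) / 0.1)
    print(f"{x:>3}  {r:+.2f}  {bar}")
```

The sample slope comes out negative, roughly -0.30 hours per month, which matches the direction of the one-sided alternative. Whether that is convincing evidence about β is what the t-test decides.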
Common Mistakes to Avoid
Confusing b and β: Writing hypotheses using the sample slope (e.g., H₀: b = 0). Hypotheses are always about the unknown population parameter, β.
Checking Normality on the Wrong Data: Creating a histogram of the original y-values (e.g., selling prices) to check the Normal condition. You MUST check for Normality of the residuals, not the raw response variable data.
Vague Condition Checks: Simply writing "Linear condition met" or "Normal condition met" is not enough for credit on the AP exam. You must provide justification by referencing a specific graph (e.g., "The scatterplot is linear," "The residual plot has no pattern," "The histogram of residuals is symmetric.").
Misinterpreting the Residual Plot: A residual plot is used to check two conditions. A leftover pattern (like a curve) violates the Linear condition. A fanning shape (changing vertical spread) violates the Equal Variance condition. Know the difference.
Forgetting the 10% Condition: The Independence condition is not just about random sampling. If you are sampling without replacement from a finite population, you must explicitly state and check that your sample size is no more than 10% of the population.