Quick Summary
This guide will equip you to master the least-squares regression line (LSRL), the fundamental tool for modeling linear relationships between two quantitative variables. You will learn to calculate the equation of the LSRL using summary statistics, interpret its slope and y-intercept in the context of a problem, and use the line to make predictions. By the end of this lesson, you will be able to construct and interpret a complete linear regression model, a critical skill for the AP exam.
Key Concepts
The Least-Squares Regression Line (LSRL) is the specific line that best models a linear association between an explanatory variable (x) and a response variable (y). It is the line that minimizes the sum of the squared residuals (the vertical distances from each data point to the line).
[Image: A scatterplot with a regression line. Several vertical lines are drawn from the points to the line, representing residuals. Some of these residual lines are labeled 'e' and are squared to show the area 'e^2'.]
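To make "minimizes the sum of the squared residuals" concrete, here is a minimal sketch in Python. The five data points are invented for illustration, and the helper name sse is hypothetical. It fits the line with the standard closed-form least-squares formulas and then checks that nudging the intercept or slope in either direction makes the sum of squared residuals larger.

```python
# Hypothetical data, invented for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]


def sse(a, b):
    """Sum of squared residuals for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Closed-form least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

best = sse(a, b)

# Nudge each coefficient slightly away from the least-squares values.
nudged = [sse(a + 0.1, b), sse(a - 0.1, b), sse(a, b + 0.1), sse(a, b - 0.1)]

print(f"a = {a:.3f}, b = {b:.3f}, sum of squared residuals = {best:.3f}")
print("every nudged line is worse:", all(value > best for value in nudged))
```

Any other line through the same points would show the same pattern: its sum of squared residuals is larger than that of the LSRL.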
The Equation of the LSRL:
The equation is always written in the form:
ŷ = a + bx
ŷ (read "y-hat") is the predicted value of the response variable for a given value of x. The "hat" is crucial—it distinguishes a prediction from an actual observed value (y).
a is the y-intercept, the predicted value of y when x = 0.
b is the slope, the predicted change in y for each one-unit increase in x.
x is the value of the explanatory variable.
Calculating the Slope (b) and Y-Intercept (a) from Summary Statistics:
You are often given summary statistics instead of raw data. Use these specific formulas to find the equation of the LSRL.
Formulas:
Slope: b = r · (S_y / S_x)
Y-Intercept: a = ȳ − b·x̄
Where:
r = correlation coefficient between x and y
S_y = standard deviation of the response variable, y
S_x = standard deviation of the explanatory variable, x
ȳ = mean of the response variable, y
x̄ = mean of the explanatory variable, x
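The two formulas can be sketched as a small Python function. The function name lsrl_from_summary and the summary statistics passed to it are made up for illustration; they do not come from any problem in this guide.

```python
def lsrl_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (a, b) for the line y-hat = a + b*x from summary statistics."""
    b = r * s_y / s_x        # slope: b = r * (S_y / S_x)
    a = y_bar - b * x_bar    # y-intercept: a = y-bar - b * x-bar
    return a, b


# Made-up summary statistics, for illustration only.
a, b = lsrl_from_summary(r=0.6, s_x=2.0, s_y=5.0, x_bar=10.0, y_bar=50.0)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

Note the order of the work: the slope has to be computed first, because the intercept formula uses it.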
Interpreting the Slope (b):
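The slope is usually the most informative part of the model. It describes the direction and rate of change of the linear relationship in practical terms. (The strength of the relationship is measured by the correlation r, not by the slope.)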
Interpretation Template: "For each one [unit] increase in [x-variable in context], we predict the [y-variable in context] to [increase/decrease] by [b units]."
The sign of the slope (+ or −) must match the sign of the correlation (r).
Interpreting the Y-Intercept (a):
The y-intercept is the predicted starting point of the model. Its practical interpretation depends on whether x=0 is a meaningful value in the context of the data.
Interpretation Template: "When the [x-variable in context] is 0 [units], we predict the [y-variable in context] to be [a units]."
Warning on Extrapolation: If the value x=0 is far outside the range of x-values used to create the model, interpreting the y-intercept is an act of extrapolation and may not be meaningful. Always consider if x=0 makes sense for the problem.
Key Property of the LSRL:
The least-squares regression line always passes through the point of averages (x̄, ȳ). This means if you plug the mean of x into the equation, the predicted value ŷ will be the mean of y. This is a useful fact for checking your calculations and for certain multiple-choice questions.
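This property follows directly from the intercept formula: a + b·x̄ = (ȳ − b·x̄) + b·x̄ = ȳ. The short Python check below illustrates it with made-up summary statistics. Treat it as a sketch of how you might verify your own calculations, not as a required method.

```python
# Made-up summary statistics, for illustration only.
r, s_x, s_y = -0.4, 3.0, 6.0
x_bar, y_bar = 12.0, 80.0

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # y-intercept

# Plug the mean of x into the equation y-hat = a + b*x.
y_hat_at_mean = a + b * x_bar

print(f"y-hat at x-bar = {y_hat_at_mean:.2f}, y-bar = {y_bar:.2f}")
print("passes through the point of averages:", abs(y_hat_at_mean - y_bar) < 1e-9)
```

If your own a and b do not reproduce ȳ when you plug in x̄ (up to rounding), one of them was computed incorrectly.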
Key Vocabulary
Least-Squares Regression Line (LSRL): The line that minimizes the sum of the squared vertical distances (residuals) between the observed data points and the line itself. Also known as the "line of best fit."
Residual: The error in a prediction, calculated as the difference between the observed y-value and the predicted y-value: residual = observed y − predicted y, or residual = y − ŷ.
Slope (b): The predicted amount by which the response variable (y) changes for every one-unit increase in the explanatory variable (x).
Y-Intercept (a): The predicted value of the response variable (y) when the explanatory variable (x) is equal to zero.
Explanatory Variable (x): The independent variable that is used to model or predict changes in the response variable. It is plotted on the horizontal axis.
Response Variable (y): The dependent variable whose outcome we are trying to predict. It is plotted on the vertical axis.
Extrapolation: Using a regression line to make predictions for x-values that are far outside the range of the original data. These predictions are often unreliable.
Calculator Tech (TI-84)
To calculate the LSRL from raw data in two lists (e.g., L1 and L2):
Step 0: First-Time Setup (Turn Diagnostics On)
This step ensures your calculator will show the correlation coefficient (r). You only need to do this once.
Press 2nd -> 0 [CATALOG] -> scroll down to DiagnosticOn -> ENTER -> ENTER. The screen should say "Done."
Step 1: Enter Data
Press STAT -> 1:Edit...
Enter your explanatory (x) values into list L1.
Enter your corresponding response (y) values into list L2.
Step 2: Calculate the LSRL
Press STAT -> CALC -> 8:LinReg(a+bx). (Note: Option 4:LinReg(ax+b) produces the same line, but the letters for slope and intercept are swapped. AP Statistics exclusively uses a for the intercept and b for the slope, so option 8 is preferred.)
The screen will show LinReg(a+bx).
Xlist: L1
Ylist: L2
FreqList: Leave blank.
Store RegEQ: (Optional but highly recommended) Press VARS -> Y-VARS -> 1:Function... -> 1:Y1. This stores the equation in your calculator's graphing memory.
Calculate: Highlight Calculate and press ENTER.
Step 3: Read the Output
The calculator will display:
y=a+bx
a= (your y-intercept)
b= (your slope)
r²= (the coefficient of determination)
r= (the correlation coefficient)
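If you want to see where those four numbers come from, the sketch below reproduces them in Python from two short lists. The lists l1 and l2 are hypothetical stand-ins for L1 and L2. The code is an illustration of the underlying formulas, not a description of how the calculator is implemented internally.

```python
from math import sqrt

# Hypothetical lists standing in for L1 (x) and L2 (y).
l1 = [1, 2, 3, 4, 5]
l2 = [2, 4, 5, 4, 6]

n = len(l1)
x_bar = sum(l1) / n
y_bar = sum(l2) / n

# Sample standard deviations (divide by n - 1).
s_x = sqrt(sum((x - x_bar) ** 2 for x in l1) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in l2) / (n - 1))

# Correlation: sum of the products of z-scores, divided by n - 1.
r = sum(
    ((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(l1, l2)
) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # y-intercept

print("y=a+bx")
print(f"a={a:.4f}")
print(f"b={b:.4f}")
print(f"r²={r ** 2:.4f}")
print(f"r={r:.4f}")
```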
How to Show Work on the FRQ
For questions involving LSRL, graders look for clear communication in context. Use these templates to ensure you earn full credit.
1. Template for Interpreting the Slope (b):
Structure: "For each additional [1 unit] of [Explanatory Variable in Context], the predicted [Response Variable in Context] [increases/decreases] by approximately [value of b, with units]."
Example: For a slope of b = 4.5 relating hours studied (x) and test score (y): "For each additional hour a student studies, their predicted test score increases by approximately 4.5 points."
Key Words: You MUST use "predicted" or "on average." Saying the score will increase is too definitive and will lose credit.
2. Template for Interpreting the Y-Intercept (a):
Structure: "When the [Explanatory Variable in Context] is 0 [units], the predicted [Response Variable in Context] is [value of a, with units]."
Example: For an intercept of a = 65.2: "For a student who studies for 0 hours, the predicted test score is 65.2 points."
Add Contextual Check: After the interpretation, add a sentence about whether this value is meaningful. "This is a reasonable interpretation, as it is possible for a student to study for 0 hours." OR "This is an example of extrapolation, as there were no houses with 0 square feet in our data, and a house cannot have 0 square feet."
3. Template for Making a Prediction:
Step 1 (Equation): Write the LSRL equation using contextual variable names.
Step 2 (Substitution): Plug the given x-value into the equation. Show the substitution.
Step 3 (Answer): State the final answer with units.
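The three steps can be sketched in Python using the slope (4.5) and intercept (65.2) from the study-time examples above. The input of 3 hours and the function name predict_score are hypothetical, chosen only to show the pattern of equation, substitution, and answer with units.

```python
# Coefficients taken from the study-time examples above.
a = 65.2   # y-intercept: predicted test score at 0 hours studied
b = 4.5    # slope: predicted points per additional hour studied


def predict_score(hours):
    """Step 1 (Equation): predicted score = 65.2 + 4.5 * (hours studied)."""
    return a + b * hours


hours = 3  # hypothetical x-value to plug in

# Step 2 (Substitution) and Step 3 (Answer), shown together with units.
print(f"predicted score = {a} + {b}({hours}) = {predict_score(hours):.1f} points")
```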
Practice Problems
Problem 1:
A guidance counselor is investigating the relationship between the number of AP courses a student takes and their final GPA. For a sample of students, the counselor finds the following summary statistics:
| Variable | Mean | Standard Deviation |
|---|---|---|
| Number of AP Courses (x) | x̄ = 3.5 | S_x = 1.2 |
| GPA (y) | ȳ = 3.80 | S_y = 0.25 |
The correlation between the number of AP courses and GPA is r = 0.85.
(a) Calculate the equation of the least-squares regression line for predicting GPA from the number of AP courses taken.
(b) Interpret the slope of the regression line in context.
(c) Interpret the y-intercept of the regression line in context. Is this interpretation meaningful?
Solution:
(a) Calculate the equation.
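First, find the slope b: b = r · (S_y / S_x) = 0.85 · (0.25 / 1.2) ≈ 0.177
Next, find the y-intercept a: a = ȳ − b·x̄ = 3.80 − 0.177(3.5) ≈ 3.18
The equation of the LSRL is: predicted GPA = 3.18 + 0.177(number of AP courses), or ŷ = 3.18 + 0.177x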
(b) Interpret the slope.
Using the template: For each additional AP course a student takes, their predicted GPA increases by approximately 0.177 points.
(c) Interpret the y-intercept.
Using the template: For a student who takes 0 AP courses, their predicted GPA is 3.18.
This interpretation is meaningful. It is plausible for a student to take no AP courses, and a predicted GPA of 3.18 is a reasonable value.
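As a quick arithmetic check on part (a), the sketch below recomputes the slope and intercept in Python from the summary statistics in the table. It takes r = 0.85, which is the correlation consistent with the slope of 0.177 and intercept of 3.18 reported in the solution. Treat that value as an assumption if your copy of the problem states something different.

```python
r = 0.85                 # assumed: consistent with the solution's slope of 0.177
x_bar, s_x = 3.5, 1.2    # number of AP courses (x)
y_bar, s_y = 3.80, 0.25  # GPA (y)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # y-intercept

print(f"b = {b:.3f}")
print(f"a = {a:.2f}")
print(f"predicted GPA = {a:.2f} + {b:.3f}(number of AP courses)")
```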
Problem 2:
The least-squares regression line for predicting the fuel efficiency (in miles per gallon, MPG) of a car from its weight (in thousands of pounds) is given by:
(a) A certain car model weighs 3,500 pounds. What is its predicted fuel efficiency?
(b) If the car model from part (a) has an actual fuel efficiency of 22 MPG, calculate and interpret the residual.
Solution:
(a) Make a prediction.
First, convert the weight to thousands of pounds: 3,500 pounds = 3.5 thousand pounds.
Equation:
Substitution:
Answer: The predicted fuel efficiency is 17.9 MPG.
(b) Calculate and interpret the residual.
The formula for a residual is: residual = y − ŷ
y (observed value) = 22 MPG
ŷ (predicted value from part a) = 17.9 MPG
Calculation: residual = 22 − 17.9 = 4.1 MPG
Interpretation: This car's actual fuel efficiency is 4.1 MPG higher than predicted by the regression model for a car of its weight. The positive residual means the model underestimated the car's MPG.
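The residual calculation and its sign can be sketched in a few lines of Python, using the observed value from part (b) and the prediction from part (a). The variable names are hypothetical.

```python
observed = 22.0    # actual fuel efficiency from part (b), in MPG
predicted = 17.9   # predicted fuel efficiency from part (a), in MPG

# residual = observed y - predicted y
residual = observed - predicted

if residual > 0:
    direction = "underestimated"   # actual value is above the line
elif residual < 0:
    direction = "overestimated"    # actual value is below the line
else:
    direction = "exactly matched"

print(f"residual = {residual:.1f} MPG; the model {direction} the actual value")
```

The order of subtraction matters: reversing it flips the sign and with it the interpretation.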
Common Mistakes to Avoid
Forgetting "Predicted" or "On Average": When interpreting the slope or making a prediction, you must use language that indicates the LSRL provides an estimate, not a certainty. Saying "the GPA will increase" is incorrect; say "the predicted GPA will increase."
Mixing up Sy and Sx in the Slope Formula: A very common error is to calculate b = r · (S_x / S_y). Always remember that the "y's go high": S_y is in the numerator. The units of the slope (y-units per x-unit) can be a helpful check.
Mindless Interpretation of the Y-Intercept: Always ask yourself, "Does x=0 make sense in this context?" If you are modeling the height of a child based on age, the y-intercept (age=0) is meaningful. If you are modeling house price based on square footage, the y-intercept (0 sq. ft.) is not. Stating a nonsensical interpretation without acknowledging it is a mistake.
Confusing Correlation (r) and Slope (b): While their signs will always be the same, r is a unitless measure of strength and direction between -1 and 1. The slope has units (y-units per x-unit) and can be any real number. Do not use them interchangeably.
Using the LSRL for Extrapolation: Do not use the model to make predictions for x-values far outside the range of the original data. The linear relationship may not hold for those values, and the prediction will be unreliable.
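One simple way to guard against the extrapolation mistake is to compare a new x-value with the range of the data before predicting. The Python sketch below does that. The helper name is_extrapolation and the data are hypothetical, and the rule it applies (flag anything outside the observed minimum and maximum) is a deliberately strict simplification of "far outside the range."

```python
def is_extrapolation(x_new, x_values):
    """Return True if x_new lies outside the range of the observed x-values."""
    return x_new < min(x_values) or x_new > max(x_values)


# Hypothetical observed x-values (for example, hours studied).
observed_hours = [1, 2, 3, 4, 5]

print(is_extrapolation(3.5, observed_hours))   # inside the observed range
print(is_extrapolation(12, observed_hours))    # well beyond the largest x
```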