PrepGo

Linear Regression Models - AP Statistics Study Guide

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Learn with study guides reviewed by top AP teachers. This guide takes about 14 minutes to read.

Quick Summary

This guide covers the fundamentals of a linear regression model, which describes the relationship between two quantitative variables. You will learn how to interpret the slope and y-intercept of a least-squares regression line in the context of a problem, use the model to make predictions about a response variable, and understand the critical limitations of these predictions, particularly the danger of extrapolation.

Key Concepts

A linear regression model is a powerful tool for describing the relationship between a quantitative explanatory variable (x) and a quantitative response variable (y). The goal is to create a mathematical model that can predict the value of y based on the value of x.

The Least-Squares Regression Line (LSRL)

The most common method for creating this model is the least-squares regression line (LSRL). This is the unique line that minimizes the sum of the squared vertical distances from each data point to the line.

  • Formula: The equation of the LSRL is always written in the form:

    ŷ = a + bx

    • ŷ (read "y-hat") is the predicted value of the response variable for a given value of x. It's crucial to distinguish this from y, which is the actual, observed value.

    • a is the y-intercept, the predicted value of y when x = 0.

    • b is the slope, which represents the predicted change in y for each one-unit increase in x.

    • x is the value of the explanatory variable.
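To make the roles of a and b concrete, here is a minimal sketch that computes them with the standard least-squares formulas: b is the sum of (x - x̄)(y - ȳ) divided by the sum of (x - x̄)², and a = ȳ - b·x̄. The function name `fit_lsrl` and the hours/scores data are invented for illustration and do not come from this guide.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # y-intercept
    return a, b

# Hypothetical data: hours studied (x) and exam score (y)
hours = [1, 2, 3, 4, 5]
scores = [70, 74, 81, 85, 90]

a, b = fit_lsrl(hours, scores)
print(f"y-hat = {a:.1f} + {b:.1f}x")  # y-hat = 64.7 + 5.1x
```

On the AP exam you would normally read a and b from calculator or computer output rather than compute them by hand.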

[Image: A scatterplot with numerous data points. A straight line (the LSRL) is drawn through the data, positioned to be as close as possible to all points collectively.]

Interpreting the Slope and Y-Intercept

Interpretation is a critical skill for the AP exam. You must always interpret these values in context.

  • Interpreting the Slope (b):

    • The slope describes the predicted rate of change.

    • Template: "For each 1 [unit] increase in [name of x-variable], our model predicts an average [increase/decrease] of [b] [units of y-variable]."

    • Example: If a model relating weight (x, in pounds) to height (y, in inches) has a slope of 0.2, the interpretation is: "For each 1 pound increase in weight, our model predicts an average increase of 0.2 inches in height."

  • Interpreting the Y-Intercept (a):

    • The y-intercept is the starting point of the model, the predicted value when the explanatory variable is zero.

    • Template: "When the [name of x-variable] is 0 [units], our model predicts the [name of y-variable] to be [a] [units of y-variable]."

    • Example: If a model relating hours of study (x) to exam score (y) has a y-intercept of 65, the interpretation is: "When a student studies for 0 hours, our model predicts their exam score to be 65."

    • Warning: The y-intercept is often not meaningful. If x=0 is not a plausible value or falls far outside the range of x-values in the data, the y-intercept serves only to adjust the height of the line and has no practical interpretation. Always ask yourself: "Does x=0 make sense in this context?"

Making Predictions

The primary use of the LSRL is to make predictions. To predict the value of y for a given x, simply substitute the x-value into the regression equation.

  • Example: Using the equation ŷ = 65 + 5.2x to predict the score for a student who studies 3.5 hours:

    • ŷ = 65 + 5.2(3.5)

    • ŷ = 65 + 18.2

    • ŷ = 83.2

    • The predicted score is 83.2.
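The same substitution can be sketched in a few lines of Python. The helper name `predict` is hypothetical; the intercept 65 and slope 5.2 are the values used in the example above.

```python
def predict(a, b, x):
    """Predicted response y-hat = a + b*x."""
    return a + b * x

# Student who studies 3.5 hours, using y-hat = 65 + 5.2x
y_hat = predict(65, 5.2, 3.5)
print(f"Predicted score: {y_hat:.1f}")  # Predicted score: 83.2
```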

Extrapolation

Extrapolation is the act of using the LSRL to make predictions for x-values that are far outside the range of the original data.

  • Why it's dangerous: A linear trend observed in the data may not continue indefinitely. For example, a model showing a child's height increasing over time is not useful for predicting their height at age 40. The linear relationship only holds for a certain range.

  • Predictions made via extrapolation are unreliable and should not be trusted. On the AP exam, you should always point out when a prediction involves extrapolation.
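One way to build the habit of checking for extrapolation is to compare x against the range of the original data before trusting a prediction. The sketch below is only an illustration: the function name, the model ŷ = 2.0 + 0.5x, and the data range of 10 to 50 are all assumptions. It flags any x outside the observed range, while deciding how far is "too far" remains a judgment call.

```python
def predict_with_check(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x."""
    y_hat = a + b * x
    # Flag x-values outside the range of the data used to fit the line
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Hypothetical model fit on x-values between 10 and 50
for x in (30, 100):
    y_hat, flagged = predict_with_check(2.0, 0.5, x, 10, 50)
    note = "extrapolation, treat as unreliable" if flagged else "within data range"
    print(f"x = {x}: y-hat = {y_hat:.1f} ({note})")
```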

[Image: A scatterplot showing data points for x-values between 10 and 50. The LSRL is drawn through them but extends far to the right, to an x-value of 100. The region beyond x=50 is shaded and labeled 'Extrapolation Zone: Unreliable Predictions'.]

Residuals

A residual is the error of a prediction. It is the difference between the actual, observed y-value and the predicted y-value (ŷ) for a given x.

  • Formula: Residual = actual y - predicted y, or e = y - ŷ

  • Interpretation:

    • Positive residual (y > ŷ): The model underpredicted the actual value. The data point is above the LSRL.

    • Negative residual (y < ŷ): The model overpredicted the actual value. The data point is below the LSRL.

    • Residual of 0 (y = ŷ): The model's prediction was perfect. The data point is on the LSRL.

  • The LSRL is the line that makes the sum of the squared residuals as small as possible.
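The least-squares property can be checked numerically. In this sketch, the data set is hypothetical and its LSRL works out to ŷ = 64.7 + 5.1x; the comparison line ŷ = 65 + 5x is an arbitrary nearby alternative chosen for illustration.

```python
def sum_squared_residuals(a, b, xs, ys):
    """Sum of (y - y-hat)^2 for the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data; its LSRL is y-hat = 64.7 + 5.1x
hours = [1, 2, 3, 4, 5]
scores = [70, 74, 81, 85, 90]

sse_lsrl = sum_squared_residuals(64.7, 5.1, hours, scores)
sse_other = sum_squared_residuals(65.0, 5.0, hours, scores)
print(f"LSRL: {sse_lsrl:.2f}, other line: {sse_other:.2f}")  # LSRL: 1.90, other line: 2.00
```

The alternative line looks like a reasonable fit, but it still produces a larger sum of squared residuals than the LSRL.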

[Image: A scatterplot with the LSRL. A single data point (x, y) is highlighted. A vertical line segment connects this point to the LSRL at (x, ŷ). The segment is labeled 'Residual = y - ŷ'.]

Key Vocabulary

  • Least-Squares Regression Line (LSRL): The line that best fits a set of bivariate data by minimizing the sum of the squared residuals.

  • Slope (b): In a regression context, the predicted amount by which the response variable (y) changes for every one-unit increase in the explanatory variable (x).

  • Y-intercept (a): The predicted value of the response variable (y) when the explanatory variable (x) is equal to zero.

  • Predicted Value (ŷ): The value of the response variable that would be predicted for a given x-value based on the LSRL.

  • Residual: The difference between an observed y-value and its corresponding predicted y-value (y - ŷ). It measures the error of the model's prediction for a single point.

  • Extrapolation: Using a regression model to make a prediction for an x-value that lies far outside the range of the x-values in the original data set.

Calculator Tech (TI-84)

To find the equation of the least-squares regression line, follow these steps.

Step 1: Enter Your Data

  1. Press STAT -> 1:Edit....

  2. Enter your explanatory (x) values into list L1.

  3. Enter your corresponding response (y) values into list L2. Ensure the lists are of equal length.

Step 2: Turn On Diagnostic Statistics (One-time setup)

This is crucial for seeing the correlation coefficient (r) and coefficient of determination (r^2), which are covered in later topics but are output by this same function.

  1. Press 2nd -> [CATALOG].

  2. Scroll down to DiagnosticOn.

  3. Press ENTER, then ENTER again. It should say "Done."

Step 3: Calculate the Linear Regression Model

  1. Press STAT, then arrow right to the CALC menu.

  2. Select 8:LinReg(a+bx). (Note: Option 4, LinReg(ax+b), gives the same line with the roles of a and b swapped, but AP Statistics prefers the a+bx form.)

  3. On the screen that appears, ensure your inputs are:

    • Xlist: L1 (or whichever list has your x-data)

    • Ylist: L2 (or whichever list has your y-data)

    • FreqList: Leave this blank.

    • Store RegEQ: This is optional but highly recommended. To store the equation in Y1 for graphing and easy predictions, press VARS -> Y-VARS -> 1:Function... -> 1:Y1.

  4. Select Calculate and press ENTER.

Step 4: Read the Output

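The calculator will display the equation form y=a+bx, followed by the values of a (the y-intercept) and b (the slope). If diagnostics are turned on, it also shows r² and r.

You can now write the full LSRL equation using the values for a and b.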

How to Show Work on the FRQ

For questions involving linear regression models, clarity, context, and precise language are key to earning full credit. Use these templates.

Template for Interpreting the Slope (b)

  1. Identify Variables: State the explanatory (x) and response (y) variables with their units.

  2. Use the Template: "For each additional [1 unit of x-variable], the predicted [y-variable] is expected to [increase/decrease] by approximately [value of b, with units]."

  3. Example: For an LSRL with a slope of -2500, where price is in dollars and age is in years:

    • "For each additional year of age for the car, the predicted price is expected to decrease by approximately $2,500."

Template for Interpreting the Y-Intercept (a)

  1. Identify Variables: State the explanatory (x) and response (y) variables with their units.

  2. Use the Template: "When the [x-variable] is 0 [units], the predicted [y-variable] is [value of a, with units]."

  3. Check for Meaning: Add a sentence evaluating if this interpretation is practically meaningful.

    • Example (Meaningful): For an LSRL predicted score = 68 + 4(hours studied): "When a student studies for 0 hours, the predicted exam score is 68." This is a meaningful interpretation.

    • Example (Not Meaningful): For an LSRL with a y-intercept of -5.2, where height is in feet and weight is in pounds: "When a person's height is 0 feet, their predicted weight is -5.2 pounds." This interpretation is nonsensical and should be identified as such.

Template for Calculating and Interpreting a Residual

  1. State the Model: Write down the LSRL equation: ŷ = a + bx.

  2. Calculate Predicted Value (ŷ): Plug the given x-value into the equation and calculate ŷ. Show your work.

  3. Calculate Residual (e): Use the formula e = y - ŷ. Show your work.

  4. Interpret the Residual: "The actual [y-variable] for a(n) [x-variable] of [x-value] was [absolute value of residual, with units] [higher/lower] than the value predicted by the linear model." Use "higher" for a positive residual and "lower" for a negative one.

Practice Problems

Problem 1:

A real estate agent studies the relationship between the size of a house (in square feet) and its selling price (in thousands of dollars). From a sample of 15 recent sales, the agent calculates the least-squares regression line to be:

    predicted price = 45.5 + 0.12(size)

(a) Interpret the slope of the regression line in the context of the problem.

(b) Interpret the y-intercept of the regression line. Is this interpretation meaningful?

(c) Predict the selling price for a house that is 2,200 square feet.

Solution:

(a) Slope Interpretation:

  • Variables: The explanatory variable (x) is the size in square feet. The response variable (y) is the selling price in thousands of dollars.

  • Interpretation: For each additional square foot of size, the predicted selling price of a house is expected to increase by approximately 0.12 thousand dollars, or $120.

(b) Y-Intercept Interpretation:

  • Interpretation: When the size of a house is 0 square feet, the predicted selling price is 45.5 thousand dollars, or $45,500.

  • Meaningfulness: This interpretation is not meaningful. A house cannot have a size of 0 square feet, so this value of x is a significant extrapolation and has no practical meaning. It only serves to position the regression line correctly within the range of the data.

(c) Prediction:

  • Model: predicted price = 45.5 + 0.12(size)

  • Calculation:

    • ŷ = 45.5 + 0.12(2200)

    • ŷ = 45.5 + 264

    • ŷ = 309.5

  • Conclusion: The predicted selling price for a 2,200 square foot house is $309,500.

Problem 2:

A coffee shop owner wants to predict daily ice cream sales (in dollars) based on the high temperature for the day (in degrees Fahrenheit). The LSRL is found to be predicted sales = -150 + 7.5(temperature). On a day when the high temperature was 80°F, the actual ice cream sales were $465.

(a) Calculate the predicted sales for a day with a high temperature of 80°F.

(b) Calculate and interpret the residual for the day when the temperature was 80°F.

Solution:

(a) Predicted Sales:

  • Model: predicted sales = -150 + 7.5(temperature)

  • Calculation:

    • ŷ = -150 + 7.5(80)

    • ŷ = -150 + 600

    • ŷ = 450

  • Conclusion: The predicted sales for a day with a high of 80°F is $450.

(b) Residual Calculation and Interpretation:

  • Predicted Value (ŷ): From part (a), ŷ = $450.

  • Actual Value (y): Given in the problem, y = $465.

  • Calculate Residual:

    • Residual = actual y - predicted y = y - ŷ

    • Residual = 465 - 450 = 15

  • Interpretation: The actual ice cream sales on the 80°F day were $15 higher than the amount predicted by the linear model. This means the model underpredicted the sales for that day.
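The arithmetic in this solution can be double-checked with a short script. The function and variable names are hypothetical; the model, temperature, and actual sales are the values given in Problem 2.

```python
def predicted_sales(temperature):
    """LSRL from Problem 2: predicted sales = -150 + 7.5(temperature)."""
    return -150 + 7.5 * temperature

actual = 465                 # observed sales on the 80°F day, in dollars
y_hat = predicted_sales(80)  # predicted sales, in dollars
residual = actual - y_hat    # residual = y - y-hat

# The sign of the residual tells us the direction of the model's error
if residual > 0:
    direction = "model underpredicted"
elif residual < 0:
    direction = "model overpredicted"
else:
    direction = "prediction was exact"

print(f"predicted = ${y_hat:.0f}, residual = ${residual:.0f} ({direction})")
# predicted = $450, residual = $15 (model underpredicted)
```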

Common Mistakes to Avoid

  • Using "Will" Instead of "Predicted": Never use deterministic language. The model provides a prediction or an estimate, not a certainty. Always say "the predicted y is..." or "the model predicts..." instead of "y will be...".

  • Interpreting the Y-Intercept When It's Meaningless: Always pause to consider if x=0 is a plausible value within the scope of your data. If it's not (e.g., a car with a weight of 0 pounds), state that the y-intercept has no practical interpretation.

  • Forgetting Context and Units: Every interpretation and prediction must be in the context of the problem. This means using the names of the variables (e.g., "price in dollars," "size in square feet") and their corresponding units.

  • Extrapolating Without Caution: If you are asked to make a prediction for an x-value far outside the range of the original data, you must make the prediction but also state that it is an extrapolation and is therefore unreliable.

  • Confusing Slope and Correlation: The slope (b) has units (y-units per x-unit) and describes the steepness of the line. The correlation (r) has no units and describes the strength and direction of the linear association. They are related, but not interchangeable.
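For reference, the standard formula linking the two is shown below. It is not stated elsewhere in this guide, and s_x and s_y denote the sample standard deviations of x and y.

```latex
b = r \cdot \frac{s_y}{s_x}
```

Because s_x and s_y are never negative, b and r always share the same sign. Changing the units of x or y changes b but leaves r unchanged, which is one reason a steep slope does not by itself indicate a strong association.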