PrepGo

Analyzing Departures from Linearity - AP Statistics Study Guide

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Learn with study guides reviewed by top AP teachers. This guide takes about 17 minutes to read.

Quick Summary

This guide will equip you to analyze how individual data points affect a linear regression model. You will learn to identify and distinguish between outliers, high-leverage points, and influential points. Critically, you will be able to predict and describe the specific changes to the slope, y-intercept, and correlation coefficient that occur when an influential point is removed from a dataset.

Key Concepts

When we fit a least-squares regression line (LSRL) to a set of bivariate data, we assume a linear relationship. However, some individual points can depart from the overall pattern and disproportionately affect our model. Understanding these points is crucial for a complete analysis.

There are three key types of unusual points:

1. Outliers

An outlier in a regression setting is a point that does not follow the overall pattern of the data and has a large residual.

  • Residual: The residual is the vertical distance between an observed data point and the predicted value on the regression line (residual = observed y - predicted y, or $y - \hat{y}$).

  • Identification: An outlier is a point that is far above or below the regression line. It represents a large error in prediction for that specific point.

  • Key Idea: Outliers are defined by their large vertical deviation from the line. A point can have an ordinary x-value but an unusual y-value, making it an outlier.

[Image: A scatterplot with a clear linear trend. The LSRL is drawn. One point is far vertically above the line, with a large dashed line showing its residual. This point is labeled "Outlier".]
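To make the residual definition concrete, here is a minimal Python sketch (an illustration only, not part of the AP calculator workflow). It fits the LSRL with the usual least-squares formulas and lists each point's residual, observed y minus predicted y. The dataset and the helper names `lsrl` and `residuals` are hypothetical.

```python
from statistics import mean


def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


def residuals(xs, ys):
    """Residual = observed y - predicted y, for every point."""
    a, b = lsrl(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Hypothetical data: the point (3, 11) sits far above the overall pattern.
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 11, 8, 10, 12]

res = residuals(xs, ys)
for x, y, e in zip(xs, ys, res):
    print(f"({x}, {y}): residual = {e:+.2f}")

# The point with the largest |residual| is the outlier candidate.
worst = max(range(len(res)), key=lambda i: abs(res[i]))
print("Largest |residual| at", (xs[worst], ys[worst]))
```

Because residuals from a least-squares line always sum to zero, one large residual stands out clearly against the small ones around it.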

2. High-Leverage Points

A high-leverage point is a data point with an x-value that is far from the mean of the x-values ($\bar{x}$).

  • Identification: Look for points that are horizontally distant from the main cluster of data points.

  • Potential for Influence: These points act like a "lever" on the regression line. Because they are at the extreme end of the x-values, they have the potential to pull the line towards them, significantly changing the slope.

  • Important Note: A high-leverage point is not automatically a problem. If it follows the linear pattern of the other points, it can actually strengthen the correlation and confirm the trend.

[Image: A scatterplot with the data clustered on the left. One point is far to the right, horizontally separated from the rest. This point is labeled "High-Leverage Point".]
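Leverage can also be measured numerically. Going slightly beyond what the AP exam asks, regression texts define the leverage of point $i$ in simple linear regression as $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{\sum_j (x_j - \bar{x})^2}$, which grows as $x_i$ moves away from $\bar{x}$. The sketch below computes it; the x-values and the helper name `leverages` are hypothetical.

```python
from statistics import mean


def leverages(xs):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum of (x_j - x_bar)^2."""
    n, x_bar = len(xs), mean(xs)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n + (x - x_bar) ** 2 / sxx for x in xs]


# Hypothetical x-values: a cluster on the left and one point far to the right.
xs = [1, 2, 2, 3, 3, 4, 12]
h = leverages(xs)
for x, h_i in zip(xs, h):
    print(f"x = {x:>2}: leverage = {h_i:.3f}")
```

The leverages always add up to 2 (one for each of the line's two coefficients), so a single value close to 1 means one point carries a large share of the fit.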

3. Influential Points

An influential point is a point that, if removed, would significantly change the calculated regression line. The most important statistical measures that change are:

  • The slope ($b$) of the LSRL.

  • The y-intercept ($a$) of the LSRL.

  • The correlation coefficient ($r$).

  • How to Determine Influence: The best way to determine if a point is influential is to calculate the regression line with and without the point and compare the results. Conceptually, the most influential points are often those that are both high-leverage and outliers.

  • The "Lever" Analogy: Imagine the regression line is a see-saw balanced on the mean of the data ($\bar{x}$, $\bar{y}$). A high-leverage point is like a person sitting far from the center: every point counts equally in least squares, but distance from the pivot gives this one much more turning power. If that person is also an outlier (i.e., they are sitting far above or below where the see-saw plank should be), they will drastically tilt the see-saw (the line).
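The with-and-without comparison can be sketched in Python as follows, assuming the data are stored as two parallel lists. `fit` and `compare_without` are hypothetical helper names and the data are made up. The last line illustrates the see-saw pivot: every least-squares line passes through ($\bar{x}$, $\bar{y}$).

```python
from statistics import mean


def fit(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / (sxx * syy) ** 0.5


def compare_without(xs, ys, i):
    """Fit the LSRL with every point, then again with point i left out."""
    xs_out, ys_out = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
    return fit(xs, ys), fit(xs_out, ys_out)


# Hypothetical data; the last point, (10, 6), is the suspect.
xs = [1, 2, 3, 4, 5, 10]
ys = [2, 4, 5, 8, 10, 6]
before, after = compare_without(xs, ys, len(xs) - 1)
for label, (a, b, r) in (("with", before), ("without", after)):
    print(f"{label:>7}: a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")

# Pivot check: a + b * x_bar equals y_bar for the full-data line.
a, b, _ = before
print("line passes through (x-bar, y-bar):", abs(a + b * mean(xs) - mean(ys)) < 1e-9)
```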

Types of Influential Points and Their Effects:

  • Case A: High Leverage, Large Residual (The Classic Influential Point)

    • Description: A point with an unusual x-value that does not follow the linear trend of the other data.

    • Effect of Removal:

      • Slope: Will change dramatically. If the point was pulling the slope down, the slope will increase once the point is removed. If it was pulling the slope up, the slope will decrease once the point is removed.

      • Correlation (r): Will become stronger (the absolute value of r will get closer to 1). The point was disrupting the linear pattern, so removing it makes the remaining points appear more linear.

    • [Image: A scatterplot with a positive trend. A high-leverage point is in the bottom right, well below the pattern. The original LSRL is shown as a solid line, tilted down towards the point. A new LSRL without the point is shown as a dashed line, with a much steeper positive slope.]

  • Case B: High Leverage, Small Residual

    • Description: A point with an unusual x-value that does follow the linear trend of the other data.

    • Effect of Removal:

      • Slope: Will not change much.

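      • Correlation (r): Will usually become weaker (the absolute value of r will move closer to 0). The point was extending the linear trend across a wider range of x-values; without it, the scatter of the remaining points around the line is larger relative to their spread, so the linear pattern looks less strong.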

      • Conclusion: This point is high-leverage but not influential with respect to the slope, because it lies close to the line the other points already determine.

  • Case C: Low Leverage, Large Residual (Just an Outlier)

    • Description: A point with an x-value near the mean of x-values ($\bar{x}$) but with a large residual.

    • Effect of Removal:

      • Slope: Will not change much. It's too close to the "fulcrum" of the line to have much tilting power.

      • Y-Intercept: May change noticeably. The line will shift up or down to better fit the remaining points.

      • Correlation (r): Will become stronger (closer to 1 or -1).

      • Conclusion: This point is an outlier but generally not considered influential with respect to the slope.
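The three cases can be checked numerically. This sketch uses made-up base data that follow y ≈ 2x closely, adds one extra point per case, and fits the LSRL with and without that point. The point values are chosen only for illustration. In Case C the point sits exactly at $\bar{x}$ = 3.5, which is why the slope does not move at all; a point merely near $\bar{x}$ would change it only slightly.

```python
from statistics import mean


def fit(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / (sxx * syy) ** 0.5


# Hypothetical base data that follow y = 2x closely; the mean of x is 3.5.
BASE_X = [1, 2, 3, 4, 5, 6]
BASE_Y = [2.2, 3.9, 6.1, 8.0, 9.8, 12.1]

# One made-up extra point per case.
CASES = {
    "A (high leverage, large residual)": (15, 5.0),
    "B (high leverage, small residual)": (15, 30.0),
    "C (x at the mean, large residual)": (3.5, 20.0),
}

without_pt = fit(BASE_X, BASE_Y)
results = {}
for label, (px, py) in CASES.items():
    with_pt = fit(BASE_X + [px], BASE_Y + [py])
    results[label] = (with_pt, without_pt)
    print(f"Case {label}")
    for name, (a, b, r) in (("with", with_pt), ("without", without_pt)):
        print(f"  {name:>7}: a = {a:6.3f}, b = {b:6.3f}, r = {r:6.3f}")
```

Reading the output case by case: removing the Case A point raises the slope and |r|; removing the Case B point leaves the slope almost unchanged but lowers |r|; removing the Case C point leaves the slope unchanged, shifts the intercept, and raises |r|.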

Key Vocabulary

  • Outlier (in regression): A data point that has a large residual, meaning its observed y-value is far from the y-value predicted by the regression line.

  • Residual: The directed vertical distance between an observed data point and the least-squares regression line; calculated as $y - \hat{y}$.

  • High-Leverage Point: A data point with an x-value that is substantially far from the mean of the x-values. It has the potential to exert strong influence on the slope of the regression line.

  • Influential Point: A point whose removal would cause a significant change in the slope, y-intercept, and/or correlation coefficient of the least-squares regression line.

  • Least-Squares Regression Line (LSRL): The unique line that minimizes the sum of the squared vertical distances (residuals) from the data points to the line.
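For reference, the LSRL can be written with the standard formulas below, where $s_x$ and $s_y$ are the sample standard deviations of x and y. Because $a = \bar{y} - b\bar{x}$, the line always passes through ($\bar{x}$, $\bar{y}$), the pivot in the see-saw analogy.

```latex
\begin{aligned}
\hat{y} &= a + bx \\
b &= r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\,\bar{x} \\
\text{quantity minimized: } &\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2
\end{aligned}
```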

Calculator Tech (TI-84)

While identifying these points is often a conceptual task based on a scatterplot, you can use the calculator to quantify the influence of a point by comparing the regression model with and without it.

Scenario: You are given a dataset and suspect the point (x*, y*) is influential.

Step 1: Calculate the Original LSRL

  1. Press STAT -> 1:Edit....

  2. Enter all your x-values into list L1 and all your y-values into list L2.

  3. Press STAT -> CALC -> 8:LinReg(a+bx).

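  4. Set Xlist: L1 and Ylist: L2. Leave the remaining fields (FreqList and Store RegEQ) blank.

  5. Select Calculate and press ENTER.

  6. Record the original slope (b), y-intercept (a), and correlation coefficient (r). If r does not appear, turn on STAT DIAGNOSTICS in the MODE menu (or run DiagnosticOn from the CATALOG) and calculate again.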

Step 2: Remove the Suspected Influential Point

  1. Press STAT -> 1:Edit....

  2. Use the arrow keys to navigate to the row containing the point (x*, y*) you want to remove.

  3. Highlight the x-value in L1 and press DEL.

  4. Move to the corresponding y-value in L2 and press DEL. Be careful to delete both parts of the data pair.

Step 3: Calculate the New LSRL

  1. Repeat the process from Step 1: STAT -> CALC -> 8:LinReg(a+bx).

  2. The settings (L1, L2) should still be correct.

  3. Select Calculate and press ENTER.

  4. Record the new slope (b), y-intercept (a), and correlation coefficient (r).

Step 4: Compare and Conclude

  • Compare the original and new values. If the slope, y-intercept, and/or correlation coefficient changed substantially, you have quantitative evidence that the removed point was influential.
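To double-check calculator work on a computer, here is a rough Python mirror of Steps 1 to 4. It is a sketch, not TI software: `lin_reg_a_bx` and `delete_pair` are hypothetical helpers, the lists are made up, and the length check catches the mistake Step 2 warns about (deleting a value from only one list).

```python
from statistics import mean


def lin_reg_a_bx(l1, l2):
    """Return (a, b, r) like LinReg(a+bx), with L1 as x-values and L2 as y-values."""
    if len(l1) != len(l2):
        raise ValueError("L1 and L2 have different lengths")
    x_bar, y_bar = mean(l1), mean(l2)
    sxx = sum((x - x_bar) ** 2 for x in l1)
    syy = sum((y - y_bar) ** 2 for y in l2)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(l1, l2))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / (sxx * syy) ** 0.5


def delete_pair(l1, l2, x_star, y_star):
    """Step 2: remove (x*, y*) from BOTH lists so the remaining pairs stay matched."""
    for i, pair in enumerate(zip(l1, l2)):
        if pair == (x_star, y_star):
            return l1[:i] + l1[i + 1:], l2[:i] + l2[i + 1:]
    raise ValueError("(x*, y*) was not found in L1 and L2")


# Hypothetical lists; (30, 20) is the suspected influential point.
L1 = [10, 12, 14, 15, 17, 30]
L2 = [20, 25, 27, 31, 35, 20]

original = lin_reg_a_bx(L1, L2)               # Step 1
new_L1, new_L2 = delete_pair(L1, L2, 30, 20)  # Step 2
new = lin_reg_a_bx(new_L1, new_L2)            # Step 3
for label, (a, b, r) in (("original", original), ("new", new)):  # Step 4
    print(f"{label:>8}: a = {a:.3f}, b = {b:.3f}, r = {r:.3f}")
```

In this made-up example the suspected point actually flips the sign of the slope (negative with it, positive without it), which is about as clear a sign of influence as you can get.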

How to Show Work on the FRQ

Questions about influential points are descriptive and require clear communication. Your answer must identify the point and then justify its classification and the effect of its removal.

Use this two-part template for a complete response:

Part 1: Identify and Classify the Point

  • Sentence 1 (Identify): "The point at approximately (x, y) is an unusual point."

  • Sentence 2 (Classify & Justify): "This point is a high-leverage point because its x-value is much larger/smaller than the other x-values. It is also an outlier because it has a large positive/negative residual, falling far above/below the general pattern of the data."

  • Sentence 3 (Conclude Influence): "Because it is a high-leverage outlier, this point is influential."

Part 2: Describe the Effect of Removal

  • Sentence 1 (Effect on Slope): "If this point were removed, the slope of the regression line would [increase/decrease]. The point is currently pulling the line [up/down] on the [left/right] side, so its removal would allow the line to [tilt up/tilt down] to better fit the remaining points."

  • Sentence 2 (Effect on Correlation): "The correlation coefficient (r) would become stronger (i.e., its absolute value would increase, moving closer to 1 or -1). This is because the point deviates from the linear pattern, and its removal would make the remaining points appear more tightly clustered in a linear form."

  • Sentence 3 (Effect on Y-Intercept - Optional but good): "The y-intercept would likely [increase/decrease] as the line shifts to better fit the rest of the data."

Practice Problems

Problem 1:

A biologist is studying the relationship between the body length (in cm) and weight (in grams) of a species of snake. A scatterplot of the data is shown below, along with the least-squares regression line. One snake, labeled Point A, was much older and larger than the others.

[Image: A scatterplot showing a strong, positive, linear association between snake length and weight. Most points are clustered between x=30 and x=60. Point A is at approximately (90, 250), far to the right of the other points but well below the pattern established by the other points. The LSRL is shown being "pulled down" towards Point A.]

(a) Describe Point A in terms of leverage and residual. Is it an influential point? Justify your answer.

(b) Describe how the removal of Point A would affect the slope and the correlation coefficient of the regression line.

Solution:

(a)

Identification and Classification: Point A, at approximately (90, 250), is an unusual point. It is a high-leverage point because its x-value (length = 90 cm) is much larger than the x-values of the other snakes. It is also an outlier because it falls far below the linear pattern of the rest of the data, meaning it has a large negative residual. Because it is a high-leverage outlier, Point A is an influential point.

(b)

Effect on Slope: If Point A were removed, the slope of the regression line would increase. The point is currently pulling the right side of the line down, flattening the slope. Its removal would allow the line to tilt upwards to better fit the strong, positive trend of the remaining data.

Effect on Correlation: The correlation coefficient (r) would become stronger (it would increase and get closer to +1). Point A deviates from the strong linear pattern, so removing it would make the remaining points appear more tightly clustered along a straight line, strengthening the measured association.


Problem 2:

An admissions officer at a university is analyzing the relationship between students' high school GPA and their first-year college GPA. The data for 10 students is provided.

| HS GPA (x) | College GPA (y) |
|---|---|
| 3.8 | 3.5 |
| 3.2 | 3.0 |
| 3.9 | 3.9 |
| 2.8 | 2.5 |
| 3.0 | 2.8 |
| 4.0 | 3.8 |
| 3.5 | 3.2 |
| 3.7 | 3.6 |
| 2.5 | 2.2 |
| 2.2 | 3.7 |

The admissions officer suspects the last student (HS GPA = 2.2, College GPA = 3.7) is an unusual data point. Use your calculator to investigate whether this point is influential. Compare the slope and correlation with and without this point.

Solution:

Step 1: Calculate the LSRL with all 10 points.

  • Enter all 10 HS GPAs into L1 and College GPAs into L2.

  • Running LinReg(a+bx) gives $\hat{y} = 1.444 + 0.545x$ and $r \approx 0.584$.

Step 2: Remove the suspected influential point (2.2, 3.7) and recalculate.

  • Delete 2.2 from L1 and 3.7 from L2.

  • Re-running LinReg(a+bx) on the remaining 9 points gives $\hat{y} = -0.555 + 1.102x$ and $r \approx 0.988$.

Step 3: Compare and Conclude.

The point (2.2, 3.7) is highly influential. Its removal caused a dramatic change in the regression model.

  • Effect on Slope: The slope increased significantly, roughly doubling from about 0.545 to about 1.102. This point, with a low x-value but a very high y-value, was pulling the left side of the line up, flattening the slope. Removing it allowed the slope to become much steeper, better reflecting the strong positive trend in the other nine students.

  • Effect on Correlation: The correlation coefficient became much stronger, increasing from a moderate positive $r \approx 0.584$ to a very strong positive $r \approx 0.988$. This confirms the point was disrupting the underlying linear relationship.
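The calculator results for Problem 2 can be reproduced with a short Python sketch using the data from the table. `lin_reg` is a hypothetical helper that applies the least-squares formulas directly; it is not a calculator or library function.

```python
from statistics import mean


def lin_reg(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / (sxx * syy) ** 0.5


hs_gpa = [3.8, 3.2, 3.9, 2.8, 3.0, 4.0, 3.5, 3.7, 2.5, 2.2]
college_gpa = [3.5, 3.0, 3.9, 2.5, 2.8, 3.8, 3.2, 3.6, 2.2, 3.7]

all_ten = lin_reg(hs_gpa, college_gpa)
# (2.2, 3.7) is the last pair, so slicing it off removes both coordinates together.
nine = lin_reg(hs_gpa[:-1], college_gpa[:-1])
for label, (a, b, r) in (("all 10 students", all_ten), ("without (2.2, 3.7)", nine)):
    print(f"{label}: y-hat = {a:.3f} + {b:.3f}x, r = {r:.3f}")
```

Rounded to three decimals, this gives $\hat{y} = 1.444 + 0.545x$ with $r \approx 0.584$ for all ten students and $\hat{y} = -0.555 + 1.102x$ with $r \approx 0.988$ for the remaining nine.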

Common Mistakes to Avoid

  • Confusing Outlier, Leverage, and Influence: These terms are not interchangeable. An outlier has a large residual (y-direction error). A high-leverage point has an extreme x-value. An influential point causes a large change in the model if removed. A point can be one, two, or all three of these.

  • Assuming All Outliers Are Influential: A point can have a very large residual but be located near the mean of the x-values. As described in Key Concepts (Case C), such a point will have little effect on the slope and is therefore not considered influential in that regard.

  • Assuming All High-Leverage Points Are Bad: A high-leverage point that falls in line with the existing trend is not influential and is actually beneficial. It confirms the linear model over a wider range of x-values and strengthens the correlation.

  • Using Vague Language: Avoid saying an influential point "changes the line" or "affects the correlation." You must be specific. Does the slope increase or decrease? Does the correlation become stronger (closer to 1 or -1) or weaker (closer to 0)? Use directional and comparative words.
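The difference between the direction of r and the strength of the association is easy to blur when r is negative. This hypothetical Python helper (the function names and input values are illustrative only) phrases a change the way the tip above recommends, and shows that r can decrease while the association gets stronger.

```python
def describe_slope_change(old_b, new_b):
    """Phrase a slope change with a clear direction word."""
    if new_b > old_b:
        word = "increase"
    elif new_b < old_b:
        word = "decrease"
    else:
        word = "stay the same"
    return f"The slope would {word} (from {old_b:.3f} to {new_b:.3f})."


def describe_r_change(old_r, new_r):
    """Report the direction of r and, separately, the strength |r| of the association."""
    if new_r > old_r:
        direction = "increase"
    elif new_r < old_r:
        direction = "decrease"
    else:
        direction = "stay the same"
    if abs(new_r) > abs(old_r):
        strength = "stronger (|r| moves closer to 1)"
    elif abs(new_r) < abs(old_r):
        strength = "weaker (|r| moves closer to 0)"
    else:
        strength = "equally strong"
    return (f"r would {direction} (from {old_r:.2f} to {new_r:.2f}), "
            f"so the linear association would be {strength}.")


# Hypothetical values: for a negative association, r decreases yet gets stronger.
print(describe_slope_change(0.40, 0.90))
print(describe_r_change(-0.60, -0.90))
```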