PrepGo

AP Statistics Practice Quiz: Analyzing Departures from Linearity

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Test your understanding with short quizzes. This quiz has 16 questions to check your progress.

All Questions (16)

In the context of regression analysis, how is an outlier primarily defined?

A) A point that, if removed, substantially changes the regression model.

B) A point with a substantially larger or smaller x-value than other observations.

C) A point that does not follow the general trend and has a large residual.

D) A point that is created by transforming a variable.

Correct Answer: C

Based on the provided content, 'An outlier in regression does not follow the general trend and has a large residual.'

A data point in a scatterplot has an x-value that is substantially larger than all other x-values in the dataset. What is the correct term for this point?

A) An outlier

B) A high-leverage point

C) An influential point

D) A residual

Correct Answer: B

The provided content states, 'A high-leverage point has a substantially larger or smaller x-value than other observations.'
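One way to put a number on "substantially larger or smaller x-value" is the leverage statistic for simple linear regression, h_i = 1/n + (x_i - x̄)^2 / Σ(x_j - x̄)^2. This formula goes beyond the definition quoted above, and the x-values below are made up for illustration. A minimal Python sketch:

```python
def leverages(xs):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for simple linear regression."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical x-values: five clustered observations and one far to the right.
xs = [1, 2, 3, 4, 5, 20]
hs = leverages(xs)
for x, h in zip(xs, hs):
    print(f"x = {x:>2}  leverage = {h:.3f}")
```

The point at x = 20 gets by far the largest leverage. No y-values appear anywhere in the calculation, which matches the idea that leverage depends only on the x-position.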

A researcher calculates a least-squares regression line. After removing a single point and recalculating, the slope of the line changes dramatically. This point is best described as:

A) A transformed point

B) A high-leverage point

C) An outlier

D) An influential point

Correct Answer: D

According to the content, 'An influential point, if removed, substantially changes the regression model.' A dramatic change in the slope is a substantial change.
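The remove-and-refit check described in this question can be sketched in a few lines of Python. The data set and the helper name `fit_line` are hypothetical, chosen only to make the change in slope easy to see:

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data: five points on y = 2x plus one unusual point at (15, 5).
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 6, 8, 10, 5]

_, slope_with = fit_line(xs, ys)
_, slope_without = fit_line(xs[:-1], ys[:-1])
print(f"slope with the point:    {slope_with:.3f}")
print(f"slope without the point: {slope_without:.3f}")
```

With the unusual point included, the slope is close to zero; without it, the slope is 2. A change that large is what marks the point as influential.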

What is the primary reason for transforming variables when analyzing a bivariate data set?

A) To increase the number of outliers.

B) To create a data set that is more linear.

C) To decrease the r-squared value to a more realistic level.

D) To ensure all residuals are exactly zero.

Correct Answer: B

The content explicitly states, 'Transforming variables can create a data set that is more linear.'

After applying a transformation to a data set, a statistician observes that the r-squared value increased from 0.65 to 0.90 and the corresponding residual plot shows a random scatter of points. What is the most appropriate conclusion?

A) The original model was better because it was simpler.

B) The transformation was unsuccessful because it created influential points.

C) The transformed model is more appropriate for the data.

D) The transformation should be reversed to analyze the residuals.

Correct Answer: C

The content indicates that 'Increased randomness in residual plots and an r-squared value closer to 1 after transformation suggest a more appropriate model.' Both of these conditions were met.
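The r-squared half of this comparison can be reproduced with a short Python sketch. The data below is hypothetical (it roughly doubles with each unit of x), and the helper `r_squared` is an illustrative name. Checking the residual plot is a separate step that this sketch does not cover:

```python
import math

def r_squared(xs, ys):
    """r^2 for the least-squares line of ys on xs (the squared correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)

# Hypothetical data with roughly exponential growth.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.0, 4.1, 7.9, 16.2, 31.5, 64.8]

r2_original = r_squared(xs, ys)
r2_log = r_squared(xs, [math.log10(y) for y in ys])
print(f"r^2, y vs x:        {r2_original:.3f}")
print(f"r^2, log10(y) vs x: {r2_log:.3f}")
```

For this data, taking the base-10 log of y moves r-squared much closer to 1, the same direction of change as in the question.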

A regression analysis on a transformed data set produced the least-squares regression line log(ŷ) = 2.1 + 0.5x, where log is the base-10 logarithm. What is the predicted response, ŷ, when x = 4?

A) 4.1

B) 125.89

C) 12589.25

D) 4.0

Correct Answer: C

First, calculate the predicted transformed value: log(ŷ) = 2.1 + 0.5(4) = 2.1 + 2.0 = 4.1. To find ŷ, reverse the base-10 log transformation by raising 10 to that power: ŷ = 10^4.1 ≈ 12589.25. Choice A (4.1) is the prediction on the log scale, not the predicted response.
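The same two steps (predict on the log scale, then undo the log) can be checked in Python. The function name is hypothetical; the coefficients come from the question:

```python
def predict_from_log10_model(intercept, slope, x):
    """Predict y from a fitted line of the form log10(y_hat) = intercept + slope * x."""
    log_y_hat = intercept + slope * x  # prediction on the transformed scale
    return 10 ** log_y_hat             # undo the base-10 log

y_hat = predict_from_log10_model(2.1, 0.5, 4)
print(round(y_hat, 2))  # 12589.25
```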

Which of the following best describes a high-leverage point that is NOT an outlier?

A) A point with an extreme x-value that follows the general trend of the data.

B) A point with an average x-value that has a very large residual.

C) A point that, when removed, changes the y-intercept but not the slope.

D) A point with an extreme x-value that has the largest residual in the data set.

Correct Answer: A

A high-leverage point has an extreme x-value. For it not to be an outlier, it must follow the general trend, which means it has a small residual.

A statistician is comparing two models. The original linear model has an r-squared of 0.92, but its residual plot shows a clear curved pattern. A second model using a transformed y-variable has an r-squared of 0.88 and a residual plot with no pattern. Which model should be preferred and why?

A) The original model, because its r-squared value is higher.

B) The transformed model, because the random residual plot indicates it is more appropriate.

C) Neither model, because the r-squared values are contradictory.

D) The original model, because a curved pattern in residuals is acceptable if r-squared is high.

Correct Answer: B

The content states that 'Increased randomness in residual plots...suggest a more appropriate model.' A clear pattern in the residuals, like a curve, indicates that the linear model is not appropriate, regardless of a high r-squared value.
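A small Python sketch shows how a high r-squared and a curved residual pattern can occur together. The data is hypothetical (y is exactly x squared), and `fit_line` is an illustrative helper:

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical curved data: y = x^2 for x = 1..7.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

intercept, slope = fit_line(xs, ys)
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

mean_y = sum(ys) / len(ys)
sse = sum(r ** 2 for r in residuals)         # unexplained variation
sst = sum((y - mean_y) ** 2 for y in ys)     # total variation
r2 = 1 - sse / sst

print(f"r^2 = {r2:.3f}")
for x, r in zip(xs, residuals):
    print(f"x = {x}  residual = {r:+.1f}")
```

Here r-squared is above 0.95, yet the residuals are positive at both ends and negative in the middle. That U shape is the kind of clear pattern that signals a straight line is the wrong model.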

The relationship between the age of a machine (x, in years) and its value (y, in dollars) is modeled by the equation √ŷ = 150 - 12x. According to this model, what is the predicted value of a machine that is 5 years old?

A) $90

B) $30

C) $8,100

D) $1,800

Correct Answer: C

First, substitute x = 5 into the transformed equation: √ŷ = 150 - 12(5) = 150 - 60 = 90. To find the predicted response ŷ, you must square both sides: ŷ = (90)^2 = 8,100.
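As a quick check of the arithmetic, here is a minimal Python sketch of predicting from a square-root model. The function name and the way a negative root is handled are assumptions for illustration, not part of the quiz content:

```python
def predict_from_sqrt_model(intercept, slope, x):
    """Predict y from a fitted line of the form sqrt(y_hat) = intercept + slope * x."""
    root_y_hat = intercept + slope * x
    if root_y_hat < 0:
        # A square root cannot be negative, so the model gives no sensible
        # prediction here (raising an error is an assumed design choice).
        raise ValueError("x is outside the range where the model applies")
    return root_y_hat ** 2  # undo the square root by squaring

value = predict_from_sqrt_model(150, -12, 5)
print(value)  # 8100
```

For this model, 150 - 12x turns negative once x exceeds 12.5 years, so predictions beyond that age should not be trusted.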

Which statement provides the defining characteristic of an influential point in regression?

A) It has the largest possible x-value in the data set.

B) It has a residual that is greater than two standard deviations from zero.

C) Its exclusion from the analysis leads to a substantial change in the regression model.

D) It is always located at the mean of the x and y values.

Correct Answer: C

This is a direct application of the definition provided in the content: 'An influential point, if removed, substantially changes the regression model.'

A data point is identified as an outlier because it does not follow the general trend. Under what condition is this outlier also considered a high-leverage point?

A) If its residual is positive.

B) If its x-value is substantially larger or smaller than the other x-values.

C) If removing it increases the r-squared value.

D) If it is also an influential point.

Correct Answer: B

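The classifications are based on different criteria. An outlier is defined by its large residual, while a high-leverage point is defined by its extreme x-value. A point can be one, both, or neither, so an outlier is also a high-leverage point exactly when its x-value is substantially larger or smaller than the other x-values.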

A point with high leverage must have which of the following?

A) A large residual.

B) A significant influence on the regression line's slope.

C) An x-value far from the mean of the x-values.

D) A y-value that does not match the model's prediction.

Correct Answer: C

The definition of a high-leverage point is based solely on its x-position relative to the other data points. The provided content states it has a 'substantially larger or smaller x-value than other observations.'

Which of the following outcomes provides the strongest evidence that a data transformation has resulted in a more appropriate regression model?

A) The r-squared value decreases and the residual plot shows a clear pattern.

B) The slope of the regression line becomes positive.

C) The r-squared value increases and the residual plot becomes more random.

D) The number of high-leverage points is reduced to zero.

Correct Answer: C

The content lists two key indicators of a successful transformation: 'Increased randomness in residual plots and an r-squared value closer to 1 after transformation suggest a more appropriate model.'

A least-squares regression line for a transformed data set is given by ŷ^2 = 20 + 4x. What is the predicted response ŷ for an observation where x = 11?

A) 64

B) 35

C) 4096

D) 8

Correct Answer: D

First, calculate the value of ŷ^2 by substituting x = 11: ŷ^2 = 20 + 4(11) = 20 + 44 = 64. To find the predicted response ŷ, take the positive square root of both sides: ŷ = √64 = 8. Choice A (64) is the value of ŷ^2, not ŷ.
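When the response was squared before fitting, the back-transformation is a square root. A short Python sketch, with a hypothetical function name and the assumption that the response cannot be negative:

```python
import math

def predict_from_squared_model(intercept, slope, x):
    """Predict y from a fitted line of the form y_hat**2 = intercept + slope * x."""
    y_hat_squared = intercept + slope * x
    # Take the nonnegative root; this assumes the response cannot be negative.
    return math.sqrt(y_hat_squared)

y_hat = predict_from_squared_model(20, 4, 11)
print(y_hat)  # 8.0
```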

In a scatterplot of student test scores versus hours studied, one data point represents a student who studied for a very long time (high x-value) but received a very low score (low y-value). This point has a large negative residual. How would this point be classified?

A) As a high-leverage point only.

B) As an outlier only.

C) As both an outlier and a high-leverage point.

D) As neither an outlier nor a high-leverage point.

Correct Answer: C

It is a high-leverage point because its x-value (hours studied) is substantially larger than others. It is an outlier because it has a large residual and does not follow the general trend of more studying leading to higher scores.
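Both checks can be sketched in Python with made-up numbers. The data, the helper `fit_line`, and the choice to measure the residual against the line fitted to the other students are all assumptions for illustration. On the AP exam, a residual is normally taken from the line fitted to all of the data:

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for ys on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical class data (hours studied, test score) plus one unusual student.
hours = [1, 2, 3, 4, 5]
scores = [55, 62, 68, 75, 81]
unusual_hours, unusual_score = 12, 40

# Trend of the other students, then the unusual student's residual from it.
intercept, slope = fit_line(hours, scores)
residual = unusual_score - (intercept + slope * unusual_hours)

# High leverage: the x-value lies well beyond the other x-values.
extreme_x = unusual_hours > max(hours)

print(f"residual from the other students' trend: {residual:.1f}")
print(f"x-value beyond the others' range: {extreme_x}")
```

The large negative residual supports the outlier label, and the extreme x-value supports the high-leverage label, so the point is both.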

When a residual plot for a linear regression shows a clear pattern, indicating the linear model is not a good fit, what is a common first step to improve the model?

A) Remove all points with negative residuals.

B) Transform one or both of the variables to achieve linearity.

C) Assume the correlation is zero and stop the analysis.

D) Add more data points until the pattern disappears.

Correct Answer: B

The provided content suggests that when the relationship between the variables is not linear, 'Transforming variables can create a data set that is more linear.' This is a standard procedure for dealing with non-linear relationships.