PrepGo

Correlation - AP Statistics Study Guide

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Learn with study guides reviewed by top AP teachers. This guide takes about 19 minutes to read.

Quick Summary

This guide will equip you to master the concept of correlation. You will learn to calculate the correlation coefficient, r, and interpret its value to describe the strength and direction of a linear relationship between two quantitative variables. By the end of this lesson, you will be able to confidently analyze scatterplots, understand the properties and limitations of correlation, and communicate your findings using the precise statistical language required for the AP exam.

Key Concepts

The correlation coefficient, denoted by r, is a numerical value that measures the strength and direction of the linear relationship between two quantitative variables.

Properties of the Correlation Coefficient (r)

  • Range: The value of r is always between -1 and 1, inclusive.

    • -1 \le r \le 1
  • Direction: The sign of r indicates the direction of the association.

    • If r > 0, the association is positive. As the explanatory variable (x) increases, the response variable (y) tends to increase. The points on a scatterplot will trend upwards from left to right.

    • If r < 0, the association is negative. As the explanatory variable (x) increases, the response variable (y) tends to decrease. The points on a scatterplot will trend downwards from left to right.

  • Strength: The absolute value of r indicates the strength of the linear relationship. The closer |r| is to 1, the stronger the linear relationship. The closer |r| is to 0, the weaker the linear relationship.

    • r = 1 or r = -1 indicates a perfect linear relationship. All data points fall exactly on a straight line.

    • r = 0 indicates no linear relationship.

    • General Guidelines for Strength (use with caution, context is key):

      • |r| near 1 is typically considered strong.

      • |r| in the middle of the range is typically considered moderate.

      • |r| near 0 is typically considered weak.

    [Image: Three scatterplots side-by-side. The first shows a strong positive correlation (r \approx 0.9). The second shows a moderate negative correlation (r \approx -0.6). The third shows a weak or no correlation (r \approx 0.1).]
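The range and direction properties can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical helper named `correlation` and made-up data (neither comes from this guide): points lying exactly on a rising line should give r = 1, and points lying exactly on a falling line should give r = -1.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]      # every point exactly on an upward line
falling = [-3 * x + 10 for x in xs]   # every point exactly on a downward line

r_up = correlation(xs, rising)
r_down = correlation(xs, falling)
print(round(r_up, 6), round(r_down, 6))
```

Real data almost never sit exactly on a line, so values strictly between -1 and 1 are what you should expect in practice.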

Critical Limitations and Characteristics

  • Correlation only measures LINEAR relationships. A strong, curved relationship may have a correlation close to 0. You must always look at the scatterplot first to check for linearity before relying on r.

    [Image: A scatterplot showing a perfect parabolic curve (U-shape). The caption notes that even though there is a strong relationship, r = 0.]

  • Correlation is non-resistant. The value of r can be heavily influenced by outliers or influential points. A single point can dramatically change the value and even the sign of r.

  • Correlation has no units. It is a standardized measure.

  • Correlation is symmetrical. The correlation between variables X and Y is the same as the correlation between variables Y and X. Switching the explanatory and response variables does not change r.

  • Correlation does not imply causation. This is the most important mantra in statistics. A strong correlation between two variables does not mean that one variable causes the change in the other. There may be a lurking variable that is influencing both. For example, ice cream sales and drowning deaths are strongly positively correlated, but ice cream does not cause drowning. The lurking variable is hot weather, which causes increases in both.
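Several of these limitations can be seen with a few lines of Python. This is an illustrative sketch only: the helper `correlation` and all of the small datasets are assumptions made up for the demonstration, not data from this guide.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# 1. Linear only: a perfect parabola on symmetric x-values gives r = 0.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]
r_curve = correlation(xs, ys)

# 2. Symmetry: swapping the roles of x and y does not change r.
a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 6]
r_ab = correlation(a, b)
r_ba = correlation(b, a)

# 3. No units: rescaling y (for example, meters to centimeters) leaves r unchanged.
r_scaled = correlation(a, [100 * v for v in b])

# 4. Non-resistant: one extreme point flips the sign of r.
r_outlier = correlation(a + [20], b + [-30])

print(r_curve, round(r_ab, 3), round(r_ba, 3), round(r_scaled, 3), round(r_outlier, 3))
```

Causation is the one item in the list that no calculation can address: it depends on how the data were collected, not on the value of r.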

Formula (For Conceptual Understanding)

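The formula for the correlation coefficient is:

  r = \frac{1}{n-1} \sum \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right)

Here \bar{x} and \bar{y} are the sample means, and s_x and s_y are the sample standard deviations of the x- and y-values.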

  • Conceptual Breakdown: This formula calculates the sum of the products of the standardized scores (z-scores) for each pair of (x, y) data points, and then averages it (by dividing by n-1). It essentially measures how consistently the x and y values vary together from their respective means. You will not be expected to calculate this by hand on the AP exam, but you should understand that it is built from standardized values.
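To make the "built from z-scores" idea concrete, here is a short Python sketch that computes r exactly as described: standardize each x and y, multiply the paired z-scores, add the products, and divide by n - 1. The function name and the data (hours studied and test scores) are hypothetical and chosen only for illustration.

```python
import statistics

def correlation_from_z_scores(xs, ys):
    """Compute r as the sum of z-score products divided by n - 1."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    # statistics.stdev is the sample standard deviation (divides by n - 1).
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    z_products = [
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)

hours = [1, 2, 3, 4, 5]          # hypothetical explanatory variable
scores = [52, 60, 61, 70, 77]    # hypothetical response variable

r = correlation_from_z_scores(hours, scores)
print(round(r, 3))
```

Because both variables are converted to z-scores before anything else happens, the units cancel, which is why r has no units.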

Key Vocabulary

  • Correlation Coefficient (r): A number between -1 and 1 that measures the strength and direction of the linear relationship between two quantitative variables.

  • Strength: A description of how closely the data points in a scatterplot adhere to a linear pattern. Described with words like strong, moderate, or weak.

  • Direction: A description of the type of linear association, either positive (as x increases, y tends to increase) or negative (as x increases, y tends to decrease).

  • Linear Relationship: An association between two variables that can be reasonably modeled by a straight line.

  • Outlier: A data point that falls far from the overall pattern of the rest of the data. In a scatterplot, it can be far from the main cluster of points.

  • Lurking Variable: A variable that is not among the explanatory or response variables in a study but may influence the relationship between them.

Calculator Tech (TI-84)

To calculate the correlation coefficient r, you first need to ensure your calculator's diagnostics are turned on. You only need to do this once.

One-Time Setup: Turn On Diagnostics

  1. Press 2nd -> [CATALOG].

  2. Scroll down to DiagnosticOn.

  3. Press ENTER.

  4. Press ENTER again. The calculator will display "Done."

Calculating r from Data

Let's say you have your explanatory variable data in list L1 and your response variable data in list L2.

  1. Enter Data:

    • Press STAT -> 1:Edit...

    • Enter your x-values into L1.

    • Enter your corresponding y-values into L2.

  2. Calculate Linear Regression Statistics:

    • Press STAT.

    • Arrow over to the CALC menu.

    • Select 8:LinReg(a+bx). (Note: 4:LinReg(ax+b) works identically for finding r.)

    • On the input screen:

      • Xlist: L1 (or whichever list has your x-data)

      • Ylist: L2 (or whichever list has your y-data)

      • FreqList: should be left blank.

      • Store RegEQ: can be left blank.

    • Arrow down to Calculate and press ENTER.

  3. Read the Output:

    • The calculator will display the values for a and b (for the regression line y = a + bx) and, importantly, the values for r^2 and r. You are looking for the value of r.
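If you want to check a calculator result on a computer, the same four numbers can be reproduced in a few lines of Python. This is a sketch under stated assumptions: the function `lin_reg` is a hypothetical stand-in for the calculator's LinReg(a+bx) command, and the lists `l1` and `l2` hold made-up data standing in for L1 and L2.

```python
import math

def lin_reg(xs, ys):
    """Return (a, b, r_squared, r) for the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx                      # slope
    a = mean_y - b * mean_x            # y-intercept
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    return a, b, r * r, r

l1 = [1, 2, 3, 4, 5]   # stand-in for calculator list L1 (x-values)
l2 = [2, 4, 5, 4, 5]   # stand-in for calculator list L2 (y-values)

a, b, r_squared, r = lin_reg(l1, l2)
print(f"a={a:.3f}  b={b:.3f}  r^2={r_squared:.3f}  r={r:.3f}")
```

On the AP exam you still need to know the calculator steps; a check like this is only useful while studying.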

How to Show Work on the FRQ

For questions asking you to interpret the correlation coefficient, you must provide a complete description in context. Use the following four-part template to ensure you earn full credit.

FRQ Interpretation Template:

"There is a [1. Strength], [2. Direction], linear relationship between [3. Explanatory Variable in Context] and [4. Response Variable in Context]."

Breakdown of the Template:

  1. Strength: Use a descriptive word based on the value of r.

    • If r is close to 1 or -1, use "strong" or "moderately strong."

    • If r is in the middle range, use "moderate."

    • If r is close to 0, use "weak" or "moderately weak."

  2. Direction: Use "positive" if r > 0 or "negative" if r < 0.

  3. Explanatory Variable: State the full name of the x-variable, including units if provided.

  4. Response Variable: State the full name of the y-variable, including units if provided.

Crucial Note: Always include the word "linear" in your description. Omitting it can result in a loss of credit because r is only a valid measure for linear relationships.

Practice Problems

Problem 1:

A guidance counselor is investigating the relationship between the number of AP courses a student takes and their final GPA. For a sample of 15 students, the correlation coefficient was calculated to be r = 0.89. A scatterplot of the data showed a roughly linear pattern. Interpret the correlation coefficient in the context of this study.

Solution:

Using the FRQ interpretation template:

"There is a strong, positive, linear relationship between the number of AP courses a student takes and their final GPA."

  • Strength: "strong" (because r = 0.89 is very close to 1)

  • Direction: "positive" (because r is positive)

  • Context: The variables are "number of AP courses" and "final GPA".

  • The word "linear" is included.

Problem 2:

An ecologist collects data on the trunk diameter (in cm) and height (in meters) of a sample of 8 young trees of the same species. The data are shown below.

Diameter (cm)    5.2    5.9    6.5    7.1    7.8    8.4    9.0    9.7
Height (m)      10.1   11.5   12.0   13.2   13.4   14.8   15.1   16.3

(a) Calculate the correlation coefficient r.

(b) Interpret the value of r in context.

Solution:

(a) Calculate the correlation coefficient r.

  1. Turn on Diagnostics (if not already on): 2nd -> [CATALOG] -> DiagnosticOn -> ENTER -> ENTER.

  2. Enter Data: Press STAT -> 1:Edit.... Enter the Diameter data into L1 and the Height data into L2.

  3. Calculate: Press STAT -> CALC -> 8:LinReg(a+bx). Ensure Xlist: L1 and Ylist: L2. Select Calculate.

  4. Result: The calculator output shows r \approx 0.992.

(b) Interpret the value of r in context.

Using the FRQ interpretation template with r \approx 0.992:

"There is a strong, positive, linear relationship between a young tree's trunk diameter (in cm) and its height (in meters)."

  • Strength: "strong" (r \approx 0.992 is extremely close to 1)

  • Direction: "positive" (r is positive)

  • Context: The variables are "trunk diameter (in cm)" and "height (in meters)".

  • The word "linear" is included.
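As a cross-check on the calculator work in part (a), the same value can be computed directly from the eight data pairs in Python. The helper name `correlation` is a hypothetical choice; the data are the diameters and heights from the table in Problem 2.

```python
import math

diameter_cm = [5.2, 5.9, 6.5, 7.1, 7.8, 8.4, 9.0, 9.7]
height_m = [10.1, 11.5, 12.0, 13.2, 13.4, 14.8, 15.1, 16.3]

def correlation(xs, ys):
    """Correlation coefficient r for paired lists (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

r = correlation(diameter_cm, height_m)
print(round(r, 3))  # prints 0.992
```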

Common Mistakes to Avoid

  • Correlation \neq Causation: This is the single most important error to avoid. Never state that a change in the explanatory variable causes a change in the response variable based only on a strong correlation. Always be on the lookout for potential lurking variables. For example, writing "A larger trunk diameter causes a tree to be taller" is an incorrect interpretation.

  • Forgetting the Word "Linear": The correlation coefficient only measures the strength of a linear relationship. When interpreting r, you must include the word "linear" in your description to be precise and earn full credit. Saying "a strong, positive relationship" is incomplete.

  • Applying Correlation to Categorical Data: The correlation coefficient can only be calculated for two quantitative variables. It is meaningless to try to calculate the correlation between eye color and gender, for example.

  • Confusing Correlation with Slope: While r and the slope of the regression line share the same sign (both positive or both negative), they are different measures. The slope has units (e.g., meters of height per cm of diameter) and describes the rate of change. Correlation has no units and describes the strength and direction of the linear association.

  • Ignoring the Scatterplot: Never interpret r without first looking at the associated scatterplot. A strong, curved (non-linear) relationship can have an r value close to 0. If you see a clear curve, r is not an appropriate measure to describe the relationship, even if your calculator gives you a value.
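The difference between correlation and slope shows up clearly when units change. The sketch below, using the tree data from Problem 2 and a hypothetical helper `slope_and_r`, converts height from meters to centimeters: the slope is multiplied by 100, while r stays the same.

```python
import math

def slope_and_r(xs, ys):
    """Return (slope, r) of the least-squares line for paired lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

diameter_cm = [5.2, 5.9, 6.5, 7.1, 7.8, 8.4, 9.0, 9.7]
height_m = [10.1, 11.5, 12.0, 13.2, 13.4, 14.8, 15.1, 16.3]
height_cm = [100 * h for h in height_m]   # same heights, different units

slope_m, r_m = slope_and_r(diameter_cm, height_m)     # meters per cm of diameter
slope_cm, r_cm = slope_and_r(diameter_cm, height_cm)  # cm per cm of diameter

print(round(slope_m, 3), round(slope_cm, 1))  # slope depends on the units
print(round(r_m, 3), round(r_cm, 3))          # r does not
```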