Quick Summary
This guide will equip you to visualize and describe the relationship between two quantitative variables. You will learn how to construct a scatterplot, the primary graphical tool for this task, and how to provide a complete and precise description of the relationship by analyzing its direction, form, strength, and any unusual features. Mastering this skill is foundational for understanding correlation and regression in later units.
Key Concepts
When we collect data on two different quantitative variables for the same set of individuals, we are working with bivariate data. Our primary goal is to determine if there is a relationship, or association, between these two variables.
1. Scatterplots: Visualizing the Relationship
A scatterplot is a graph that shows the relationship between two quantitative variables measured on the same individuals. Each individual in the data appears as a single point in the plot.
Construction:
The explanatory variable (also called the independent variable) is plotted on the horizontal axis (x-axis). This is the variable we believe might influence or explain changes in the other variable.
The response variable (also called the dependent variable) is plotted on the vertical axis (y-axis). This is the variable that measures the outcome of interest.
Each point on the plot represents one individual's data for the two variables.
[Image: A basic scatterplot with labeled axes. The x-axis is labeled "Explanatory Variable (in units)" and the y-axis is labeled "Response Variable (in units)". Several points are plotted in the first quadrant.]
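The construction rules above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function name `text_scatter`, the grid size, and the study-hours data are all made up for this example, and a real plot would normally come from a calculator or a graphing tool.

```python
def text_scatter(xs, ys, width=30, height=10):
    """Return a crude text scatterplot as a list of lines.

    x runs left to right (explanatory), y runs bottom to top (response).
    """
    if len(xs) != len(ys):
        raise ValueError("each individual needs one x-value and one y-value")
    x_min, y_min = min(xs), min(ys)
    # "or 1" avoids dividing by zero when every value is identical.
    x_span = (max(xs) - x_min) or 1
    y_span = (max(ys) - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column/row position inside the grid.
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        grid[height - 1 - row][col] = "*"  # grid row 0 is the top of the plot
    return ["|" + "".join(cells) for cells in grid] + ["+" + "-" * width]


# Made-up illustration data: one (hours, score) pair per student.
hours_studied = [1, 2, 3, 4, 5, 6]      # explanatory variable (x)
exam_scores = [55, 62, 70, 68, 80, 88]  # response variable (y)
plot_lines = text_scatter(hours_studied, exam_scores)
print("\n".join(plot_lines))
```

Each `*` is one individual, placed by its x-value horizontally and its y-value vertically, which is all a scatterplot is.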
2. Describing a Scatterplot: The DUFS Framework
To get full credit on the AP exam, your description of a scatterplot must always be in context and address four key characteristics. A helpful acronym is DUFS: Direction, Unusual Features, Form, and Strength.
Direction
This describes the overall trend of the data as you read the graph from left to right.
Positive Association: As the explanatory variable (x) increases, the response variable (y) tends to increase. The points on the scatterplot will generally trend upwards from left to right.
- Example: The relationship between hours spent studying and exam scores.
Negative Association: As the explanatory variable (x) increases, the response variable (y) tends to decrease. The points on the scatterplot will generally trend downwards from left to right.
- Example: The relationship between a car's age and its resale value.
No Association: There is no clear overall trend. The points appear randomly scattered with no discernible up or down pattern.
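One rough way to check a direction you have read off a plot is to look at the sign of the summed products of deviations from the means. The sketch below assumes that approach; the function name `direction` and the car data are hypothetical, and the sign alone says nothing about form or strength, so it does not replace looking at the graph.

```python
def direction(xs, ys):
    """Classify the overall trend from the sign of the summed cross-deviations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    # Points above both means or below both means add a positive term;
    # points above one mean and below the other add a negative term.
    total = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    if total > 0:
        return "positive"
    if total < 0:
        return "negative"
    return "no clear direction"


# Made-up illustration data: car age (years) and resale value ($1000s).
car_ages = [1, 2, 4, 6, 8]
resale_values = [24, 21, 17, 12, 9]
print(direction(car_ages, resale_values))  # negative
```

A total that is close to zero relative to the size of the data should be read as "no association" rather than trusted for its sign.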
Unusual Features
Look for individual points or groups of points that depart from the overall pattern.
Outliers: These are individual points that fall far from the overall pattern of the relationship. They may have an unusual x-value, an unusual y-value, or an unusual combination of both.
Clusters or Gaps: Sometimes the data may form distinct groups (clusters) or have large empty spaces (gaps) that are worth noting.
Form
This describes the general shape of the relationship.
Linear: The points appear to follow a roughly straight-line pattern. This is the most common form we study in AP Statistics.
Non-linear (or Curved): The points appear to follow a consistent curved pattern, such as a parabola or an exponential curve.
- Example: The relationship between driving speed and fuel efficiency might be curved.
Strength
This describes how closely the points follow the identified form.
Strong: The points are very tightly clustered around the form (e.g., a line or a curve). There is very little scatter.
Moderate: The points are more spread out, but the overall form is still clear.
Weak: The points are very spread out, and the form is barely visible.
[Image: A 2x2 grid of four scatterplots. Top-left: Strong, positive, linear. Top-right: Moderate, negative, linear. Bottom-left: Weak, positive, linear. Bottom-right: Strong, non-linear (curved).]
Key Vocabulary
Scatterplot: A graph that displays the relationship between two quantitative variables by plotting ordered pairs on a coordinate plane.
Explanatory Variable (x-variable): The variable that is thought to predict, explain, or influence the response variable. It is always plotted on the horizontal axis.
Response Variable (y-variable): The variable that measures the outcome of a study. It is always plotted on the vertical axis.
Association: The relationship or connection between two variables. We describe its direction, form, strength, and any unusual features.
Positive Association: A relationship where, as the explanatory variable increases, the response variable tends to increase.
Negative Association: A relationship where, as the explanatory variable increases, the response variable tends to decrease.
Outlier: A data point in a scatterplot that is far removed from the general pattern of the other points.
Calculator Tech (TI-84)
You can create a scatterplot on your TI-84 calculator to quickly visualize the relationship between two quantitative variables.
Scenario: You have data for an explanatory variable in List 1 (L1) and a response variable in List 2 (L2).
Step 1: Enter Your Data
Press STAT.
Select 1:Edit... to open the list editor.
Enter your explanatory variable data into L1.
Enter your corresponding response variable data into L2. Make sure the lists are the same length!
Step 2: Set Up the Scatterplot
Press 2nd then Y= to access STAT PLOT.
Select 1:Plot1... and press ENTER.
Turn the plot On.
For Type:, select the first icon, which is the scatterplot.
Set Xlist: to L1 (or whichever list holds your explanatory variable).
Set Ylist: to L2 (or whichever list holds your response variable).
Choose your preferred mark for the points.
Step 3: Display the Graph
Press ZOOM.
Select 9:ZoomStat. This automatically adjusts the window to fit all your data points.
Your scatterplot will now be displayed. You can use the TRACE button to move between points and see their coordinates.
Important: If you get an "ERR:DIM MISMATCH" message, it means your L1 and L2 are not the same length. Go back to STAT -> 1:Edit... and fix them.
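For readers who also work in code, the same two ideas (equal-length lists, and a window that fits the data) can be sketched in Python. The helper names `pair_lists` and `data_window` and the sample values are hypothetical, and the window here simply spans the minimum and maximum values; it does not try to reproduce whatever padding the calculator applies.

```python
def pair_lists(l1, l2):
    """Pair L1 (x) with L2 (y); refuse lists of different lengths."""
    if len(l1) != len(l2):
        raise ValueError(
            f"DIM MISMATCH: L1 has {len(l1)} values but L2 has {len(l2)}"
        )
    return list(zip(l1, l2))


def data_window(points):
    """Return (xmin, xmax, ymin, ymax) that just contains every point."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), max(xs), min(ys), max(ys)


# Made-up illustration data standing in for the two calculator lists.
L1 = [2, 4, 6, 8]
L2 = [15, 11, 9, 4]
points = pair_lists(L1, L2)
print(data_window(points))  # (2, 8, 4, 15)
```

Checking the lengths before pairing matters because `zip` would otherwise silently drop the unmatched values instead of reporting the problem.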
How to Show Work on the FRQ
When asked to describe the relationship shown in a scatterplot on a Free Response Question, you must provide a complete description in context. Use the DUFS framework and write in full sentences.
FRQ Response Template: Describing a Scatterplot
Direction: "There is a [positive/negative/no], [form] association between [explanatory variable in context] and [response variable in context]."
Form: (This is often combined with the first sentence, as shown above). State if the form is linear or non-linear/curved.
Strength: "The association is [strong/moderate/weak] because the points are [tightly clustered around the form / somewhat spread out / very scattered]."
Unusual Features: "There [appears to be an outlier / are no apparent outliers or unusual features]. The point at approximately ([x-value], [y-value]) deviates from the overall linear pattern." (Only include the second sentence if an outlier exists).
Key to Full Credit: ALWAYS write in the context of the problem. Do not just say "x" and "y"; use the actual names of the variables being studied (e.g., "age of the car" and "resale price").
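The template can be treated as a fill-in-the-blanks function, which makes it easy to see that every part depends on the context you supply. This is a sketch only: `describe_scatterplot` and its parameter names are invented for illustration, and the wording is taken from the template above.

```python
STRENGTH_REASONS = {
    "strong": "tightly clustered around the form",
    "moderate": "somewhat spread out",
    "weak": "very scattered",
}


def describe_scatterplot(direction, form, strength,
                         explanatory, response, outlier=None):
    """Assemble a DUFS description; outlier is an (x, y) pair or None."""
    sentences = [
        f"There is a {direction}, {form} association between "
        f"{explanatory} and {response}.",
        f"The association is {strength} because the points are "
        f"{STRENGTH_REASONS[strength]}.",
    ]
    if outlier is None:
        sentences.append("There are no apparent outliers or unusual features.")
    else:
        x, y = outlier
        sentences.append(
            "There appears to be an outlier. The point at approximately "
            f"({x}, {y}) deviates from the overall {form} pattern."
        )
    return " ".join(sentences)


print(describe_scatterplot(
    "negative", "linear", "strong",
    "the age of the car", "its resale price",
))
```

Note that the function cannot run without the two variable names, which mirrors the exam requirement: a description with no context is incomplete.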
Practice Problems
Problem 1:
A real estate agent gathers data on 8 recently sold homes in a neighborhood. She records the size of each home (in square feet) and its selling price (in thousands of dollars).
| Size (sq. ft.) | Price ($1000s) |
| :--- | :--- |
| 1400 | 245 |
| 1600 | 312 |
| 1700 | 279 |
| 1850 | 308 |
| 2100 | 355 |
| 2230 | 360 |
| 2400 | 405 |
| 2500 | 420 |

Describe the relationship between the size of a home and its selling price.
Solution:
First, we identify the variables. The size of the home is the explanatory variable (x), and the selling price is the response variable (y). We can create a scatterplot using a TI-84 or by hand.
[Image: A scatterplot showing Size (sq. ft.) on the x-axis and Price ($1000s) on the y-axis. The 8 points are plotted, showing a clear upward trend.]
Now, we apply the DUFS framework to describe the relationship in context.
Direction: There is a positive, linear association between the size of a home in square feet and its selling price in thousands of dollars. As the size of the home increases, the selling price tends to increase.
Form: The overall pattern of the points appears to be linear.
Strength: The association is strong because the points are tightly clustered around a potential line, with very little scatter.
Unusual Features: There are no apparent outliers or other unusual features in the plot.
Problem 2:
The scatterplot below displays the relationship between the number of miles a used car has been driven and its selling price for a sample of 15 cars of the same make and model. Describe the association.
[Image: A scatterplot with "Miles Driven (in thousands)" on the x-axis and "Price ($)" on the y-axis. The points show a clear negative, linear trend. There is one point far away from the others, at a low mileage but also a very low price, say (10 thousand miles, $5000), while other cars with low mileage are priced much higher.]
Solution:
We apply the DUFS framework, making sure to use the context provided by the graph's labels.
Direction & Form: There is a negative, linear association between the number of miles a car has been driven and its selling price. As the miles driven increase, the selling price of the car tends to decrease.
Strength: The association appears to be moderately strong. While there is a clear linear trend, the points are somewhat scattered around the general pattern.
Unusual Features: There appears to be one potential outlier. The car with approximately 10,000 miles and a price of $5,000 has a much lower price than other cars with a similar low mileage, so it deviates from the overall linear pattern.
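Returning to Problem 1, the eight homes in the table give enough data for a numerical check of the "strong, positive" description. The sketch below computes the correlation coefficient r, a measure covered in later units; it is offered as a sanity check on the visual description and is not part of the expected FRQ answer. The function name `pearson_r` is an illustrative choice.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Data from the Problem 1 table.
sizes = [1400, 1600, 1700, 1850, 2100, 2230, 2400, 2500]  # sq. ft.
prices = [245, 312, 279, 308, 355, 360, 405, 420]         # $1000s
r = pearson_r(sizes, prices)
print(round(r, 2))  # 0.97
```

A value of r this close to 1 agrees with the description: the sign matches the positive direction, and the size matches points that sit close to a line.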
Common Mistakes to Avoid
Forgetting Context: This is the most common mistake. Stating "There is a strong, positive, linear association" will not earn full credit. You MUST say "There is a strong, positive, linear association between the number of hours studied and the student's exam score."
Implying Causation: Association does not imply causation. Never use words like "causes," "proves," or "leads to." Instead, use phrases like "tends to be associated with," "is related to," or "as x increases, y tends to increase/decrease."
Confusing Strength and Slope: The steepness of the line does not determine the strength of the association. A relationship can have a very steep slope but be weak if the points are widely scattered. Strength is about how tightly the points hug the form, not how steep the form is.
Misidentifying Axes: Always place the explanatory variable on the x-axis and the response variable on the y-axis. If the problem states "we want to predict price from size," then size is the explanatory (x) variable.
Incomplete Descriptions: Failing to address all four parts of DUFS (Direction, Unusual Features, Form, Strength) will result in a loss of points. Make it a habit to check off each letter as you write your description.
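The difference between strength and slope can be made concrete with two small data sets. The sketch below uses made-up numbers and borrows two tools from later units, the least-squares slope and the correlation coefficient r, purely to illustrate the point; the function name `slope_and_r` is hypothetical.

```python
from math import sqrt


def slope_and_r(xs, ys):
    """Return (least-squares slope, correlation r) for paired data."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up illustration data sharing the same x-values.
x = [1, 2, 3, 4, 5, 6]
steep_but_scattered = [30, 0, 50, 10, 60, 30]
shallow_but_tight = [1.0, 1.6, 1.9, 2.5, 3.1, 3.4]

steep_slope, steep_r = slope_and_r(x, steep_but_scattered)
shallow_slope, shallow_r = slope_and_r(x, shallow_but_tight)

# Slope 4.0 with r about 0.33: steep line, weak association.
print(round(steep_slope, 2), round(steep_r, 2))
# Slope about 0.49 with r about 1.0: shallow line, strong association.
print(round(shallow_slope, 2), round(shallow_r, 2))
```

The first data set rises much faster per unit of x, yet its points scatter widely around that trend, so it is the weaker of the two associations.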