The Big Picture
In Unit 1, you learned to describe a single variable. You could analyze the shape, center, and variability of a dataset, like the heights of students in your class. Now, in Unit 2, we move from a monologue to a conversation. We ask: are two variables related? This unit is all about finding, describing, and modeling the relationships between two variables.
Think of yourself as a data detective. You're no longer just describing a single clue; you're looking for connections. Does the number of hours a student studies seem to be connected to their exam score? Is there a relationship between a car's weight and its gas mileage? This unit provides the foundational tools—graphs, statistics, and models—to uncover and explain these connections, which is the heart of what real-world statisticians do.
Key Questions
When we look at two variables, how can we determine if a relationship even exists between them?
How do we describe the specific nature of a relationship between two quantitative variables in terms of its direction, form, and strength?
Can we create a mathematical model to represent a linear relationship, and how can we use that model to make predictions?
How do we measure the "goodness" of our model and know when a linear model isn't the right choice?
Your Learning Path
1. Analyzing Categorical Relationships
Topics 2.1-2.3: Describing Relationships in Categorical Data
You'll begin by revisiting the fundamental question of what it means for two variables to be related. You will then focus exclusively on categorical data. The main tools here are two-way tables and segmented bar charts. You'll learn to calculate and compare conditional distributions to determine if there is an association between two categorical variables.
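As a rough illustration of comparing conditional distributions, here is a minimal Python sketch. The two-way table, its category names, and the helper `conditional_distribution` are all made up for this example and do not come from the course materials.

```python
# Hypothetical two-way table (invented counts, for illustration only):
# rows are grade levels, columns are preferred study methods.
counts = {
    "Juniors": {"Flashcards": 30, "Practice tests": 20},
    "Seniors": {"Flashcards": 15, "Practice tests": 35},
}


def conditional_distribution(row):
    """Return each category's share of the row total (shares sum to 1)."""
    total = sum(row.values())
    return {category: count / total for category, count in row.items()}


# Distribution of study method, conditional on grade level.
conditional = {group: conditional_distribution(row) for group, row in counts.items()}

for group, dist in conditional.items():
    shares = ", ".join(f"{category}: {share:.0%}" for category, share in dist.items())
    print(f"{group} -> {shares}")
```

If the conditional distributions differ noticeably from one group to the next, as they do in this invented table, that suggests an association between the two categorical variables in the sample. If they were roughly the same, there would be little evidence of one.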
2. Visualizing and Measuring Quantitative Relationships
Topics 2.4-2.5: Scatterplots and Correlation
This section shifts the focus to relationships between two quantitative variables. Your primary graphical tool will be the scatterplot, and you'll learn to describe what you see using four key characteristics: direction, form, strength, and unusual features. You will then quantify the strength and direction of any linear relationship using a new statistic: the correlation coefficient, r.
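To make the correlation coefficient less abstract, the sketch below computes r as the sum of the products of z-scores divided by n - 1, which is one standard way to write the formula. The hours and scores are invented data, and the function name `correlation` is just a label chosen for this example.

```python
from statistics import mean, stdev

# Invented data: hours studied and exam scores for five students.
hours = [1, 2, 3, 4, 5]
scores = [65, 70, 74, 81, 85]


def correlation(x, y):
    """Correlation r: sum of z-score products divided by (n - 1)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations
    z_products = [
        ((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y)
    ]
    return sum(z_products) / (n - 1)


r = correlation(hours, scores)
print(f"r = {r:.3f}")
```

A value of r close to +1, as this made-up data produces, indicates a strong, positive linear relationship. Remember that r describes only linear association, so always look at the scatterplot as well.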
3. Modeling Linear Relationships
Topics 2.6-2.8: The Least-Squares Regression Line
Once you've identified a linear relationship, you'll learn how to model it. This involves finding the "line of best fit," called the least-squares regression line (LSRL). You'll learn to interpret the slope and y-intercept of this line in the context of the problem, use the line to make predictions, and evaluate how well the line fits the data using residuals (prediction errors) and the coefficient of determination (r-squared).
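The pieces named above (slope, y-intercept, residuals, and r-squared) can be sketched in a few lines of Python. This is only an illustration on invented data, using the standard least-squares formulas; the variable names are assumptions made for the example.

```python
from statistics import mean

# Invented data: hours studied (explanatory) and exam scores (response).
hours = [1, 2, 3, 4, 5]
scores = [65, 70, 74, 81, 85]

x_bar, y_bar = mean(hours), mean(scores)

# Least-squares slope and intercept: b = S_xy / S_xx, a = y_bar - b * x_bar.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
s_xx = sum((x - x_bar) ** 2 for x in hours)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Predicted values (y-hat) and residuals (actual minus predicted).
predicted = [intercept + slope * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(scores, predicted)]

# Coefficient of determination: share of variation in y explained by the line.
sse = sum(e ** 2 for e in residuals)
sst = sum((y - y_bar) ** 2 for y in scores)
r_squared = 1 - sse / sst

print(f"predicted score = {intercept:.1f} + {slope:.1f} * hours")
print(f"r-squared = {r_squared:.3f}")
```

For this made-up data the slope comes out to about 5.1, so in context: for each additional hour spent studying, the predicted exam score increases by about 5.1 points. The r-squared of roughly 0.99 would be read as "about 99% of the variation in exam scores is accounted for by the linear relationship with hours studied."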
4. Evaluating Our Models
Topic 2.9: Analyzing Departures from Linearity
A linear model isn't always appropriate. In this final step, you'll learn how to use a residual plot (a graph of the prediction errors, usually plotted against the explanatory variable) to judge whether a linear model is a reasonable choice. Residuals scattered randomly around zero support the line; a curve or other leftover pattern means your line failed to capture something in the data.
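Here is a small sketch of why residuals matter even when the correlation looks strong. The data below are invented and deliberately curved (each y is the square of x); the code fits a least-squares line anyway and prints the residuals so the leftover pattern is visible without any plotting library.

```python
from statistics import mean

# Invented, deliberately curved data: y = x squared.
x_values = [1, 2, 3, 4, 5]
y_values = [1, 4, 9, 16, 25]

x_bar, y_bar = mean(x_values), mean(y_values)

# Fit a least-squares line even though the true pattern is curved.
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(x_values, y_values))
s_xx = sum((x - x_bar) ** 2 for x in x_values)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual = actual y minus predicted y.
residuals = [y - (intercept + slope * x) for x, y in zip(x_values, y_values)]

for x, e in zip(x_values, residuals):
    sign = "+" if e > 0 else "-"
    print(f"x = {x}: residual = {e:+.1f}  ({sign})")
```

The residuals here run positive, negative, negative, negative, positive: a clear U-shape rather than random scatter. That pattern signals that a line is not an appropriate model for this data, even though the correlation between x and y is above 0.98.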
How to Succeed in This Unit
Context is King. Never just state a number. A slope of "2.5" with no context is meaningless. In context, it might mean that "for each additional hour spent studying, the predicted exam score increases by 2.5 points." Always interpret slope, y-intercept, correlation, and r-squared in the context of the variables you are given.
Memorize the Mantra: Correlation is Not Causation. This is one of the most important ideas in all of statistics. Just because two variables have a strong linear relationship (a strong correlation) does not mean that one is causing the other to change. Always be on the lookout for potential lurking variables that could be influencing both.
Describe Scatterplots with D.F.S.U. When asked to describe a scatterplot, don't just say "it looks positive." You must address all four key features to get full credit: Direction (positive, negative, or none), Form (linear or non-linear), Strength (weak, moderate, or strong), and Unusual Features (outliers, clusters, or influential points).
Use "Predicted" Language. The regression line gives a prediction, not a certainty. When using your model, always state that you are finding the "predicted" or "estimated" y-value. Using the proper notation, ŷ (read "y-hat"), is a great way to show you understand this distinction.