Quick Summary
This guide focuses on comparing the distributions of a quantitative variable across two or more groups. You will learn to move beyond describing a single dataset to using specific graphical tools and comparative language to analyze the relationships between multiple distributions. By the end of this lesson, you will be able to construct and interpret back-to-back stem-and-leaf plots and parallel boxplots, and you will be able to write a complete comparison of shape, center, spread, and unusual features in the context of a problem.
Key Concepts
When we compare distributions, we are telling the story of how two or more groups differ or are similar with respect to a quantitative variable. The goal is not to list features for each group separately, but to make explicit comparisons between them. We use the acronym CUSS (Center, Unusual, Shape, Spread) to ensure our comparison is thorough.
1. The Core Principle: Comparative Language
The most critical skill is using explicit comparative words. Avoid describing one group and then the other. Instead, integrate them into the same sentence.
Good: "The median score for Group A was higher than the median score for Group B."
Bad: "The median score for Group A was 75. The median score for Group B was 68."
Key comparative words include: greater than, less than, more/less variable, more/less skewed, similar to, approximately the same as.
2. The Framework for Comparison: CUSS
Always address these four key features when comparing distributions.
Center: Compare a measure of central tendency.
Median vs. Mean: If a distribution is skewed or has outliers, the median is a more resistant and appropriate measure of center to compare. If distributions are roughly symmetric, comparing means is also acceptable.
Comparison: "The median battery life for Brand X phones is approximately 15 hours, which is greater than the median of 12 hours for Brand Y."
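To see why the median is called resistant, here is a small Python sketch using the standard `statistics` module. The battery-life values are hypothetical, invented only for illustration: adding one extreme value pulls the mean a long way but barely moves the median.

```python
from statistics import mean, median

# Hypothetical battery-life values in hours, invented for illustration.
battery_hours = [11, 12, 12, 13, 14]
with_outlier = battery_hours + [40]  # add one extreme value

for label, data in [("original", battery_hours), ("with outlier", with_outlier)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

Here the mean jumps from 12.4 to 17 hours, while the median moves only from 12 to 12.5 hours.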
Unusual Features: Compare any outliers, gaps, or clusters.
Presence: Note if one group has outliers while the other does not.
Comparison: "The distribution of salaries for the Tech division shows two high outliers, whereas the distribution for the Sales division has no apparent outliers."
Shape: Compare the overall shape of the distributions.
Key Shapes: Skewed left, skewed right, roughly symmetric, unimodal, bimodal.
Comparison: "The distribution of test scores for the morning class is roughly symmetric, while the distribution for the afternoon class is skewed to the left."
Spread: Compare a measure of variability.
IQR vs. Standard Deviation/Range: The interquartile range (IQR) is resistant to outliers and is best paired with the median. The standard deviation and the range are both sensitive to outliers; the standard deviation is best paired with the mean.
Comparison: "The test scores for the morning class are more variable than the scores for the afternoon class, as shown by the morning class's larger IQR (20 points vs. 12 points)."
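The same resistance idea applies to spread. The Python sketch below uses hypothetical test scores and a hypothetical helper, `spread_summary`. It relies on `statistics.quantiles`, whose default method can give slightly different quartiles than a TI-84 on some data sets. Replacing the top score with an extreme value leaves the IQR unchanged but inflates the range and standard deviation.

```python
from statistics import quantiles, stdev

def spread_summary(data):
    """Return (range, IQR, sample standard deviation) for a list of numbers."""
    # statistics.quantiles uses its default 'exclusive' method, which may
    # differ slightly from the quartiles a TI-84 reports.
    q1, _, q3 = quantiles(data, n=4)
    return max(data) - min(data), q3 - q1, stdev(data)

# Hypothetical test scores, invented for illustration.
scores = [60, 65, 70, 72, 75, 78, 80, 85, 90]
with_outlier = scores[:-1] + [150]  # replace the top score with an extreme value

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    rng, iqr, sd = spread_summary(data)
    print(f"{label}: range = {rng}, IQR = {iqr}, SD = {sd:.1f}")
```

The IQR stays at 15 points in both cases, while the range triples from 30 to 90 points and the standard deviation grows sharply.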
3. Graphical Tools for Comparison
Specific graphs are designed to make comparing distributions easy.
Parallel Boxplots:
This is the most common and powerful tool for comparing distributions, especially for more than two groups.
Boxplots are plotted side-by-side (or one above the other) on the same numeric scale.
This allows for a quick visual comparison of the five-number summaries (minimum, Q1, median, Q3, maximum).
You can visually compare:
Centers: The median lines inside the boxes.
Spreads: The length of the boxes (IQR) and the overall distance from the minimum to the maximum, including any plotted outliers (range).
Skewness: The position of the median line within the box and the relative length of the whiskers.
Outliers: Points plotted individually.
[Image: Two parallel boxplots on a shared horizontal axis. Boxplot A is higher on the axis, wider, and symmetric. Boxplot B is lower, narrower, and its median is shifted to the left, indicating right skew.]
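The text says outliers are plotted individually but does not say how they are identified. The usual convention, which the TI-84's outlier boxplot also follows, is the 1.5 × IQR rule. The following minimal Python sketch rests on two assumptions. First, quartiles are found as medians of the lower and upper halves, leaving out the overall median when n is odd. Second, the helper names and the class data are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(upper)

def boxplot_parts(data):
    """Summarize what a modified boxplot displays, using the 1.5 x IQR rule."""
    xs = sorted(data)
    q1, q3 = quartiles(xs)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in xs if low_fence <= x <= high_fence]
    return {
        "five_number": (xs[0], q1, median(xs), q3, xs[-1]),
        "whiskers": (inside[0], inside[-1]),  # whiskers stop at the last non-outliers
        "outliers": [x for x in xs if x < low_fence or x > high_fence],
    }

# Hypothetical data for two classes, invented for illustration.
groups = {
    "Class 1": [12, 14, 15, 15, 16, 18, 19],
    "Class 2": [8, 9, 10, 10, 11, 12, 25],
}
for name, data in groups.items():
    print(name, boxplot_parts(data))
```

In this example, Class 2's value of 25 lies above its upper fence of 16.5. A modified boxplot would therefore draw it as a separate point and end the upper whisker at 12.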
Back-to-Back Stem-and-Leaf Plots:
Used to compare two distributions. Not suitable for more than two.
They share a common "stem" in the middle. The "leaves" for one group are placed to the right, and the leaves for the other group are placed to the left.
On both sides, the leaves should increase as you move away from the stem, so the leaves on the left side are written in reverse order.
This preserves the actual data values, which is an advantage over boxplots or histograms.
[Image: A back-to-back stem-and-leaf plot. A central stem has leaves for "Group A" on the left and "Group B" on the right. A key is included.]
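The construction rules above can be sketched in Python. `back_to_back` is a hypothetical helper that assumes non-negative whole-number data, with tens digits as stems and ones digits as leaves. It is applied to the Problem 1 quiz scores from later in this guide.

```python
from collections import defaultdict

def back_to_back(left, right):
    """Return the lines of a back-to-back stem-and-leaf plot (stems = tens digits).

    Leaves on both sides increase moving away from the stem, so the left-side
    leaves are written in reverse order.
    """
    left_leaves, right_leaves = defaultdict(list), defaultdict(list)
    for value in sorted(left):
        left_leaves[value // 10].append(value % 10)
    for value in sorted(right):
        right_leaves[value // 10].append(value % 10)
    all_values = list(left) + list(right)
    stems = range(min(all_values) // 10, max(all_values) // 10 + 1)
    # Reverse the left leaves so they grow outward from the stem.
    left_text = {s: " ".join(str(d) for d in reversed(left_leaves[s])) for s in stems}
    width = max(len(t) for t in left_text.values())
    lines = []
    for s in stems:
        right_text = " ".join(str(d) for d in right_leaves[s])
        # Right-align the left leaves so they sit against the stem.
        lines.append(f"{left_text[s]:>{width}} | {s} | {right_text}".rstrip())
    return lines

group_a = [38, 42, 42, 44, 45, 45, 45, 48, 49]  # flashcards (Problem 1)
group_b = [35, 36, 40, 41, 42, 42, 44, 45, 50]  # review session (Problem 1)
print("\n".join(back_to_back(group_a, group_b)))
```

Decimal data or split stems would need a different stem/leaf rule and key.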
Key Vocabulary
Back-to-back stem-and-leaf plot: A graphical display used to compare two datasets by placing the leaves for one group on the left and the other on the right of a shared stem.
Parallel boxplots: Two or more boxplots drawn on the same numerical scale to compare the distributions of a quantitative variable across different groups.
Comparative language: Words and phrases used to directly compare features of distributions, such as "greater than," "less than," "more variable," or "similar in shape."
Distribution: The pattern of variation of a variable, which shows the values the variable takes and how often it takes them.
Resistant measure: A statistic that is not significantly affected by extreme values (outliers) in the data. The median and IQR are resistant measures.
Center: A measure of the "typical" value in a distribution, such as the mean or median.
Spread: A measure of the variability or dispersion in a distribution, such as the range, IQR, or standard deviation.
Calculator Tech (TI-84)
The primary calculator skill for this topic is creating parallel boxplots to visually compare two or more datasets.
Steps to Create Parallel Boxplots:
Enter Data:
Press STAT -> 1:Edit... Enter the data for your first group into list L1 and the data for your second group into list L2.
Set up Plot 1:
Press 2nd -> Y= [STAT PLOT]. Select 1:Plot1... and press ENTER. Turn the plot On.
For Type:, select the boxplot icon that shows outliers (the first of the two boxplot options). Set Xlist: to L1 and Freq: to 1.
Set up Plot 2:
Press 2nd -> Y= [STAT PLOT]. Select 2:Plot2... and press ENTER. Turn the plot On.
For Type:, select the same boxplot icon. Set Xlist: to L2 and Freq: to 1.
Display the Graphs:
Press ZOOM -> 9:ZoomStat. The calculator automatically adjusts the window to display both boxplots on the same scale, ready for comparison. You can press TRACE and use the arrow keys to see the five-number summary values for each plot.
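The important thing ZoomStat does is put every plot in one shared window. As a rough analogue without a calculator, this Python sketch draws plain text boxplots on a single scale. `draw_boxplots` is a hypothetical helper, the five-number summaries are made up, and the whiskers run to the minimum and maximum (outliers are not drawn separately, unlike the TI-84's outlier boxplot).

```python
def draw_boxplots(summaries, width=50):
    """Draw text boxplots for several groups on one shared horizontal scale.

    summaries maps a group name to its five-number summary
    (min, Q1, median, Q3, max). Outliers are not drawn separately.
    """
    lo = min(s[0] for s in summaries.values())
    hi = max(s[4] for s in summaries.values())

    def col(x):
        # Map a data value to a character column on the shared scale.
        return round((x - lo) / (hi - lo) * (width - 1))

    label_width = max(len(name) for name in summaries)
    rows = []
    for name, (mn, q1, med, q3, mx) in summaries.items():
        line = [" "] * width
        for c in range(col(mn), col(mx) + 1):
            line[c] = "-"                  # whiskers: min to max
        for c in range(col(q1), col(q3) + 1):
            line[c] = "="                  # box: Q1 to Q3
        line[col(mn)] = line[col(mx)] = "|"
        line[col(q1)], line[col(q3)] = "[", "]"
        line[col(med)] = "M"               # median marker
        rows.append(f"{name:>{label_width}} {''.join(line)}")
    # Axis line showing the shared minimum and maximum of the window.
    gap = " " * (width - len(str(lo)) - len(str(hi)))
    rows.append(f"{'':>{label_width}} {lo}{gap}{hi}")
    return "\n".join(rows)


print(draw_boxplots({
    "Class 1": (10, 14, 16, 19, 24),   # hypothetical five-number summaries
    "Class 2": (6, 8, 9, 12, 18),
}))
```

Because both rows use the same `lo` and `hi`, Class 1's median marker lands well to the right of Class 2's, just as it would on a shared calculator window.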
How to Show Work on the FRQ
On Free Response Questions, you must write a coherent paragraph that compares the distributions. Simply listing characteristics will not earn full credit. Use the CUSS framework and the template below.
FRQ Response Template for Comparing Distributions:
Compare Shapes: Start by comparing the shapes of the distributions.
- Template: "The distribution of [contextual variable] for [Group 1] is [shape description, e.g., roughly symmetric, skewed right], while the distribution for [Group 2] is [shape description]."
Compare Centers: Compare the medians (or means). You MUST use an explicit comparative phrase and include the specific values.
- Template: "The center of the distribution of [variable] for [Group 1] is higher/lower than the center for [Group 2]. The median for [Group 1] is [value with units], which is greater/less than the median for [Group 2] of [value with units]."
Compare Spreads: Compare the IQRs (or standard deviations/ranges). You MUST use an explicit comparative phrase and include the specific values.
- Template: "The distribution of [variable] for [Group 1] is more/less variable than the distribution for [Group 2]. The IQR for [Group 1] is [value with units], which is larger/smaller than the IQR for [Group 2] of [value with units]."
Compare Unusual Features: Comment on outliers, gaps, or other interesting features.
- Template: "Finally, the distribution for [Group 1] has [describe unusual feature, e.g., a high outlier at 95], whereas the distribution for [Group 2] shows no apparent outliers."
Scoring Note: To earn an "E" (essentially correct), you must address all four features (C-U-S-S), use explicit comparative language, and do so in the context of the problem.
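A common slip is writing "greater than" when the numbers say otherwise. The following small Python sketch fills the center or spread template so that the comparative phrase is chosen from the values themselves. `compare_sentence` is a hypothetical helper, and its wording is a simplified version of the templates above, not an official rubric phrase.

```python
def compare_sentence(variable, stat, units, name1, value1, name2, value2):
    """Build one comparison sentence whose comparative phrase matches the numbers."""
    if value1 > value2:
        phrase = "greater than"
    elif value1 < value2:
        phrase = "less than"
    else:
        phrase = "the same as"
    return (f"The {stat} of {variable} for {name1} is {value1} {units}, "
            f"which is {phrase} the {stat} for {name2} ({value2} {units}).")

print(compare_sentence("quiz scores", "median", "points", "Group A", 45, "Group B", 42))
print(compare_sentence("heights", "IQR", "inches", "females", 4, "males", 4))
```

The second call shows the case where two values are equal; the honest comparison is "the same as", not "larger than".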
Practice Problems
Problem 1:
A teacher wants to compare the effectiveness of two different review methods on test scores. Group A used online flashcards, and Group B attended a review session. Their scores on a 50-point quiz are listed below.
Group A (Flashcards): 38, 42, 42, 44, 45, 45, 45, 48, 49
Group B (Review Session): 35, 36, 40, 41, 42, 42, 44, 45, 50
Construct a back-to-back stem-and-leaf plot for these data and use it to write a brief comparison of the two distributions of quiz scores.
Solution:
First, construct the back-to-back stem-and-leaf plot. The "stems" will be the tens digits (3 and 4) and the "leaves" will be the ones digits.
Group A (Flashcards) | Stem | Group B (Review Session)
---------------------|------|-------------------------
                   8 |  3   | 5 6
     9 8 5 5 5 4 2 2 |  4   | 0 1 2 2 4 5
                     |  5   | 0
Key: 2 | 4 | 5 represents a score of 42 for Group A and a score of 45 for Group B.
Next, write the comparison using the FRQ template. We need to find the medians and can comment on the range for spread.
Group A Median: The 5th value is 45.
Group B Median: The 5th value is 42.
Group A Range: 49 - 38 = 11.
Group B Range: 50 - 35 = 15.
Comparison:
The distribution of quiz scores for Group A (flashcards) is skewed to the left, while the distribution for Group B (review session) is roughly symmetric. The center of the distribution of scores for Group A is higher than the center for Group B. The median score for Group A was 45 points, which is greater than the median score for Group B of 42 points. The scores for Group B appear slightly more variable than those for Group A, with a range of 15 points compared to Group A's range of 11 points. Neither distribution has any apparent outliers.
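The numbers in this solution can be double-checked with a short Python sketch. `summarize` is a hypothetical helper. It assumes the median-of-halves quartile method and the 1.5 × IQR outlier rule; neither is stated in the problem, but both are standard in introductory statistics.

```python
from statistics import median

group_a = [38, 42, 42, 44, 45, 45, 45, 48, 49]  # flashcards
group_b = [35, 36, 40, 41, 42, 42, 44, 45, 50]  # review session

def summarize(data):
    """Median, range, quartiles, IQR, and 1.5 x IQR outliers for one group."""
    xs = sorted(data)
    half = len(xs) // 2
    # Median-of-halves quartiles: leave out the overall median when n is odd.
    q1 = median(xs[:half])
    q3 = median(xs[half + 1:] if len(xs) % 2 else xs[half:])
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "median": median(xs),
        "range": xs[-1] - xs[0],
        "Q1": q1,
        "Q3": q3,
        "IQR": iqr,
        "outliers": [x for x in xs if x < low or x > high],
    }

for name, data in [("Group A", group_a), ("Group B", group_b)]:
    print(name, summarize(data))
```

The IQRs (4.5 points for Group A and 6.5 points for Group B) point the same way as the ranges: Group B is more variable. Neither group has a value beyond its fences, which is consistent with the final sentence of the comparison.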
Problem 2:
The parallel boxplots below show the heights (in inches) of a random sample of male and female students at a large high school.
[Image: Two parallel boxplots on a shared horizontal axis labeled "Height (inches)". The top boxplot is labeled "Male" and the bottom is labeled "Female". The "Male" boxplot is shifted to the right of the "Female" boxplot. Male data: Min=64, Q1=67, Med=69, Q3=71, Max=74. Female data: lower whisker end=59, Q1=62, Med=64, Q3=66, Max=70, plus one outlier at 55.]
Use the information in the graph to compare the distributions of heights for male and female students at this school.
Solution:
Using the FRQ template and reading the five-number summaries from the graph:
Comparison:
The distributions of heights for male and female students both appear roughly symmetric: in each boxplot the median is in the middle of the box and the two whiskers are about the same length, although the female distribution has one low outlier. The center of the distribution of heights for males is greater than the center for females. The median height for males is 69 inches, which is 5 inches greater than the median height for females of 64 inches. The middle 50% of heights is equally spread out for both groups: the IQR is 4 inches for females (66 - 62) and 4 inches for males (71 - 67). However, the overall range of female heights (70 - 55 = 15 inches, or 70 - 59 = 11 inches excluding the outlier) is larger than the range for males (74 - 64 = 10 inches), so female heights are slightly more variable overall. Finally, the distribution of female heights has a low outlier at 55 inches, whereas the distribution of male heights has no outliers.
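Because the five-number summaries are given, the key claims in this solution can be checked with a few lines of arithmetic. The following Python sketch uses a hypothetical helper, `box_facts`, and applies the 1.5 × IQR rule (an assumption, since the problem does not state which outlier rule the graph uses) to the values in the boxplot description.

```python
def box_facts(q1, med, q3):
    """IQR, median position, and 1.5 x IQR fences from the box of a boxplot."""
    iqr = q3 - q1
    return {
        "IQR": iqr,
        # 0 means the median sits exactly in the middle of the box.
        "median_offset": (med - q1) - (q3 - med),
        "fences": (q1 - 1.5 * iqr, q3 + 1.5 * iqr),
    }

# Values read from the Problem 2 boxplot description.
male = box_facts(q1=67, med=69, q3=71)
female = box_facts(q1=62, med=64, q3=66)
print("male:", male, "range:", 74 - 64)
print("female:", female, "range with outlier:", 70 - 55, "without:", 70 - 59)
```

Both IQRs equal 4 inches and both medians sit exactly in the middle of their boxes. The female lower fence is 56 inches, so 55 is flagged as an outlier while 59 is not, which matches the graph.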
Common Mistakes to Avoid
The "Laundry List" Error: This is the most common mistake. Students describe the shape, center, and spread for Group A, and then separately describe the shape, center, and spread for Group B. This is not a comparison. You must use comparative words like "higher than," "more skewed than," "less variable than" to receive credit.
Vague Comparisons: Using weak comparative words like "different" or "changed." Always be specific. Don't say "the centers were different." Say "the median for males was higher than the median for females."
Forgetting Context: Your entire response should be framed in the context of the problem. Refer to "heights of students," "quiz scores," or "battery life," not just "the distribution for L1" or "the boxplot."
Comparing Mean to Median: When comparing centers, compare like with like. Compare the median of Group A to the median of Group B, or the mean of A to the mean of B. If one distribution is heavily skewed, it is best to compare medians for both groups, as the median is a more resistant measure.
Misinterpreting the Box in a Boxplot: Remember that 50% of the data lies within the box (the IQR). A wider box means a larger spread for the middle 50% of the data; it does not mean there are more data points in that group.