Quick Summary
This guide covers boxplots as a tool for visualizing and interpreting quantitative data. You will learn to calculate the five-number summary, use the 1.5 * IQR rule to formally identify outliers, and construct a modified boxplot. By the end of this lesson, you should be able to describe the shape, center, and spread of a distribution using only its boxplot, a skill tested on the AP exam.
Key Concepts
Boxplots are a powerful way to visualize the summary of a quantitative dataset. They are based on the five-number summary and are particularly useful for identifying outliers and comparing distributions.
1. The Five-Number Summary
This is the set of five key values that provides a concise summary of a distribution's center and spread.
Minimum: The smallest value in the dataset.
First Quartile (Q1): The 25th percentile. This value marks the point where 25% of the data falls below it. It is the median of the lower half of the data.
Median (M or Q2): The 50th percentile or the midpoint of the dataset. 50% of the data falls below it.
Third Quartile (Q3): The 75th percentile. This value marks the point where 75% of the data falls below it. It is the median of the upper half of the data.
Maximum: The largest value in the dataset.
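The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not an official AP or TI-84 routine: the function name five_number_summary and the sample data are invented for this example, and it assumes the "median of each half" convention described above, leaving the middle value out of both halves when the number of values is odd.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]        # values below the median position
    upper = data[n - half:]    # values above the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical data, invented for illustration only.
sample = [3, 7, 8, 5, 12, 14, 21, 13, 18]
summary = five_number_summary(sample)
print(summary)
```

Other software may use a different quartile convention and report slightly different values for Q1 and Q3 on the same data.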
2. Anatomy of a Boxplot
A boxplot graphically displays the five-number summary on a number line. The AP Statistics course exclusively uses the modified boxplot, which accounts for outliers.
[Image: A labeled modified boxplot. The horizontal axis is a scaled number line. A central box is drawn from Q1 to Q3. A vertical line inside the box marks the Median. A "whisker" (a horizontal line) extends from Q1 to the smallest non-outlier value. Another whisker extends from Q3 to the largest non-outlier value. A separate point (an asterisk or dot) is shown beyond the upper whisker, labeled "Outlier".]
The Box: The central box spans from Q1 to Q3. The length of the box represents the Interquartile Range (IQR), which contains the middle 50% of the data.
The Median: The line inside the box marks the median (Q2).
The Whiskers: The lines extend from the box to the smallest and largest data points that are not outliers.
The Four Sections: A key insight is that each part of the boxplot (the lower whisker, the lower half of the box from Q1 to M, the upper half of the box from M to Q3, and the upper whisker) contains approximately 25% of the data points, with any outliers counted in the whisker section on their side. A longer section indicates that the data in that quartile is more spread out, not that it contains more data.
3. Identifying Outliers: The 1.5 * IQR Rule
To avoid subjective guessing, we use a formal rule to identify outliers. Any data point that falls outside the "fences" is an outlier.
Formulas:
Calculate the Interquartile Range: IQR = Q3 - Q1
Calculate the Lower Fence: Lower Fence = Q1 - 1.5 * IQR
Calculate the Upper Fence: Upper Fence = Q3 + 1.5 * IQR
Any data value that is less than the Lower Fence or greater than the Upper Fence is an outlier. In a modified boxplot, these outliers are plotted as individual points, and the whiskers stop at the next most extreme value within the fences.
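As a rough sketch of the rule above, the following Python computes the fences, lists the outliers, and finds where each whisker of a modified boxplot would stop. The function name outlier_report and the data are hypothetical, chosen only to illustrate the arithmetic; Q1 and Q3 are passed in, on the assumption that they were already found by the median-of-halves method.

```python
def outlier_report(values, q1, q3):
    """Apply the 1.5 * IQR rule; return fences, outliers, and whisker ends."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Outliers lie strictly outside the fences.
    outliers = sorted(x for x in values if x < lower_fence or x > upper_fence)
    # Whiskers stop at the most extreme values that are not outliers.
    inside = [x for x in values if lower_fence <= x <= upper_fence]
    return {
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
    }

# Hypothetical data, invented for illustration; Q1 = 11 and Q3 = 17 are the
# medians of its lower and upper halves.
hours = [1, 10, 11, 12, 13, 14, 15, 16, 17, 18, 40]
report = outlier_report(hours, q1=11, q3=17)
print(report)
```

In this example the fences are 2 and 26, so 1 and 40 are plotted as individual points and the whiskers run from 10 to 18.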
Key Vocabulary
Five-Number Summary: A set of descriptive statistics that provides information about a dataset, consisting of the minimum, first quartile (Q1), median, third quartile (Q3), and maximum.
Quartiles (Q1, Q3): Values that divide a dataset into four equal parts. Q1 is the 25th percentile, and Q3 is the 75th percentile.
Median: The midpoint of a distribution (the 50th percentile), separating the lower half of the data from the upper half.
Interquartile Range (IQR): A measure of statistical dispersion, being equal to the difference between the third and first quartiles (IQR = Q3 - Q1). It represents the range of the middle 50% of the data.
Boxplot: A standardized way of displaying the distribution of data based on the five-number summary. The AP course focuses on the "modified" boxplot, which displays outliers separately.
Outlier: An observation that lies an abnormal distance from other values in a random sample from a population. Formally, a point outside the fences defined by the 1.5 * IQR rule.
Whisker: The lines extending from the box of a boxplot that represent the extent of the data, excluding outliers.
Calculator Tech (TI-84)
You can use the TI-84 to calculate the five-number summary and create a boxplot.
1. Enter Your Data:
Press STAT -> 1:Edit...
Type your data values into a list, such as L1.
2. Calculate the Five-Number Summary:
Press STAT -> CALC -> 1:1-Var Stats.
Ensure List: is set to the list containing your data (e.g., L1). FreqList: should be blank.
Select Calculate.
The results screen will show many values. Scroll down to see the five-number summary: minX, Q1, Med (Median), Q3, and maxX.
3. Create the Boxplot:
Press 2nd -> Y= [STAT PLOT].
Select 1:Plot1... and press ENTER.
Turn the plot On.
For Type:, select the modified boxplot icon. It is the first of the two boxplot options and has two small dots representing outliers.
Set Xlist: to your data list (e.g., L1).
Set Freq: to 1.
Press ZOOM -> 9:ZoomStat. This automatically adjusts the window to fit your data.
You can press the TRACE button and use the arrow keys to see the exact values for the min, Q1, median, Q3, and max on the graph.
How to Show Work on the FRQ
On Free Response Questions, you will be asked to identify outliers or describe/compare distributions shown in boxplots. Clear communication and justification are essential for full credit.
Template for Identifying Outliers
State the Rule: "To identify outliers, I will use the 1.5 * IQR rule. An observation is an outlier if it is less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR)."
Calculate Values: "First, I calculate the IQR: IQR = Q3 - Q1 = [value] - [value] = [value].
Next, I find the fences:
Lower Fence = Q1 - 1.5(IQR) = [value] - 1.5([value]) = [value].
Upper Fence = Q3 + 1.5(IQR) = [value] + 1.5([value]) = [value]."
Compare and Conclude: "Since the data point [value] is greater than the upper fence of [fence value], it is an outlier. There are no data points less than the lower fence of [fence value]."
Template for Describing a Distribution from a Boxplot (C.U.S.S.)
When asked to describe the distribution, address Center, Unusual features, Shape, and Spread in context.
Center: "The center of the distribution of [context] is described by the median, which is approximately [value with units]."
Unusual Features: "There [are/are no] outliers present. The outlier(s) at [value(s)] indicate unusually [high/low] [context]." (If you have the data, you must show the outlier test).
Shape: "The distribution of [context] appears to be [roughly symmetric / skewed to the right / skewed to the left]. This is because [justify by comparing whisker lengths and/or the position of the median in the box. E.g., 'the right whisker is much longer than the left whisker, and the median is closer to Q1 than Q3, indicating a right skew']."
Spread: "The overall spread is measured by the range, which is [Max - Min = value]. A better measure of spread for this distribution is the IQR, which is [Q3 - Q1 = value]. This means the middle 50% of [context] has a range of [IQR value with units]."
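The shape justification in the template rests on comparing the lengths of the four sections of the boxplot. The sketch below shows one way to organize that comparison. The function names, the decision rule in rough_shape, and the summary values are all assumptions made for illustration; the rule is a crude aid for checking your reasoning, not an official AP criterion, and it uses the minimum and maximum, so any outliers are included in the whisker lengths.

```python
def section_lengths(minimum, q1, median, q3, maximum):
    """Lengths of the four sections of a boxplot, left to right."""
    return {
        "left_whisker": q1 - minimum,
        "lower_box": median - q1,
        "upper_box": q3 - median,
        "right_whisker": maximum - q3,
    }

def rough_shape(minimum, q1, median, q3, maximum):
    """Crude shape label based on which side has the longer sections."""
    s = section_lengths(minimum, q1, median, q3, maximum)
    if s["right_whisker"] > s["left_whisker"] and s["upper_box"] >= s["lower_box"]:
        return "skewed right"
    if s["left_whisker"] > s["right_whisker"] and s["lower_box"] >= s["upper_box"]:
        return "skewed left"
    return "roughly symmetric or mixed evidence"

# Hypothetical five-number summary, invented for illustration only.
lengths = section_lengths(10, 20, 24, 35, 70)
shape = rough_shape(10, 20, 24, 35, 70)
print(lengths)
print(shape)
```

Here the right whisker (35) is much longer than the left whisker (10) and the median is closer to Q1 than to Q3, which is the same evidence the template asks you to cite for a right skew.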
Practice Problems
Problem 1:
The following data represent the number of hours a group of 11 students spent studying for a final exam: .
(a) Find the five-number summary for these data.
(b) Identify any outliers using the 1.5 * IQR rule. Show your work.
Solution:
(a) First, order the data: .
Minimum: 8
Median (Q2): The middle value is the 6th value, which is 14.
Q1: The median of the lower half (the five values below the median) is 12.
Q3: The median of the upper half (the five values above the median) is 18.
Maximum: 30
The five-number summary is: Min=8, Q1=12, Median=14, Q3=18, Max=30.
(b)
State the Rule: To identify outliers, I will use the 1.5 * IQR rule. An observation is an outlier if it is less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR).
Calculate Values:
IQR = Q3 - Q1 = 18 - 12 = 6.
Lower Fence = Q1 - 1.5(IQR) = 12 - 1.5(6) = 12 - 9 = 3.
Upper Fence = Q3 + 1.5(IQR) = 18 + 1.5(6) = 18 + 9 = 27.
Compare and Conclude: The fences are at 3 and 27. The data point 30 is greater than the upper fence of 27, so 30 is an outlier. The minimum value, 8, is greater than the lower fence of 3, so there are no low outliers.
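The arithmetic in part (b) can be double-checked with a short script. This sketch uses only the five-number summary values stated in the solution (Min = 8, Q1 = 12, Q3 = 18, Max = 30); the variable names are chosen for this example.

```python
# Values taken from the five-number summary in part (a).
minimum, q1, q3, maximum = 8, 12, 18, 30

iqr = q3 - q1                    # 18 - 12
lower_fence = q1 - 1.5 * iqr     # 12 - 9
upper_fence = q3 + 1.5 * iqr     # 18 + 9

high_outlier = maximum > upper_fence   # is 30 above the upper fence?
low_outlier = minimum < lower_fence    # is 8 below the lower fence?

print(iqr, lower_fence, upper_fence)
print(high_outlier, low_outlier)
```

Checking only the minimum and maximum is enough to decide whether any outliers exist on each side, because every other value lies between them.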
Problem 2:
The parallel boxplots below show the final exam scores for students in two different AP Statistics classes, Class A and Class B.
[Image: Two parallel boxplots on the same scale from 50 to 100.
Class A: Min=55, Q1=70, Med=75, Q3=80, Max=85. The boxplot is skewed left.
Class B: Min=60, Q1=80, Med=88, Q3=92, Max=98. The boxplot is skewed left.]
Compare the distributions of final exam scores for Class A and Class B.
Solution:
When comparing distributions, you must use comparative language (e.g., "greater than," "less than," "similar to").
Center: Class B has a higher center than Class A. The median score for Class B (88) is 13 points higher than the median score for Class A (75).
Unusual Features: Class A has no outliers: its minimum of 55 sits exactly on the lower fence (70 - 1.5(10) = 55), and its maximum of 85 is below the upper fence (80 + 1.5(10) = 95). Class B's minimum of 60 falls just below its lower fence (80 - 1.5(12) = 62), so it is a low outlier by the 1.5 * IQR rule; its maximum of 98 is well below the upper fence (92 + 1.5(12) = 110).
Shape: Both distributions are skewed to the left. For Class A, the left whisker (55 to 70) is three times as long as the right whisker (80 to 85), although the median sits in the middle of the box. For Class B, the lower tail (60 to 80) is much longer than the upper whisker (92 to 98), and the median (88) is closer to Q3 (92) than to Q1 (80).
Spread: Class B has a larger overall spread than Class A. The range for Class B (98 - 60 = 38) is greater than the range for Class A (85 - 55 = 30). Class B also has a slightly larger spread in its middle 50% of scores, with an IQR of 12 (92 - 80) compared to Class A's IQR of 10 (80 - 70).
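The spread figures can be recomputed directly from the five-number summaries given for the two plots. The sketch below is only a check on the subtraction; the dictionary layout and the function name spread are choices made for this example.

```python
# Five-number summaries as given in the problem statement.
class_a = {"min": 55, "q1": 70, "median": 75, "q3": 80, "max": 85}
class_b = {"min": 60, "q1": 80, "median": 88, "q3": 92, "max": 98}

def spread(summary):
    """Return the range and IQR for a five-number summary."""
    return {
        "range": summary["max"] - summary["min"],
        "iqr": summary["q3"] - summary["q1"],
    }

spread_a = spread(class_a)
spread_b = spread(class_b)
print("Class A:", spread_a)
print("Class B:", spread_b)
```

Comparing the two results side by side makes it easy to write the comparative sentence the FRQ expects, naming which class has the larger range and which has the larger IQR.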
Common Mistakes to Avoid
Confusing Box Width with Frequency: The width of a box or whisker section does not represent the number of data points. Each of the four sections (min-Q1, Q1-M, M-Q3, Q3-max) contains approximately 25% of the data. A wider section simply means the data in that quartile is more spread out.
Identifying Outliers "By Eye": Never state a point is an outlier just because it looks far away. You must perform the 1.5 * IQR test and show your calculations for the fences to receive credit on an FRQ.
Using Mean with Boxplots: Boxplots are based on the median and quartiles. Do not refer to the mean or standard deviation when describing a distribution from a boxplot, as those values cannot be determined from the graph.
Drawing Whiskers Incorrectly: In a modified boxplot, if there are outliers, the whisker does not extend to the outlier. It extends to the most extreme data value that is within the calculated fence.
Weak Shape Descriptions: Do not just say a distribution is "not symmetric." Be specific. "The distribution is skewed right because the right whisker is longer than the left whisker and the median is closer to Q1." This level of justification is required.