Representing a Quantitative Variable with Graphs | AP Stats Unit 1 Study Guide

Quick Summary

This guide will equip you to master the graphical representation of quantitative data. You will learn to construct and interpret three essential graphs—dotplots, stemplots, and histograms—and use them to describe the key features of a data distribution, including its shape, center, variability, and any unusual characteristics. This skill is foundational for nearly all data analysis in AP Statistics.

Key Concepts

There are three primary ways to graph a single quantitative variable. Each has its own strengths and is appropriate for different situations. After creating a graph, our goal is always to describe what we see.

1. Types of Graphs for Quantitative Data

Dotplot
- What it is: A simple graph where each data value is shown as a dot above its location on a number line.
- Best for: Small datasets, as it shows every individual data value. It's excellent for quickly visualizing the shape and spread of the data.
- Construction:
  1. Draw a horizontal axis (a number line) and label it with the variable's name and units.
  2. Scale the axis to cover the full range of the data.
  3. For each data value, place a dot above its corresponding location on the number line. Stack dots vertically if values are repeated.
- [Image: A dotplot showing the distribution of quiz scores for a class of 20 students. The x-axis is labeled "Quiz Score (out of 10)" and ranges from 0 to 10. Dots are clustered around 7, 8, and 9, with a single dot at 2.]
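A dotplot is simple enough to sketch in plain text, which makes the stacking rule concrete. The Python below is a minimal sketch, not a standard library routine. `dotplot_rows` is a hypothetical helper, and it assumes whole-number data so that each integer gets its own column. The data are the values from the TI-84 example later in this guide.

```python
from collections import Counter

def dotplot_rows(values, lo, hi):
    """Return the text lines of a dotplot whose axis runs over the integers lo..hi."""
    counts = Counter(values)          # how many times each value occurs
    tallest = max(counts.values())
    lines = []
    # Work from the top of the tallest stack down to the row just above the axis;
    # a value gets a '*' on a row if it occurs at least that many times.
    for level in range(tallest, 0, -1):
        cells = ("  *" if counts[x] >= level else "   " for x in range(lo, hi + 1))
        lines.append("".join(cells).rstrip())
    # Each axis label is 3 characters wide, so it sits directly under its column.
    lines.append("".join(f"{x:>3}" for x in range(lo, hi + 1)))
    return lines

data = [12, 15, 16, 16, 18, 22, 23, 25, 25, 26, 30, 31, 35]
for line in dotplot_rows(data, 10, 35):
    print(line)
```

The repeated values 16 and 25 are the only ones with a second dot, just as they would be stacked on a hand-drawn dotplot.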
Stem-and-Leaf Plot (Stemplot)
- What it is: A graph that separates each data value into a "stem" (all but the final digit) and a "leaf" (the final digit). It preserves the individual data values while giving a picture of the distribution's shape.
- Best for: Small to moderately sized datasets. It's like a histogram turned on its side, but it retains the raw data.
- Construction:
  1. Separate each observation into a stem and a leaf.
  2. Write the stems in a vertical column with the smallest at the top. Draw a vertical line to the right of this column.
  3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
  4. CRITICAL: Provide a key that explains what the stems and leaves represent (e.g., "Key: 4|1 means 41 inches").
- Variations:
  - Splitting Stems: If too many leaves are crowded on one stem, you can split the stem. For example, a "2" stem could be split into two: one for leaves 0-4 and another for leaves 5-9.
  - Back-to-Back Stemplots: Used to compare two distributions. The stems are in the center, with leaves for one group going to the right and leaves for the other group going to the left.
- [Image: A stem-and-leaf plot of student heights. Stems (6, 7) are on the left of a vertical line. Leaves (individual inches) are on the right. A key below reads "Key: 6|2 means 62 inches".]
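The construction steps above are mechanical, so they can be sketched in code. This Python version is a minimal sketch that assumes non-negative whole numbers with one-digit leaves; the function name `stemplot` and its `split` flag are illustrative, not from any library. On the homework data from Practice Problem 1, it reproduces that problem's stemplot, including the empty 7 and 8 stems that make the gap visible.

```python
from collections import defaultdict

def stemplot(values, split=False):
    """Return the rows of a stemplot for non-negative whole numbers.

    Stem = all but the final digit (v // 10); leaf = final digit (v % 10).
    With split=True, each stem gets two rows: leaves 0-4, then leaves 5-9.
    """
    leaves = defaultdict(list)
    for v in sorted(values):  # sorting first puts each row's leaves in increasing order
        stem, leaf = divmod(v, 10)
        half = int(leaf >= 5) if split else 0  # which row of a split stem
        leaves[(stem, half)].append(leaf)
    rows = []
    # Include every stem from smallest to largest, even empty ones, so gaps show.
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in ((0, 1) if split else (0,)):
            row = " ".join(str(leaf) for leaf in leaves[(stem, half)])
            rows.append(f"{stem} | {row}".rstrip())
    return rows

minutes = [15, 22, 25, 28, 30, 35, 35, 36, 40, 42, 45, 45, 55, 60, 95]
for row in stemplot(minutes):
    print(row)
print("Key: 1|5 means 15 minutes")  # never leave off the key
```

Calling `stemplot([22, 25, 28], split=True)` shows the split-stem variation: one "2" row holds the leaf 2 and a second "2" row holds the leaves 5 and 8.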
Histogram
- What it is: The most common graph for quantitative data. It groups data values into intervals of equal width, called bins or classes, and shows how many values (frequency) or what percentage of values (relative frequency) fall into each bin.
- Best for: Larger datasets where showing every individual value would be overwhelming.
- Construction:
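  1. Divide the range of the data into equal-width bins. There is no single "right" choice for bin width, but 5-10 bins is a good starting point. Decide in advance where a value that lands on a boundary goes; a common convention is that each bin includes its left endpoint, so a value of 45 belongs in the 45-50 bin, not the 40-45 bin.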
  2. Find the count (frequency) or percent (relative frequency) of observations in each bin.
  3. Draw and label the axes. The horizontal axis is the variable being measured. The vertical axis is the frequency or relative frequency.
  4. Draw bars for each bin. The height of the bar corresponds to its frequency/relative frequency. The bars in a histogram must touch, because adjacent bins share an endpoint on the number line (this holds even for whole-number data such as counts).
- Important Note: Histograms lose individual data values. Once the data is grouped into bins, you only know how many values are in that bin, not what the specific values were.
- [Image: A histogram showing the distribution of ages of CEOs. The x-axis is "Age" grouped into bins (40-45, 45-50, etc.). The y-axis is "Frequency". The bars are touching.]
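Step 2 (counting the observations in each bin) is where hand errors usually creep in. The Python below is a minimal sketch that assumes each bin includes its left endpoint and excludes its right one, and that `start` is at or below the smallest value; `bin_counts` is a hypothetical helper name. It uses the homework data from Practice Problem 1 with bins of width 10.

```python
def bin_counts(values, start, width):
    """Count values in equal-width bins [start, start+width), [start+width, ...).

    A value exactly on a boundary goes into the higher bin (left-inclusive bins).
    """
    n_bins = int((max(values) - start) // width) + 1
    counts = [0] * n_bins
    for v in values:
        counts[int((v - start) // width)] += 1  # index of the bin containing v
    return counts

minutes = [15, 22, 25, 28, 30, 35, 35, 36, 40, 42, 45, 45, 55, 60, 95]
counts = bin_counts(minutes, start=10, width=10)
for i, c in enumerate(counts):
    left = 10 + 10 * i
    # Frequency is the count; relative frequency is the count divided by n.
    print(f"{left}-{left + 10}: frequency {c}, relative frequency {c / len(minutes):.1%}")
```

The output lists only counts per bin, which illustrates the note above: once data are binned, the individual values (for example, that the top value is exactly 95) are no longer visible.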

2. Describing a Distribution (The SOCS Framework)

When asked to describe a distribution from a graph, you must always address four key features. The acronym SOCS is a great way to remember them. Always write in the context of the problem!

S - Shape
- Modality: How many peaks does the graph have?
  - Unimodal: One main peak.
  - Bimodal: Two distinct peaks.
  - Multimodal: More than two peaks.
  - Uniform: The bars or dots are all approximately the same height; there is no clear peak.
- Symmetry/Skewness:
  - Roughly Symmetric: The right and left sides of the graph are approximately mirror images of each other. The mean and median will be close.
  - Skewed to the Right (Positively Skewed): The "tail" of the graph (the lower-frequency values) extends far to the right. The mean is usually greater than the median.
  - Skewed to the Left (Negatively Skewed): The "tail" of the graph extends far to the left. The mean is usually less than the median.
- [Image: Three small histograms side-by-side. The first is labeled "Skewed Left" with a tail to the left. The second is "Symmetric" with a bell shape. The third is "Skewed Right" with a tail to the right.]
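To see the mean-versus-median pattern with real numbers, the short Python check below compares the right-skewed homework data from Practice Problem 1 with the TI-84 example data, which has no long tail on either side. It uses only Python's standard `statistics` module. Treat "mean greater than median" as a tendency of right-skewed data, not a guarantee.

```python
from statistics import mean, median

# Homework minutes from Practice Problem 1: long right tail (one value at 95).
minutes = [15, 22, 25, 28, 30, 35, 35, 36, 40, 42, 45, 45, 55, 60, 95]
# Data from the TI-84 example: values spread about evenly around the middle.
ti_data = [12, 15, 16, 16, 18, 22, 23, 25, 25, 26, 30, 31, 35]

for name, values in [("homework minutes", minutes), ("TI-84 example", ti_data)]:
    m, md = mean(values), median(values)
    # A large positive (mean - median) suggests a long right tail.
    print(f"{name}: mean = {m:.2f}, median = {md}, mean - median = {m - md:.2f}")
```

For the homework data the single value at 95 pulls the mean (about 40.5) well above the median (36); for the TI-84 data the two measures are within one unit of each other.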
O - Outliers and Unusual Features
- Outliers: Data points that fall far away from the main pattern of the distribution. Mention any potential outliers you see and give their approximate value.
- Gaps: Areas of the distribution where there are no data values.
- Clusters: Concentrations of data in specific areas.
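Gaps are easiest to find on a sorted list: look for the largest jump between neighboring values. The Python below is a minimal sketch of that idea; `largest_gap` is a hypothetical helper. Deciding whether a gap is large enough to mention, or whether the value beyond it is an outlier, is still a judgment made in context.

```python
def largest_gap(values):
    """Return (low, high): the neighboring sorted values that are farthest apart."""
    ordered = sorted(values)
    pairs = zip(ordered, ordered[1:])  # each value paired with the next larger one
    return max(pairs, key=lambda p: p[1] - p[0])

minutes = [15, 22, 25, 28, 30, 35, 35, 36, 40, 42, 45, 45, 55, 60, 95]
low, high = largest_gap(minutes)
print(f"Largest gap: no observations between {low} and {high} minutes")
```

On the Practice Problem 1 data this finds the stretch between 60 and 95 minutes, the same gap that makes 95 a potential outlier in that problem's solution.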
C - Center
- Describe the "middle" of the distribution. For now, you can estimate the midpoint visually. Later, we will calculate specific measures of center like the mean and median.
- For a unimodal, symmetric distribution, the center is near the peak. For a skewed distribution, the median is a better measure of center than the mean, because the long tail pulls the mean toward it.
S - Spread (Variability)
- Describe how spread out the data are. For now, you can state the overall range (Maximum - Minimum). Later, we will calculate other measures of spread: the interquartile range (IQR), which is resistant to outliers, and the standard deviation.
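When you want reported numbers for center and spread rather than visual estimates, the median and range take only a few lines. This is a sketch using Python's standard `statistics.median`, which takes the middle value when the count is odd and averages the two middle values when it is even; `center_and_spread` is a hypothetical helper name.

```python
from statistics import median

def center_and_spread(values):
    """Summarize the C and S of SOCS with the median and the range."""
    lo, hi = min(values), max(values)
    return {"median": median(values), "min": lo, "max": hi, "range": hi - lo}

minutes = [15, 22, 25, 28, 30, 35, 35, 36, 40, 42, 45, 45, 55, 60, 95]
print(center_and_spread(minutes))
```

For the Practice Problem 1 data this gives a median of 36 minutes and a range of 95 - 15 = 80 minutes.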

Key Vocabulary

Distribution: The pattern of variation of a variable. It shows what values the variable takes and how often it takes them.
Histogram: A graph for quantitative data where bars represent the frequency of data points falling into specific intervals (bins). The bars must touch.
Stem-and-leaf Plot (Stemplot): A graph that separates each data value into a stem and a leaf, preserving individual data values while showing the distribution's shape.
Dotplot: A graph where each data value is represented by a dot above its location on a number line.
Skewed (Right/Left): A distribution is skewed if one of its tails is longer than the other. It is skewed right if the long tail is on the right (positive) side, and skewed left if the long tail is on the left (negative) side.
Symmetric: A distribution where the right and left sides are approximate mirror images of each other.
Outlier: An individual value that falls far outside the overall pattern of the rest of the data.
Modality: The number of peaks in a distribution (e.g., unimodal, bimodal).

Calculator Tech (TI-84)

You can create a histogram on the TI-84.

Example: Create a histogram for the data: {12, 15, 16, 16, 18, 22, 23, 25, 25, 26, 30, 31, 35}

Enter Data into a List:
- Press STAT -> 1:Edit....
- If there is data in L1, move the cursor to highlight L1, press CLEAR, then ENTER.
- Type your data values into L1, pressing ENTER after each one.
Set Up the Stat Plot:
- Press 2nd -> Y= (for [STAT PLOT]).
- Select 1:Plot1... and press ENTER.
- Turn the plot On.
- For Type, select the third icon, which looks like a histogram.
- Set Xlist: to L1 (or whichever list you used).
- Set Freq: to 1.
Adjust the Window (CRITICAL STEP):
- Press WINDOW. This controls the scaling of your graph.
- Xmin: A value slightly less than your minimum data value (e.g., 10). The first bin starts at Xmin.
- Xmax: A value slightly more than your maximum data value (e.g., 40).
- Xscl: This is the bin width, and it is the most important setting. Let's choose a bin width of 5, so set Xscl to 5.
- Ymin: 0 (frequency can't be negative).
- Ymax: A value slightly larger than the highest frequency you expect in any bin (e.g., 5).
- Yscl: 1.
- Xres: 1.
Graph and Trace:
- Press GRAPH. You should see the histogram.
- Press TRACE. Use the left and right arrow keys to move between the bars. The calculator will show you the range for each bin (min and max) and the frequency (n) within that bin.
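To double-check the calculator's bars by hand, the Python sketch below recounts the example data with the same window settings (Xmin = 10, Xmax = 40, Xscl = 5). It assumes each bin starts at Xmin, has width Xscl, and includes its left edge but not its right edge; confirm the calculator's boundary handling with TRACE. `ti_style_bins` is a hypothetical helper, not a calculator function.

```python
def ti_style_bins(values, xmin, xmax, xscl):
    """List (left edge, right edge, count) for each bin from xmin up to xmax.

    Bins have width xscl and are assumed to include their left edge only.
    """
    bins = []
    left = xmin
    while left < xmax:
        right = left + xscl
        n = sum(1 for v in values if left <= v < right)
        bins.append((left, right, n))
        left = right
    return bins

data = [12, 15, 16, 16, 18, 22, 23, 25, 25, 26, 30, 31, 35]
for lo, hi, n in ti_style_bins(data, xmin=10, xmax=40, xscl=5):
    print(f"min={lo}  max<{hi}  n={n}")
```

Several values (15, 25, 30, 35) sit exactly on bin edges, so this dataset is a good test of the boundary convention. The tallest bin holds 4 values, which is why a Ymax of 5 is enough.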

How to Show Work on the FRQ

For Free Response Questions that ask you to describe or compare distributions, you must use the SOCS framework and always write in context. Simply listing features is not enough; you must tie them to the variable being measured.

Template for Describing a Single Distribution:

"The distribution of [quantitative variable in context] is [describe the shape: skewed left/right, roughly symmetric, and modality]. The center of the distribution is approximately [estimate the median or mean]. The data varies from [minimum value] to [maximum value], with a range of [calculate range]. There appear to be [mention any potential outliers, gaps, or clusters, giving their approximate values]."

Example using the template:

"The distribution of salaries for employees at the company is skewed to the right and appears to be unimodal. The center of the distribution is approximately $55,000. The salaries vary from $35,000 to $150,000, with a range of $115,000. There appears to be a potential outlier at $150,000, which is much higher than the other salaries."

Practice Problems

Problem 1:

The following data represent the number of minutes 15 randomly selected students spent on homework last night: 15, 22, 25, 28, 30, 35, 35, 36, 40, 42, 45, 45, 55, 60, 95

(a) Create a stem-and-leaf plot of these data.

(b) Describe the distribution.

Solution:

(a) First, we identify the stems (the tens digits) and the leaves (the ones digits). The stems will range from 1 to 9.

1 | 5
2 | 2 5 8
3 | 0 5 5 6
4 | 0 2 5 5
5 | 5
6 | 0
7 |
8 |
9 | 5

Key: 1|5 means 15 minutes

(b) Using the SOCS framework and the FRQ template:

The distribution of homework time for these students is skewed to the right and unimodal. The center (median) of the distribution is the 8th of the 15 ordered values, which is 36 minutes. The time spent on homework varies from a minimum of 15 minutes to a maximum of 95 minutes, a range of 80 minutes. There is a potential outlier at 95 minutes, as it is separated from the rest of the data by a large gap (no values in the 70s or 80s).

Problem 2:

The histogram below shows the distribution of the number of electoral votes for each of the 50 states and the District of Columbia. Describe the distribution.

[Image: A histogram titled "Electoral Votes per State". The x-axis is "Number of Electoral Votes" and the y-axis is "Frequency (Number of States)". The bins are 0-5, 5-10, 10-15, etc., up to 55-60. The vast majority of the bars are on the left, with a very high bar for the 0-5 bin, and the frequencies decrease rapidly, with a few very short bars on the far right.]

Solution:

Using the SOCS framework and the FRQ template:

The distribution of electoral votes per state is strongly skewed to the right and is unimodal, with the main peak in the 0-5 votes bin. This indicates that most states have a small number of electoral votes. The center (median) of the distribution is in the 5-10 votes bin. The number of electoral votes varies from a minimum of 3 to a maximum in the 50-55 votes bin. There are several potential outliers on the high end, such as the states in the 30+ vote bins, which have far more votes than the vast majority of other states.

Common Mistakes to Avoid

Forgetting the Key on a Stemplot: A stemplot is meaningless without a key to interpret the values. Always include one (e.g., "Key: 2|3 = 23"). This is an easy point to lose on the AP exam.
Confusing Histograms and Bar Charts: Histograms are for quantitative data, and the bars must touch. Bar charts are for categorical data, and the bars should have gaps between them. Never call a histogram a bar chart.
Describing Shape Incorrectly: Don't confuse skewed left and skewed right. Remember that the "skew" is in the direction of the long tail, not the direction of the peak.
Using Vague or Non-Statistical Language: Avoid subjective terms like "normal," "average," or "spread out." Use precise statistical terms: "roughly symmetric," "skewed right," "the range is 50," "the center is approximately 25."
Forgetting Context: Your description must be about the variable you are analyzing. Don't just say "The distribution is skewed right." Say "The distribution of salaries is skewed right." Context is king on the AP exam.