Quick Summary
This guide covers how to represent categorical data. You will learn to organize data into frequency and relative frequency tables, which show the distribution of a single categorical variable. From those tables, you will construct and interpret bar charts, the primary graphical tool for this type of data, and describe the key features of the distribution in the context of a problem.
Key Concepts
The first step in statistics is often to organize and visualize data. When dealing with variables that place individuals into categories (like eye color, car brand, or opinion), we use specific tables and graphs.
1. Frequency and Relative Frequency Tables
Before we can create a graph, we must summarize the data in a table.
Categorical Variable: A variable that places an individual into one of several distinct groups or categories. Examples: Favorite subject (Math, Science, English), Blood type (A, B, AB, O), State of residence.
Frequency: The count of how many times a specific category appears in a dataset. A frequency table (or frequency distribution) lists each category and its corresponding frequency.
Relative Frequency: The proportion or percentage of observations that fall into a specific category. A relative frequency table lists each category and its relative frequency.
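Formula for Relative Frequency:
$$\text{Relative Frequency} = \frac{\text{Frequency of the category}}{\text{Total number of observations } (n)}$$
To express this as a percentage, multiply the result by 100.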
Key Property: The sum of all frequencies must equal the total number of observations (the sample size, n). The sum of all relative frequencies must equal 1 (or 100%), apart from small differences caused by rounding. Checking both sums is a quick way to catch counting or arithmetic errors.
Example: A survey asked 200 high school students to name their primary source of news. The results are summarized below.
[Image: A frequency and relative frequency table for the "Primary News Source" survey.]
| Primary News Source | Frequency (Count) | Relative Frequency (Proportion) | Relative Frequency (Percent) |
|---|---|---|---|
| Social Media | 110 | 110 / 200 = 0.55 | 55% |
| TV News | 50 | 50 / 200 = 0.25 | 25% |
| News Websites | 30 | 30 / 200 = 0.15 | 15% |
| Print Newspaper | 10 | 10 / 200 = 0.05 | 5% |
| Total | 200 | 1.00 | 100% |
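The table above can be reproduced with a short script. This is a minimal sketch in Python, not part of the original guide; the function name `relative_frequencies` and the variable names are illustrative assumptions. It divides each frequency by the total n and then checks the Key Property that the sums come out to n and 1.

```python
# Frequencies copied from the "Primary News Source" table above.
frequencies = {
    "Social Media": 110,
    "TV News": 50,
    "News Websites": 30,
    "Print Newspaper": 10,
}


def relative_frequencies(freqs):
    """Return {category: frequency / n}, where n is the total count."""
    n = sum(freqs.values())
    return {category: count / n for category, count in freqs.items()}


rel = relative_frequencies(frequencies)

# One row per category: name, count, proportion, percent.
for category, count in frequencies.items():
    share = rel[category]
    print(f"{category:<16} {count:>4} {share:>6.2f} {share:>5.0%}")

# Key Property check: counts sum to n, proportions sum to 1.
n = sum(frequencies.values())
total_rel = sum(rel.values())
print(f"{'Total':<16} {n:>4} {total_rel:>6.2f}")
```

Because proportions are stored as floating-point numbers, compare the total to 1 with a small tolerance rather than with exact equality.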
2. Bar Charts (or Bar Graphs)
The most common and effective way to graph a single categorical variable is a bar chart.
Purpose: To display the frequency or relative frequency of each category. The length or height of each bar is proportional to the value it represents.
Construction:
Title: Give the graph a clear, descriptive title.
Horizontal Axis (x-axis): Label this axis with the names of the different categories.
Vertical Axis (y-axis): Label this axis "Frequency" or "Relative Frequency." The scale must start at 0 and be clearly marked with consistent increments.
Bars: Draw a bar for each category. The height of the bar corresponds to its frequency or relative frequency from your table.
Gaps: Crucially, the bars for different categories must not touch. There should be equal-sized gaps between the bars. This visually reinforces that the categories are distinct and separate.
Example using the News Source data:
Frequency Bar Chart:
The y-axis shows the raw counts.
[Image: A frequency bar chart for the "Primary News Source" survey. The x-axis has categories 'Social Media', 'TV News', etc. The y-axis is labeled 'Frequency' and goes from 0 to 120. The bar for 'Social Media' reaches 110, 'TV News' reaches 50, and so on.]
Relative Frequency Bar Chart:
The y-axis shows the proportions or percentages. Notice that the relative heights of the bars are identical to those in the frequency chart; only the scale on the y-axis changes.
[Image: A relative frequency bar chart for the "Primary News Source" survey. The x-axis is the same. The y-axis is labeled 'Relative Frequency (%)' and goes from 0 to 60%. The bar for 'Social Media' reaches 55%, 'TV News' reaches 25%, etc.]
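The construction rules can also be illustrated with a plain-text chart. The sketch below is an assumption-laden illustration rather than a replacement for a hand-drawn or software-drawn graph: it draws horizontal bars instead of vertical ones, the function name `text_bar_chart` is hypothetical, and the scale of one `#` per 5 students was chosen only because every count in this example is a multiple of 5. Each bar starts at 0, its length is proportional to the frequency, and a blank line separates the bars.

```python
# Frequencies copied from the "Primary News Source" example.
frequencies = {
    "Social Media": 110,
    "TV News": 50,
    "News Websites": 30,
    "Print Newspaper": 10,
}


def text_bar_chart(freqs, title, units_per_mark=5):
    """Return the chart as a list of lines: a title, then one bar per category."""
    label_width = max(len(category) for category in freqs)
    lines = [title]
    for category, count in freqs.items():
        # Bar length is proportional to the frequency and starts at 0.
        bar = "#" * round(count / units_per_mark)
        lines.append("")  # blank line = gap between bars
        lines.append(f"{category:<{label_width}} | {bar} {count}")
    return lines


chart = text_bar_chart(frequencies, "Primary News Source (n = 200), one # = 5 students")
print("\n".join(chart))
```

Passing relative frequencies instead of counts (with a correspondingly smaller `units_per_mark`) would produce bars with the same relative lengths, which mirrors the point made about the two charts above.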
3. Describing the Distribution of a Categorical Variable
When asked to describe the distribution, you are not looking for shape, center, and spread as you would with quantitative data. Instead, focus on these key features:
Identify the Mode: State which category has the highest frequency (the most common outcome). If two or more categories tie for the highest frequency, report all of them.
Identify the Lowest Frequency: State which category has the lowest frequency.
Make Comparisons: Explicitly compare the frequencies or relative frequencies of different categories. Use phrases like "twice as many," "slightly more than," or use the actual data values to support your comparison.
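The three features listed above can be pulled out of a frequency table mechanically. The following is a rough sketch using the News Source counts from earlier in this guide; the function name `describe` and the dictionary keys are illustrative assumptions. It collects every category tied for the highest or lowest frequency, since ties are possible.

```python
# Frequencies copied from the "Primary News Source" example.
frequencies = {
    "Social Media": 110,
    "TV News": 50,
    "News Websites": 30,
    "Print Newspaper": 10,
}


def describe(freqs):
    """Summarize a categorical distribution: mode(s), least common, and shares."""
    n = sum(freqs.values())
    highest = max(freqs.values())
    lowest = min(freqs.values())
    return {
        "n": n,
        # Lists, because two categories can tie for most or least common.
        "modes": [c for c, k in freqs.items() if k == highest],
        "mode_share": highest / n,
        "least": [c for c, k in freqs.items() if k == lowest],
        "least_share": lowest / n,
    }


summary = describe(frequencies)
print("Mode(s):", summary["modes"], f"{summary['mode_share']:.0%}")
print("Least common:", summary["least"], f"{summary['least_share']:.0%}")

# A comparison backed by actual values: Social Media vs. TV News.
ratio = frequencies["Social Media"] / frequencies["TV News"]
print(f"Social Media was named {ratio:.1f} times as often as TV News (110 vs. 50).")
```

The numbers only support the description. A written answer still needs to state them in the context of the survey.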
Key Vocabulary
Categorical Variable: A variable that assigns individuals to distinct groups or categories based on a qualitative characteristic (e.g., favorite color, type of car).
Frequency: The number of times a value or category occurs in a dataset; a raw count.
Relative Frequency: The proportion (or percent) of observations that fall within a specific category, calculated as the category's frequency divided by the total number of observations.
Frequency Table: A table that organizes data by listing each category and its corresponding frequency.
Bar Chart: A graphical display for categorical data where rectangular bars are used to represent the frequency or relative frequency of each category. The bars do not touch.
Mode: The category that appears most frequently in a distribution of categorical data.
Calculator Tech (TI-84)
No major calculator functions are required for this topic. Constructing frequency tables and bar charts is typically done by hand or with computer software based on provided data. You can enter categorical data labels into one list (e.g., L1) and their corresponding frequencies into another (e.g., L2) for organizational purposes, but the TI-84's built-in plotting functions are not designed to create proper bar charts from this summary data.
How to Show Work on the FRQ
On Free Response Questions, you will most often be asked to create a graph or interpret a given graph. When asked to describe the distribution of a single categorical variable shown in a bar chart, provide a concise description in context.
Template for Describing a Categorical Distribution:
Identify the variable and provide context. Start by stating what the graph is displaying.
- Sentence Starter: "This bar chart displays the distribution of [variable name] for [context of the study]..."
Identify the mode (most frequent category). State the category with the tallest bar and provide its specific frequency or relative frequency.
- Sentence Starter: "The most common [category type] was [name of modal category], which occurred [frequency] times." or "...which accounted for [relative frequency]% of the total."
Identify the least frequent category. State the category with the shortest bar and provide its value.
- Sentence Starter: "The least common [category type] was [name of least frequent category] with a frequency of [frequency]."
Make a comparison. Choose two or more interesting categories and explicitly compare them using their values.
- Sentence Starter: "The frequency of [Category A] was [comparison word, e.g., 'roughly double'] the frequency of [Category B] ([value A] vs. [value B])."
Practice Problems
Problem 1:
A local animal shelter recorded the type of pet adopted during a one-month period. The raw data are listed below:
Dog, Cat, Cat, Dog, Dog, Rabbit, Cat, Dog, Cat, Dog, Dog, Cat, Cat, Rabbit, Dog, Cat, Dog, Cat, Dog, Cat
(a) Construct a frequency and relative frequency table for these data.
(b) Create a well-labeled bar chart to display the distribution of pet types adopted.
Solution:
(a) First, we count the occurrences of each pet type.
Dog: 9
Cat: 9
Rabbit: 2
Total observations = 9 + 9 + 2 = 20.
Next, we calculate the relative frequencies:
Dog: 9 / 20 = 0.45 (or 45%)
Cat: 9 / 20 = 0.45 (or 45%)
Rabbit: 2 / 20 = 0.10 (or 10%)
Now, we construct the table:
| Pet Type | Frequency | Relative Frequency |
|---|---|---|
| Dog | 9 | 0.45 |
| Cat | 9 | 0.45 |
| Rabbit | 2 | 0.10 |
| Total | 20 | 1.00 |
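The tally in part (a) can be checked by machine. This is a small sketch using `collections.Counter` from the Python standard library; the variable names are illustrative, and the data string is copied from the problem statement.

```python
from collections import Counter

# Raw adoption records, copied from the problem statement.
raw = ("Dog, Cat, Cat, Dog, Dog, Rabbit, Cat, Dog, Cat, Dog, "
       "Dog, Cat, Cat, Rabbit, Dog, Cat, Dog, Cat, Dog, Cat")
adoptions = raw.split(", ")

counts = Counter(adoptions)  # frequency of each pet type
n = len(adoptions)           # total number of observations

# Frequency and relative frequency for each pet type.
for pet in ("Dog", "Cat", "Rabbit"):
    print(f"{pet:<7} {counts[pet]:>3} {counts[pet] / n:>5.2f}")
print(f"{'Total':<7} {n:>3} {sum(counts.values()) / n:>5.2f}")
```

Dog and Cat tie at 9 adoptions each, so this distribution has two modes.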
(b) Using the frequency table, we create the bar chart.
Distribution of Adopted Pet Types
[Image: A bar chart titled "Distribution of Adopted Pet Types". The x-axis is labeled "Pet Type" with categories "Dog", "Cat", and "Rabbit". The y-axis is labeled "Frequency" and is scaled from 0 to 10. The bar for "Dog" goes up to 9. The bar for "Cat" goes up to 9. The bar for "Rabbit" goes up to 2. There are clear gaps between the bars.]
The chart has a clear title.
The x-axis is labeled with the categories.
The y-axis is labeled "Frequency" and has a consistent scale starting at 0.
The bars for Dog and Cat are of equal height (9), and the bar for Rabbit is shorter (2).
There are gaps between the bars.
Problem 2:
The bar chart below shows the results of a survey that asked a random sample of 400 adults about their highest level of education completed.
[Image: A relative frequency bar chart titled "Highest Level of Education for 400 U.S. Adults". The y-axis is labeled "Relative Frequency (%)" and is scaled from 0 to 40%. The categories on the x-axis and their corresponding bar heights are: "No HS Diploma" (15%), "HS Diploma" (35%), "Some College" (25%), "Bachelor's Degree" (20%), "Graduate Degree" (5%).]
(a) How many adults in the survey reported having a Bachelor's Degree?
(b) Describe the distribution of the highest level of education for this sample of adults.
Solution:
(a) The bar chart shows that 20% of the respondents have a Bachelor's Degree. The total sample size is 400 adults.
To find the number of adults, we calculate 20% of 400.
Number = 0.20 * 400 = 80.
80 adults in the survey reported having a Bachelor's Degree.
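The same calculation works for every bar in the chart: count = relative frequency × n. The sketch below is illustrative, with variable names of my own choosing. It converts each percentage read from the chart back into a count and confirms that the counts add up to the sample size of 400.

```python
n = 400  # sample size stated in the problem

# Percentages read from the bar chart.
percents = {
    "No HS Diploma": 15,
    "HS Diploma": 35,
    "Some College": 25,
    "Bachelor's Degree": 20,
    "Graduate Degree": 5,
}

# count = (percent / 100) * n, rounded to a whole number of adults.
counts = {level: round(pct * n / 100) for level, pct in percents.items()}

for level, count in counts.items():
    print(f"{level:<18} {percents[level]:>3}% -> {count:>3} adults")

print("Total:", sum(counts.values()))
```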
(b) Using the FRQ template to describe the distribution:
This bar chart displays the distribution of the highest level of education completed for a sample of 400 U.S. adults. The most common level of education was a High School Diploma, which was reported by 35% of the sample. The least common level was a Graduate Degree, reported by only 5% of the adults. The percentage of adults with a High School Diploma (35%) was substantially higher than the percentage with a Bachelor's Degree (20%).
Common Mistakes to Avoid
Confusing Bar Charts and Histograms: This is the most common graphical error. Remember: Bar charts are for categorical data and must have gaps between the bars. Histograms are for quantitative data, and the bars typically touch to represent continuous intervals.
Applying Quantitative Descriptions to Categorical Data: Never describe a bar chart as "skewed" or "symmetric." These terms apply only to quantitative data. Do not try to find the "mean" or "median" of categorical data. Focus only on frequencies, relative frequencies, and the mode.
Forgetting Labels, Title, and Scale (LTS): On the AP exam, a graph without a descriptive title, clearly labeled axes, and a properly scaled y-axis (starting at 0) will lose points. Always double-check your LTS.
Using Counts When Relative Frequencies are Needed: Read the question carefully. If it asks for a proportion or percentage, do not provide the raw count. If you are asked to create a relative frequency bar chart, make sure your y-axis is scaled for proportions (0 to 1) or percentages (0 to 100).
Misinterpreting the Y-Axis: Always check whether the y-axis represents frequency (counts) or relative frequency (proportions or percentages). This is critical for answering questions correctly, as in Practice Problem 2(a), where a percentage had to be converted back to a count.