Introducing Statistics: Do the Data We Collected Tell the Truth? - AP Statistics Study Guide

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Quick Summary

This lesson introduces the foundational principles of data collection, focusing on how to gather data that accurately answers a statistical question. You will learn to distinguish between populations and samples, and to identify the critical differences between observational studies and experiments. By the end of this guide, you will be able to determine the appropriate scope of conclusions—specifically, when a cause-and-effect relationship can be established—based on the method used to collect the data.

Key Concepts

The primary goal of statistics is to gain insight about a large group by examining data from a small part of that group. The validity of our conclusions depends heavily on how the data were collected.

  • Population, Census, and Sample

    • A population is the entire group of individuals we want information about. It is defined by the research question. For example, if we want to know the average GPA of students at a specific high school, the population is all students at that high school.

    • A census is a method that attempts to collect data from every single individual in the population. While it provides a complete picture, a census is often impractical due to high costs, time constraints, and logistical difficulties. Imagine trying to get a response from every single adult in the United States—it's nearly impossible.

    • A sample is a subset of individuals from the population from which we actually collect data. We use data from a sample to draw conclusions (make inferences) about the entire population. The key is to select a sample that is representative of the population.
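
The difference between a census and a sample can be sketched in a few lines of Python. This is only an illustration: the 1,200 GPAs are simulated (not real data), and the school size, the sample size of 50, and the variable names are all assumptions made for the example.

```python
import random
import statistics

# Hypothetical population: GPAs for all 1,200 students at one high school.
# These values are simulated for illustration; they are not real data.
rng = random.Random(2026)
population = [round(min(4.0, max(0.0, rng.gauss(3.0, 0.5))), 2) for _ in range(1200)]

# A census uses every individual, so it gives the true population mean.
census_mean = statistics.mean(population)

# A simple random sample: every group of 50 students is equally likely to be chosen.
sample = rng.sample(population, k=50)
sample_mean = statistics.mean(sample)

print(f"Census mean GPA (N={len(population)}): {census_mean:.2f}")
print(f"Sample mean GPA (n={len(sample)}): {sample_mean:.2f}")
```

The sample mean will usually land close to the census mean, but not exactly on it. That gap is the price of collecting data from 50 students instead of 1,200.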

  • Data Collection Methods: The Two Main Paths

    Once you've defined your population and decided to use a sample, there are two primary ways to gather data: observational studies and experiments. The choice between them is the most important factor in determining what kind of conclusions you can draw.

  • Observational Studies

    • An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. The researcher is a passive observer, simply gathering data on what is already happening.

    • Example: A researcher wants to study the relationship between regular exercise and stress levels. They survey a group of 100 office workers, asking them about their exercise habits and their current stress levels. The researcher is not telling anyone to exercise; they are just recording pre-existing behaviors.

    • Key Limitation: The single most important concept to remember is that an observational study, no matter how well designed, cannot by itself establish a cause-and-effect relationship. Because the researcher does not assign the explanatory variable, confounding variables can never be ruled out.

    • A confounding variable is a variable that is associated with both the explanatory variable (the one thought to influence the outcome) and the response variable, so its effect on the response cannot be separated from the effect of the explanatory variable. In our exercise example, people who exercise regularly might also have healthier diets, better sleep habits, or more free time. Any of these factors (the confounders) could be the real cause of lower stress, not the exercise itself.

    • Types of Observational Studies:

      • Retrospective Study: Researchers examine existing data from the past. For example, analyzing hospital records from the last 10 years to look for a relationship between a certain medication and patient recovery times.

      • Prospective Study: Researchers identify subjects in advance and collect data as events unfold, tracking them into the future. For example, selecting a group of new high school graduates and following them for 20 years to study career choices and income.
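
To see how a confounder can manufacture an association, here is a small simulation sketch of the exercise-and-stress example. Everything in it is hypothetical: free time is the assumed confounder, the numbers are invented, and exercise is deliberately given no effect on stress at all.

```python
import random
import statistics

rng = random.Random(7)

exercisers, non_exercisers = [], []
for _ in range(10_000):
    # Confounder: hours of free time per day (hypothetical, simulated).
    free_time = rng.uniform(0, 6)
    # More free time makes regular exercise more likely...
    exercises = rng.random() < free_time / 6
    # ...and also lowers stress. Exercise itself has NO effect on stress here.
    stress = 8 - free_time + rng.gauss(0, 1)
    (exercisers if exercises else non_exercisers).append(stress)

gap = statistics.mean(exercisers) - statistics.mean(non_exercisers)
print(f"Mean stress, exercisers minus non-exercisers: {gap:.2f}")
```

Even though exercise does nothing in this simulation, the exercisers report clearly lower stress (a gap of roughly 2 points is expected from the way the simulation is set up). A researcher who only observed exercise habits and stress would see a strong association that is created entirely by free time.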

  • Experiments

    • An experiment deliberately imposes some treatment on individuals in order to measure their responses. The goal is to determine whether the treatment causes a change in the response.

    • Example: To study the effect of a new fertilizer on tomato plant yield, a researcher gets 30 tomato plants. They randomly assign 15 plants to receive the new fertilizer (the treatment group) and 15 plants to receive a standard fertilizer (the control group). They then measure and compare the total weight of tomatoes produced by each group.

    • Key Strength: A well-designed, randomized experiment is the only statistical method that can establish a cause-and-effect relationship. By actively assigning treatments, researchers can control for other factors and isolate the effect of the variable they are studying. Random assignment helps ensure that the treatment groups are roughly equivalent at the start, balancing out the effects of potential confounding variables.

    • Essential Vocabulary for Experiments:

      • Experimental Units: The individuals to which treatments are applied. When the units are human beings, they are often called subjects. (In the example, the 30 tomato plants are the experimental units).

      • Explanatory Variable (or Factor): The variable that is purposefully manipulated by the researcher. (The type of fertilizer).

      • Treatment: A specific condition applied to the experimental units. A factor may have several levels (its different values); when there is only one factor, each level is a treatment. (The treatments are "new fertilizer" and "standard fertilizer".)

      • Response Variable: The variable that measures the outcome of the study. (The total weight of tomatoes produced).
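
Random assignment in the tomato example can be carried out with a short script. The following is a minimal sketch, assuming the 30 plants are labeled 1 through 30; the seed value and variable names are arbitrary choices for illustration.

```python
import random

rng = random.Random(42)  # fixed seed so the assignment can be reproduced

plants = list(range(1, 31))        # label the 30 tomato plants 1 through 30
rng.shuffle(plants)                # put the labels in a random order

treatment_group = sorted(plants[:15])   # first 15 labels: new fertilizer
control_group = sorted(plants[15:])     # remaining 15: standard fertilizer

print("New fertilizer:", treatment_group)
print("Standard fertilizer:", control_group)
```

Shuffling and then splitting guarantees that each plant ends up in exactly one group and that chance alone, not the researcher's judgment, decides which fertilizer it receives.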

[Image: A side-by-side diagram. Left side labeled "Observational Study" shows a researcher with a clipboard watching two distinct, pre-existing groups (e.g., coffee drinkers and non-coffee drinkers) and measuring an outcome. Right side labeled "Experiment" shows a single group of subjects being randomly split into two groups by the researcher, who then gives one group a treatment (e.g., a new pill) and the other a placebo, before measuring the outcome.]

Key Vocabulary

  • Population: The entire group of individuals that is the target of our interest in a statistical study.

  • Sample: A subset of the population from which we actually collect data, used to draw conclusions about the whole.

  • Observational Study: A study in which researchers gather data by observing individuals without attempting to impose a treatment or influence their responses.

  • Experiment: A study in which researchers deliberately impose treatments on individuals (experimental units) to measure their responses and determine a cause-and-effect relationship.

  • Confounding Variable: A variable, other than the explanatory variable being studied, that is associated with both the explanatory variable and the response variable, so that its effect on the response cannot be separated from the effect of the explanatory variable.

  • Treatment: A specific condition applied to the individuals in an experiment.

  • Census: A study that attempts to collect data from every single member of a population.

Calculator Tech (TI-84)

No major calculator functions are required for this topic. This unit is focused on the conceptual understanding of data collection methods.

How to Show Work on the FRQ

On the AP exam, you will frequently be asked to identify the type of study and justify your choice. Most importantly, you must be able to explain what conclusions can and cannot be drawn. Use the following template to structure your answers for clarity and full credit.

Template: Identifying Study Type and Justifying Conclusions

  1. Identify the Study Type: "This is an [observational study OR experiment]."

  2. Justify Your Identification:

    • For an Observational Study: "This is an observational study because the researchers did not impose a treatment. They simply [observed / measured / surveyed] the subjects and recorded [the explanatory variable] and [the response variable] without assigning individuals to any groups."

    • For an Experiment: "This is an experiment because the researchers deliberately imposed a treatment. Specifically, they [describe how the treatment was assigned] to the [experimental units/subjects] to see its effect on [the response variable]."

  3. State the Appropriate Conclusion:

    • For an Observational Study: "Because this is an observational study, we cannot conclude that [explanatory variable] causes a change in [response variable]. There may be one or more confounding variables that are influencing the results. For example, [propose a specific, plausible confounding variable] could be associated with both the explanatory and response variables."

    • For an Experiment: "Because the researchers conducted a randomized experiment, if the difference in [the response variable] between the groups is statistically significant, they can conclude that [the treatment] causes a change in [the response variable] for [subjects/units] like those in the study."
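
The logic behind step 3 can be summarized as a small decision rule. This sketch is one way to encode it, not an official rubric: the function name and wording are hypothetical, and the second input (whether individuals were randomly selected from the population) reflects the standard rule that random selection is what supports generalizing to the population, which the template above does not spell out.

```python
def scope_of_conclusion(random_assignment: bool, random_sample: bool) -> str:
    """Return the broadest conclusion a study design supports."""
    # Random assignment of treatments is what justifies a causal claim.
    cause = "cause-and-effect" if random_assignment else "association only (no causation)"
    # Random selection from the population is what justifies generalizing to it.
    reach = ("generalizes to the population sampled" if random_sample
             else "applies only to individuals like those studied")
    return f"{cause}; {reach}"

# A randomized experiment on volunteers:
print(scope_of_conclusion(random_assignment=True, random_sample=False))
# A survey of a random sample with no treatments imposed:
print(scope_of_conclusion(random_assignment=False, random_sample=True))
```

Checking the two questions separately helps avoid the common slip of treating a large or random sample as if it justified a causal claim.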

Practice Problems

Problem 1:

A local health department wants to investigate a potential association between living near major roadways and the incidence of childhood asthma. Researchers identify 500 children with asthma from hospital records and then select 500 children without asthma from the same hospitals. They then examine the medical records and home addresses of all 1000 children to determine how close each child lives to a major roadway. The study found that children with asthma were significantly more likely to live close to a major roadway. The health department issues a press release stating that "living near major roadways causes childhood asthma."

Is this study an observational study or an experiment? Explain what conclusion can and cannot be drawn from this study.

Solution:

This is an observational study. The researchers did not impose a treatment; they simply observed and collected data from existing medical records, which also makes the study retrospective. They did not assign children to live near or far from a major roadway. Because this is an observational study, the health department cannot conclude that living near major roadways causes childhood asthma. The study supports only an association between the two. There are potential confounding variables that could explain that association. For example, socioeconomic status could be a confounding variable: families with lower incomes may be more likely to live in areas near major roadways and may also face other risk factors for asthma, such as poorer housing conditions or limited access to healthcare.

Problem 2:

A company has developed a new "brain-training" app and wants to test if it improves memory. Researchers recruit 100 high school student volunteers. They randomly assign 50 students to use the new app for 20 minutes a day for one month. The other 50 students are assigned to use a standard puzzle app (a placebo) for the same amount of time. At the end of the month, all 100 students are given the same standardized memory test.

Identify the experimental subjects, the explanatory and response variables, and the treatments. What type of conclusion could be drawn from this study?

Solution:

This is an experiment because the researchers deliberately imposed a treatment (the type of app) on the subjects to measure the response. The experimental subjects are the 100 high school student volunteers. The explanatory variable is the type of app used, and the response variable is the score on the standardized memory test. The treatments are the "brain-training" app and the standard puzzle app. Because this was a randomized experiment, if the memory scores for the brain-training app group are statistically significantly higher than those for the puzzle app group, the company can conclude that its new app causes an improvement in memory for high school students like the ones in the study. Because the subjects were volunteers rather than a random sample, the result should not be generalized to all high school students.

Common Mistakes to Avoid

  • Assuming Correlation Implies Causation: This is the most critical error in introductory statistics. If you see a study that was not a randomized experiment, you can never make a cause-and-effect conclusion. Always look for potential confounding variables.

  • Confusing Observational Studies with Experiments: An experiment must involve the researchers actively imposing a treatment. If the researchers are just grouping subjects based on their existing characteristics or behaviors (e.g., people who already smoke vs. those who don't), it is an observational study.

  • Using "Experiment" in a Casual Sense: In everyday language, "experiment" can mean "to try something out." In statistics, it has a precise definition: a study where treatments are imposed. Do not call a survey or an observational study an "experiment" on the AP exam.

  • Misidentifying the Population: Be careful to distinguish between the sample (the individuals you have data on) and the population (the larger group you want to generalize your findings to). A conclusion from a study on high school volunteers may not apply to the entire adult population.