PrepGo

Introduction to Planning a Study - AP Statistics Study Guide

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Quick Summary

This lesson introduces the foundational principles of data collection, enabling you to differentiate between an observational study and an experiment. You will learn that only a well-designed experiment, which involves the deliberate imposition of treatments and random assignment, can establish a cause-and-effect relationship. Understanding the critical roles of random sampling for generalizability and random assignment for causal inference is the key to evaluating the validity of statistical conclusions.

Key Concepts

  • Observational Study vs. Experiment: This is the most fundamental distinction in data collection.

    • An observational study observes individuals and measures variables of interest but does not attempt to influence the responses. Researchers are passive observers. For example, a study that tracks a group of smokers and a group of non-smokers over 20 years to compare lung cancer rates is observational, because the researchers do not assign anyone to be a smoker.

    • An experiment deliberately imposes some treatment on individuals to measure their responses. Researchers are active participants who manipulate a variable to see its effect. For example, a study where 100 volunteers are randomly assigned to either a new medication or a placebo to see if the medication lowers blood pressure is an experiment.

  • Why Experiments are Necessary for Causation:

    • Observational studies can reveal an association between two variables, but they cannot prove a cause-and-effect relationship.

    • This is because of confounding variables. A confounding variable is a variable that is related to both the explanatory variable (the one being studied) and the response variable, making it impossible to untangle their effects on the response.

    • Classic Example: A study might find a strong positive association between ice cream sales and the number of drowning deaths. Does eating ice cream cause drowning? No. The confounding variable is temperature. When it's hot, people buy more ice cream, and people also go swimming more, leading to more drownings. Temperature is associated with both ice cream sales (explanatory) and drownings (response).

    • [Image: A diagram showing a confounding variable (e.g., Temperature) with arrows pointing to both the explanatory variable (Ice Cream Sales) and the response variable (Drowning Deaths).]
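
The ice cream example can be illustrated with a small simulation. This is only a sketch with made-up numbers: the temperature range, the coefficients, and the noise levels are all assumptions chosen for illustration, not real data. Temperature drives both simulated variables, and ice cream sales have no direct effect on the simulated drowning index.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate_days(n_days, rng):
    """Simulate days where temperature alone drives both variables (hypothetical model)."""
    temps, sales, drownings = [], [], []
    for _ in range(n_days):
        temp = rng.uniform(10, 35)                      # assumed daily high, degrees C
        temps.append(temp)
        sales.append(20 * temp + rng.gauss(0, 50))      # made-up sales model
        drownings.append(0.2 * temp + rng.gauss(0, 1))  # made-up drowning index
    return temps, sales, drownings

rng = random.Random(2026)  # fixed seed so the sketch is reproducible
temps, sales, drownings = simulate_days(5000, rng)

# Across all days, sales and drownings move together.
r_all = pearson(sales, drownings)

# Holding temperature nearly constant (24 to 26 degrees) removes most of the association.
band = [(s, d) for t, s, d in zip(temps, sales, drownings) if 24 <= t <= 26]
r_band = pearson([s for s, _ in band], [d for _, d in band])

print(f"All days:              r = {r_all:.2f}")
print(f"Similar-temperature days: r = {r_band:.2f}")
```

When the confounding variable is held roughly fixed, the association largely disappears, which is the behavior the diagram describes.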

  • The Language of Experiments:

    • Explanatory Variable (or Factor): The variable that is intentionally manipulated by the researchers.

    • Response Variable: The variable that measures the outcome of the study. It is the result that is measured and compared.

    • Treatments: The specific conditions applied to the individuals in an experiment. To compare effects, an experiment needs at least two treatments, one of which may be a placebo or the current standard.

    • Experimental Units: The individuals (people, animals, objects) to which the treatments are applied. When the units are human beings, they are often called subjects.

  • The Two Critical Roles of Randomness: Randomness is not used haphazardly; it has two distinct and vital purposes in statistics.

    • Random Sampling (Surveys/Studies): This involves using a chance process to determine which members of a population are included in the sample.

      • Purpose: To create a representative sample.

      • Conclusion: If random sampling is used, the results of the study can be generalized to the population from which the sample was drawn.

    • Random Assignment (Experiments): This is used only in experiments to assign experimental units to treatment groups using a chance process.

      • Purpose: To create treatment groups that are as similar as possible at the start of the experiment, balancing out the effects of potential confounding variables.

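      • Conclusion: If random assignment is used and the difference in responses between the groups is too large to be explained by chance alone (statistically significant), we can make a cause-and-effect conclusion, attributing the difference in the response variable to the treatments.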
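
The two roles of randomness can be kept apart in a short sketch. The population size (1,000), sample size (40), and group labels below are hypothetical values chosen only for illustration.

```python
import random

rng = random.Random(1)  # fixed seed only so the sketch is reproducible

# Hypothetical population of 1,000 individuals, labeled 1 to 1000.
population = list(range(1, 1001))

# Random SAMPLING: chance decides WHO is in the study.
# This is what supports generalizing to the population.
sample = rng.sample(population, 40)

# Random ASSIGNMENT: chance decides WHICH treatment each unit receives.
# This is what supports a cause-and-effect conclusion.
shuffled = sample[:]
rng.shuffle(shuffled)
treatment_group = shuffled[:20]
placebo_group = shuffled[20:]

print("Treatment group:", sorted(treatment_group))
print("Placebo group:  ", sorted(placebo_group))
```

A study can use either step without the other: a poll stops after the sampling step, while an experiment on volunteers skips it and performs only the assignment step.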

  • Scope of Conclusions: The type of study design determines the conclusions we can draw.

    • [Image: A 2x2 grid titled "Scope of Conclusions". Rows are "Random Assignment (Yes/No)" and Columns are "Random Sampling (Yes/No)".

    • Cell 1 (Yes RA, Yes RS): Causal conclusion, generalizable to the whole population. (The gold standard).

    • Cell 2 (Yes RA, No RS): Causal conclusion, but only for the subjects in the experiment. Not generalizable.

    • Cell 3 (No RA, Yes RS): No causal conclusion (only association), but results are generalizable to the population. (Typical for polls and surveys).

    • Cell 4 (No RA, No RS): No causal conclusion (only association), and results are not generalizable. (Results are limited to the participants).]

Key Vocabulary

  • Observational Study: A study in which researchers observe subjects and measure variables without assigning treatments or intervening. It can show association but not causation.

  • Experiment: A study in which researchers deliberately impose treatments on subjects and then measure their responses to establish a cause-and-effect relationship.

  • Confounding: Occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other. This is the primary reason observational studies cannot prove causation.

  • Explanatory Variable: A variable that is intentionally manipulated or observed in a study to see if it brings about a change in another variable.

  • Response Variable: A variable that measures the outcome of a study.

  • Random Sampling: The process of using chance to select a sample from a population. Its purpose is to allow results to be generalized to the larger population.

  • Random Assignment: The process of using chance to assign experimental units to different treatment groups. Its purpose is to create balanced groups to allow for cause-and-effect conclusions.

Calculator Tech (TI-84)

No major calculator functions are required for this topic. However, the random integer generator can be useful for demonstrating or carrying out a random process.

To generate random integers to assign subjects to groups:

  1. Press MATH.

  2. Arrow over to the PRB (Probability) menu.

  3. Select 5: randInt(.

  4. The syntax is randInt(lower, upper).

    • Example: To randomly assign 20 subjects (labeled 1-20) to two groups of 10, you could use randInt(1, 20) and press ENTER repeatedly, skipping any number that has already appeared. The first 10 unique numbers generated would be assigned to Treatment 1, and the remaining subjects would go to Treatment 2.
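
For readers without a calculator at hand, the same draw-and-skip-repeats procedure can be sketched in Python. The function name `assign_by_repeated_draws` is a hypothetical helper, not a calculator or library command; `random.Random.randint` includes both endpoints, which matches the 1-to-20 range used in the example.

```python
import random

def assign_by_repeated_draws(n_subjects, group_size, rng):
    """Draw random integers one at a time, skipping repeats, until
    group_size unique labels are chosen for Treatment 1."""
    chosen = []
    while len(chosen) < group_size:
        draw = rng.randint(1, n_subjects)  # inclusive on both ends
        if draw not in chosen:             # ignore a label that already came up
            chosen.append(draw)
    treatment_1 = sorted(chosen)
    # Everyone not chosen goes to Treatment 2.
    treatment_2 = [s for s in range(1, n_subjects + 1) if s not in chosen]
    return treatment_1, treatment_2

treatment_1, treatment_2 = assign_by_repeated_draws(20, 10, random.Random())
print("Treatment 1:", treatment_1)
print("Treatment 2:", treatment_2)
```

Because no seed is fixed here, each run gives a different assignment, just as each calculator session would.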

How to Show Work on the FRQ

On the AP exam, you will be asked to identify study types and, more importantly, to describe how to implement a study. Your description must be clear, detailed, and replicable.

Template for Describing a Completely Randomized Experiment:

Your description must clearly address these four components in context:

  1. Subjects/Units & Random Assignment: Start with your subjects and describe the random assignment process in detail.

    • Script: "First, obtain a list of all [number] [subjects/experimental units]. For each subject, we will [describe a random process]. For example, assign each subject a unique number from 1 to [N]. Then, use a random number generator to select [n] unique numbers between 1 and [N]. The subjects corresponding to these numbers will be assigned to the first treatment group."

  2. Treatments: Clearly state what each group will do or receive. Be specific.

    • Script: "The first group of [n] subjects will receive [describe Treatment 1]. The second group of [N-n] subjects will receive [describe Treatment 2, which could be a placebo or the standard treatment]."

  3. Control Other Variables: Briefly mention that other variables should be kept the same for all groups.

    • Script: "All other conditions (e.g., diet, environment, daily routine) will be kept as consistent as possible for all subjects throughout the experiment."

  4. Response Variable & Comparison: State what you will measure at the end and what you will compare.

    • Script: "After a period of [time], we will measure the [name of the response variable] for each subject. Finally, we will compare the average [response variable] between the [Treatment 1 group] and the [Treatment 2 group] to see if there is a statistically significant difference."

Practice Problems

Problem 1: A local health department wants to investigate the relationship between regular exercise and stress levels in high school students. They survey 500 students at a large high school. Each student is asked whether they exercise regularly (at least 3 times per week) and to rate their current stress level on a scale of 1 to 10. The study found that students who exercise regularly have, on average, a lower stress level than students who do not.

(a) Is this an observational study or an experiment? Explain.

(b) Can the health department conclude that regular exercise causes a reduction in stress? Explain why or why not.

Solution:

(a) This is an observational study. The researchers did not impose any treatments on the students; they did not assign students to an exercise group or a non-exercise group. They simply gathered data on the students' existing exercise habits and stress levels.

(b) The health department cannot conclude that regular exercise causes a reduction in stress. Because this is an observational study and not an experiment, there could be confounding variables. For example, students who have more free time might be more likely to exercise and also have lower stress levels because they have fewer demanding commitments. In this case, the amount of free time is a confounding variable, as it is associated with both exercise habits (the explanatory variable) and stress level (the response variable), making it impossible to determine if the exercise itself is responsible for the lower stress.

Problem 2: A company has developed a new type of fertilizer that they believe increases the yield of tomato plants. They have 60 identical tomato seedlings to use in a study. Design a completely randomized experiment to test the effectiveness of the new fertilizer compared to their current standard fertilizer.

Solution:

First, we will take the 60 tomato seedlings and assign each a unique identification number from 1 to 60. To perform the random assignment, we will use a random number generator to select 30 unique integers from 1 to 60. The 30 seedlings corresponding to these numbers will be assigned to the new fertilizer group. The remaining 30 seedlings will be assigned to the standard fertilizer group. The first group will be treated with the new fertilizer according to its instructions. The second group will be treated with the standard fertilizer. All other conditions, such as the amount of water, sunlight, and soil type, will be kept identical for all 60 plants to avoid confounding. After 90 days, we will harvest all the tomatoes from each plant and measure the total weight of tomatoes (the yield) for each plant. Finally, we will compare the average yield of the new fertilizer group to the average yield of the standard fertilizer group to determine if the new fertilizer causes a statistically significant increase in tomato yield.
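
The assignment and comparison steps of this solution could be sketched as follows. The function names `assign_seedlings` and `mean_yield_difference` are hypothetical helpers, and no yield data is included because the yields would only exist after the harvest. The sketch computes the difference in average yield; deciding whether that difference is statistically significant is a separate step not shown here.

```python
import random
from statistics import mean

def assign_seedlings(n_seedlings=60, n_new=30, rng=None):
    """Label seedlings 1..n_seedlings and pick n_new of them at random
    for the new fertilizer; the rest get the standard fertilizer."""
    rng = rng or random.Random()
    labels = range(1, n_seedlings + 1)
    new_group = set(rng.sample(labels, n_new))  # unique labels, no repeats
    return {label: ("new" if label in new_group else "standard") for label in labels}

def mean_yield_difference(assignment, yields):
    """yields maps each seedling label to its measured total tomato weight.
    Returns mean(new fertilizer group) minus mean(standard fertilizer group)."""
    new = [yields[label] for label, group in assignment.items() if group == "new"]
    standard = [yields[label] for label, group in assignment.items() if group == "standard"]
    return mean(new) - mean(standard)

assignment = assign_seedlings()
print(sum(group == "new" for group in assignment.values()))  # 30 seedlings get the new fertilizer
```

Once the yields are recorded after 90 days, calling `mean_yield_difference(assignment, yields)` gives the observed difference to be evaluated.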

Common Mistakes to Avoid

  • Confusing Observational Studies with Experiments: The single deciding factor is: Was a treatment deliberately imposed by the researchers? If researchers just gather data on existing conditions (like in a survey), it's an observational study. If they actively do something to the subjects (like give them a pill or make them follow a diet), it's an experiment.

  • Mixing Up Random Sampling and Random Assignment: These are two different processes with two different goals.

    • Random Sampling -> Generalize results to a population.

    • Random Assignment -> Establish Cause and Effect.

    You can have one without the other. An experiment on volunteers uses random assignment but not random sampling, so you can claim causation but can't generalize the results beyond the volunteers.

  • Vague or Incomplete Descriptions of Experiments: When asked to design an experiment, simply saying "randomly assign them to groups" is not enough. You must describe how you will randomize (e.g., "put names in a hat," "use a random number generator"). You must also explicitly state what you will measure (the response variable) and compare at the end.

  • Identifying a Lurking Variable that is NOT Confounding: A variable is only confounding if it is associated with both the explanatory and response variables. For example, in the exercise/stress study, a student's hair color is a lurking variable, but it's not confounding because it's unlikely to be associated with both their exercise habits and their stress level.