Introducing Statistics: Why Be Normal? | AP Stats Unit 6 Study Guide

Quick Summary

This guide introduces the Normal distribution, one of the most widely used probability distributions in statistics. You will learn to describe the shape, center, and spread of any Normal curve using its mean and standard deviation. After mastering this lesson, you will be able to apply the Empirical Rule to estimate probabilities and use z-scores to standardize values, allowing you to calculate precise probabilities and compare data from different Normal distributions.

Key Concepts

The Normal distribution is a continuous probability distribution that is fundamental to inferential statistics. It describes how the values of a variable are distributed and is characterized by its symmetric, bell-shaped curve.

Properties of a Normal Distribution:
- Shape: It is unimodal (has one peak) and symmetric about the mean. The curve is often described as bell-shaped.
- Center: The mean (μ), median, and mode are all equal and located at the exact center of the distribution.
- Spread: The spread is determined by the standard deviation (σ). The curve extends indefinitely in both directions, approaching but never touching the horizontal axis (it is asymptotic).
- Area: The total area under any Normal curve is exactly 1, representing 100% of the probability.
Parameters of the Normal Distribution:
The specific shape and location of a Normal curve are defined by two parameters: the population mean (μ) and the population standard deviation (σ). We use the notation N(μ, σ) to describe a Normal distribution with a specific mean and standard deviation.
- Mean (μ): This parameter controls the center or location of the curve. Changing μ shifts the entire curve to the left or right along the horizontal axis without changing its shape.
- Standard Deviation (σ): This parameter controls the spread or variability of the curve. A smaller σ results in a taller, narrower curve, indicating that data points are clustered closely around the mean. A larger σ results in a shorter, wider curve, indicating that data points are more spread out.
[Image: Two Normal curves with the same standard deviation but different means, showing a shift in location. Beside it, two Normal curves with the same mean but different standard deviations, showing a change in spread.]
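The effect of each parameter can also be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the parameter values (μ = 0 or 3, σ = 1 or 2) are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

# Hypothetical parameter values, chosen only to illustrate the idea.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)
shifted = NormalDist(mu=3, sigma=1)

# The height of a Normal curve at its mean is 1 / (sigma * sqrt(2 * pi)),
# so doubling sigma halves the peak, while changing mu leaves it unchanged.
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)
peak_shifted = shifted.pdf(shifted.mean)

print(f"peak height, mu=0, sigma=1: {peak_narrow:.4f}")
print(f"peak height, mu=0, sigma=2: {peak_wide:.4f}")
print(f"peak height, mu=3, sigma=1: {peak_shifted:.4f}")
```

A larger σ lowers and widens the curve because the total area must stay equal to 1; shifting μ moves the peak without changing its height.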
The Empirical Rule (The 68-95-99.7 Rule):
This rule provides a useful approximation for the percentage of data that falls within a certain number of standard deviations from the mean in a Normal distribution.
- Approximately 68% of the observations fall within 1 standard deviation of the mean (μ ± 1σ).
- Approximately 95% of the observations fall within 2 standard deviations of the mean (μ ± 2σ).
- Approximately 99.7% of the observations fall within 3 standard deviations of the mean (μ ± 3σ).
[Image: A labeled Normal curve showing the 68-95-99.7 rule percentages. The central section from -1σ to +1σ is labeled 68%. The sections from -2σ to +2σ are labeled 95%. The sections from -3σ to +3σ are labeled 99.7%. The individual sections (e.g., between 1σ and 2σ) are also labeled with their respective percentages (13.5%, 2.35%, 0.15%).]
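To see where the 68-95-99.7 figures come from, the areas can be computed directly. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `area_within` is an illustrative choice, not part of any calculator or exam notation.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard Normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

areas = {k: area_within(k) for k in (1, 2, 3)}
for k, area in areas.items():
    print(f"within {k} SD of the mean: {area:.4f}")
```

The computed areas are about 0.6827, 0.9545, and 0.9973, which is why the rule is stated as an approximation.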
Standardizing with Z-Scores:
Since every Normal distribution is different depending on its μ and σ, we need a standard way to measure and compare values. We do this by standardizing the values, converting them into z-scores.
- Interpretation: A z-score measures how many standard deviations an observation (x) is from the mean (μ).
  - A positive z-score indicates the observation is above the mean.
  - A negative z-score indicates the observation is below the mean.
  - A z-score of 0 indicates the observation is exactly at the mean.
- Formula:
  z = (x - μ) / σ
  Where:
  - x is the data value (observation).
  - μ is the population mean.
  - σ is the population standard deviation.
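The formula translates directly into code. In this sketch the function name `z_score` and the test score of 650 are hypothetical; the two distributions, N(70, 3) for heights and N(500, 100) for test scores, are the ones used in the examples in this guide.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

# A height of 74 inches from N(70, 3) and a (hypothetical) test score
# of 650 from N(500, 100) are on different scales, but their z-scores
# can be compared directly.
z_height = z_score(74, 70, 3)
z_test = z_score(650, 500, 100)

print(f"z for height 74: {z_height:.2f}")
print(f"z for score 650: {z_test:.2f}")
```

Because 1.50 is larger than 1.33, the test score is more unusual relative to its own distribution than the height is.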
The Standard Normal Distribution:
When we convert all values from a Normal distribution N(μ, σ) into z-scores, they form a special distribution called the Standard Normal Distribution. This distribution always has a mean of 0 and a standard deviation of 1, denoted as N(0, 1). This allows us to use a single reference distribution to find probabilities for any Normal distribution.
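A quick numerical check shows why one reference distribution is enough. The sketch below assumes the N(70, 3) height distribution used elsewhere in this guide and relies on Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)   # original scale
std_normal = NormalDist()              # defaults to mu=0, sigma=1

x = 74
z = (x - heights.mean) / heights.stdev

# The area to the left of x under N(70, 3) equals the area
# to the left of z under N(0, 1).
p_original = heights.cdf(x)
p_standard = std_normal.cdf(z)

print(f"P(X < 74) under N(70, 3): {p_original:.4f}")
print(f"P(Z < {z:.2f}) under N(0, 1): {p_standard:.4f}")
```

Both lines report the same probability, so any Normal probability can be found from the standard Normal curve once the value has been standardized.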

Key Vocabulary

Normal Distribution: A continuous probability distribution characterized by a symmetric, bell-shaped curve. Its shape is determined by the mean and standard deviation.
Mean (μ): A parameter that specifies the center (and point of maximum density) of a Normal distribution.
Standard Deviation (σ): A parameter that specifies the spread or variability of a Normal distribution. It is the distance from the mean to the inflection points of the curve.
Empirical Rule (68-95-99.7 Rule): A rule stating the approximate percentage of data that falls within 1, 2, and 3 standard deviations of the mean in a Normal distribution.
Z-score (Standardized Score): A measure of how many standard deviations a specific data point is from the mean. It is calculated as z = (x - μ) / σ.
Standard Normal Distribution: A specific Normal distribution with a mean of 0 and a standard deviation of 1. It is the distribution of z-scores.

Calculator Tech (TI-84)

For this topic, you will use two primary functions found in the DISTR (distributions) menu.

normalcdf() - Finding an Area/Probability
This function calculates the area (proportion or probability) under a Normal curve between two boundaries.
- Keystrokes: 2nd -> VARS [DISTR] -> 2: normalcdf(
- Syntax: normalcdf(lower bound, upper bound, μ, σ)
- Inputs:
  - lower bound: The left boundary of the area you want to find. For "less than" problems (e.g., P(X < 50)), use a very large negative number like -1E99 (enter -1, then 2nd, then the comma key for EE, then 99).
  - upper bound: The right boundary of the area. For "greater than" problems (e.g., P(X > 60)), use a very large positive number like 1E99.
  - μ: The mean of the distribution.
  - σ: The standard deviation of the distribution.
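For readers who want to check calculator results on a computer, the same calculation can be sketched in Python. The function below is a hypothetical stand-in that mirrors the calculator's argument order; it is not the TI-84's implementation, and the distribution N(55, 5) is an assumption made only for this example.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mu=0.0, sigma=1.0):
    """Area under N(mu, sigma) between lower and upper.

    Mirrors the TI-84 argument order; an illustrative stand-in only.
    """
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Assumed distribution for illustration: N(55, 5).
p_less = normalcdf(-1e99, 50, 55, 5)     # P(X < 50)
p_greater = normalcdf(60, 1e99, 55, 5)   # P(X > 60)
p_between = normalcdf(50, 60, 55, 5)     # P(50 < X < 60)

print(f"P(X < 50) = {p_less:.4f}")
print(f"P(X > 60) = {p_greater:.4f}")
print(f"P(50 < X < 60) = {p_between:.4f}")
```

As on the calculator, -1e99 and 1e99 serve as stand-ins for negative and positive infinity.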
invNorm() - Finding a Value from an Area/Percentile
This function does the reverse of normalcdf(). Given a probability (area), it finds the corresponding x-value (boundary).
- Keystrokes: 2nd -> VARS [DISTR] -> 3: invNorm(
- Syntax: invNorm(area, μ, σ)
- Inputs:
  - area: CRITICAL. This must be the cumulative area to the LEFT of the value you are looking for. It represents the percentile. If you are given the "top 10%", you must enter 0.90 as the area.
  - μ: The mean of the distribution.
  - σ: The standard deviation of the distribution.
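The left-area convention can be illustrated with a short Python sketch. The function name `inv_norm` is a hypothetical stand-in for the calculator command, and the "top 10% of N(500, 100)" scenario is an assumed example.

```python
from statistics import NormalDist

def inv_norm(area_left, mu=0.0, sigma=1.0):
    """Value with the given cumulative area to its LEFT under N(mu, sigma)."""
    return NormalDist(mu, sigma).inv_cdf(area_left)

# Assumed example: the cutoff for the top 10% of N(500, 100).
# The area to the left of that cutoff is 1 - 0.10 = 0.90.
cutoff_top10 = inv_norm(1 - 0.10, 500, 100)

# Entering 0.10 by mistake returns the 10th percentile instead,
# a value below the mean.
cutoff_wrong = inv_norm(0.10, 500, 100)

print(f"correct cutoff (area 0.90): {cutoff_top10:.1f}")
print(f"mistaken cutoff (area 0.10): {cutoff_wrong:.1f}")
```

A quick sanity check: a "top 10%" cutoff must lie above the mean, so an answer below 500 signals that the wrong area was entered.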

How to Show Work on the FRQ

When asked to calculate a probability or find a value for a Normal distribution on a Free Response Question, you must clearly communicate your method to earn full credit. A simple calculator answer is not enough. Use the following structure.

Four Key Components for Full Credit:

1. State the Distribution and Identify Values: Clearly state the distribution and its parameters. Define the variable of interest.
   - Example: "Let X be the height of adult males. The distribution of X is Normal with a mean μ = 70 inches and standard deviation σ = 3 inches, or N(70, 3). We want to find the probability that a randomly selected male is taller than 74 inches, P(X > 74)."
2. Draw and Shade a Picture: Sketch a Normal curve. Label the mean (μ) on the horizontal axis. Mark the value of interest (x) and shade the area corresponding to the probability you are finding. This demonstrates understanding.
3. Show the Calculation (Standardize): Write the z-score formula, substitute the values, and state the calculated z-score. This is mandatory even if you use normalcdf with the original values.
   - Example: $z = (x - \mu)/\sigma = (74 - 70)/3 \approx 1.33$
4. State the Final Answer with Calculator Command and Context: Write down the calculator command and inputs you used. Then, state your final answer as a complete sentence in the context of the problem.
   - Example: "Using normalcdf(lower: 74, upper: 1E99, μ: 70, σ: 3), the probability is 0.0912. There is a 9.12% chance that a randomly selected adult male will be taller than 74 inches."
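The numbers in this example can be double-checked off the calculator. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; it is a verification aid, not something to write on the exam.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=3)

# Standardize x = 74.
z = (74 - heights.mean) / heights.stdev

# P(X > 74) is the area to the right of 74.
p_taller = 1 - heights.cdf(74)

# Rounding z to 1.33 first (as a z-table requires) shifts the answer slightly.
p_from_rounded_z = 1 - NormalDist().cdf(round(z, 2))

print(f"z = {z:.2f}")
print(f"P(X > 74) = {p_taller:.4f}")
print(f"using z rounded to 1.33: {p_from_rounded_z:.4f}")
```

The unrounded calculation gives about 0.0912, matching normalcdf, while a table lookup with z = 1.33 gives about 0.0918. The two differ only because of rounding.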

Practice Problems

Problem 1:

The scores on a standardized test are approximately Normally distributed with a mean of 500 and a standard deviation of 100.

(a) What percentage of students score between 400 and 600?

(b) A student scores 720 on the test. Calculate and interpret their z-score.

(c) What is the probability that a randomly selected student scores below 450?

Solution:

Let X = the score on the standardized test. The distribution is N(μ=500, σ=100).

(a) Solution using the Empirical Rule:

State: We want to find the percentage of scores between 400 and 600.
Plan: Notice that 400 is one standard deviation below the mean (500 - 100) and 600 is one standard deviation above the mean (500 + 100). We can use the Empirical Rule.
Do: According to the Empirical Rule, approximately 68% of observations fall within 1 standard deviation of the mean.
Conclude: Approximately 68% of students score between 400 and 600 on the test.

(b) Solution for z-score:

State: We need to calculate the z-score for x = 720.
Plan: We will use the z-score formula.
Do: $z = (x - \mu)/\sigma = (720 - 500)/100 = 220/100 = 2.2$
Conclude: The student's z-score is 2.2. This means their score of 720 is 2.2 standard deviations above the average score of 500.

(c) Solution for probability:

State: We want to find P(X < 450). The distribution is N(500, 100).
Plan: We will standardize the value x=450 by finding its z-score and then use the calculator to find the area to the left. We will sketch the curve.
[Image: Sketch of a Normal curve centered at 500. The value 450 is marked to the left, and the area to the left of 450 is shaded.]
Do:
1. Calculate the z-score: $z = (450 - 500)/100 = -0.5$
2. Use the calculator: normalcdf(lower: -1E99, upper: 450, μ: 500, σ: 100) = 0.3085
Conclude: The probability that a randomly selected student scores below 450 is approximately 0.3085.
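All three parts of Problem 1 can be verified with a few lines of Python. This sketch assumes the stated distribution N(500, 100) and uses the standard-library `statistics.NormalDist`; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=100)

# (a) Area between 400 and 600, i.e. within 1 standard deviation of the mean.
p_between = scores.cdf(600) - scores.cdf(400)

# (b) z-score for a score of 720.
z_720 = (720 - scores.mean) / scores.stdev

# (c) Area to the left of 450.
p_below_450 = scores.cdf(450)

print(f"(a) P(400 < X < 600) = {p_between:.4f}")
print(f"(b) z = {z_720:.1f}")
print(f"(c) P(X < 450) = {p_below_450:.4f}")
```

Part (a) comes out near 0.6827, consistent with the Empirical Rule's approximate 68%.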

Problem 2:

The lifespan of a certain brand of car tires is approximately Normally distributed with a mean of 45,000 miles and a standard deviation of 3,500 miles.

(a) What proportion of tires last longer than 50,000 miles?

(b) The company wants to offer a warranty that covers the tires that fail earliest. If they want to replace no more than the bottom 5% of tires, what mileage should they set for their warranty?

Solution:

Let X = the lifespan of a tire in miles. The distribution is N(μ=45000, σ=3500).

(a) Solution for proportion:

State: We want to find the proportion of tires with a lifespan greater than 50,000 miles, P(X > 50000). The distribution is N(45000, 3500).
Plan: We will standardize the value x=50000 and use the calculator to find the area to the right. We will sketch the curve.
[Image: Sketch of a Normal curve centered at 45,000. The value 50,000 is marked to the right, and the area to the right of 50,000 is shaded.]
Do:
1. Calculate the z-score: $z = (50000 - 45000)/3500 \approx 1.43$
2. Use the calculator: normalcdf(lower: 50000, upper: 1E99, μ: 45000, σ: 3500) = 0.0766
Conclude: The proportion of tires that last longer than 50,000 miles is approximately 0.0766.

(b) Solution for warranty mileage:

State: We need to find the mileage value (x) that corresponds to the 5th percentile (bottom 5%) of the distribution N(45000, 3500).
Plan: We are given an area (0.05) to the left and need to find the corresponding boundary value. We will use the invNorm function on the calculator. We will sketch the curve.
[Image: Sketch of a Normal curve centered at 45,000. A small area in the far-left tail is shaded, labeled "0.05". The unknown boundary value 'x' is marked.]
Do:
1. Use the calculator: invNorm(area: 0.05, μ: 45000, σ: 3500) = 39,243 miles.
Conclude: To replace no more than the bottom 5% of tires, the company should set its warranty at approximately 39,243 miles.
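Both parts of Problem 2 can be checked the same way. This is a sketch under the stated assumption that tire lifespans follow N(45000, 3500); `cdf` plays the role of normalcdf and `inv_cdf` the role of invNorm.

```python
from statistics import NormalDist

lifespan = NormalDist(mu=45000, sigma=3500)

# (a) Proportion of tires lasting longer than 50,000 miles (right-tail area).
z_50000 = (50000 - lifespan.mean) / lifespan.stdev
p_longer = 1 - lifespan.cdf(50000)

# (b) Mileage with 5% of tires below it (5th percentile).
warranty_miles = lifespan.inv_cdf(0.05)

print(f"(a) z = {z_50000:.2f}, P(X > 50000) = {p_longer:.4f}")
print(f"(b) warranty mileage = {warranty_miles:,.0f} miles")
```

The warranty value falls below the mean, as expected for a bottom-5% cutoff.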

Common Mistakes to Avoid

Confusing Z-scores with Areas (Probabilities): A z-score is a location on the horizontal axis (e.g., z=1.5), telling you how many standard deviations a value is from the mean. An area is a probability, a value between 0 and 1 (e.g., P(Z < 1.5) = 0.9332). Never say "the z-score is 0.9332."
Misusing the Empirical Rule: The 68-95-99.7 rule is an approximation and only applies to values that are exactly 1, 2, or 3 standard deviations from the mean. Do not try to use it for a z-score of 1.5 or -0.8. For those, you must use technology or a z-table.
invNorm Area Input Error: The invNorm function on the TI-84 requires the area to the LEFT of the value you're looking for. If a problem asks for the value that marks the "top 10%," you must use an area of $1 - 0.10 = 0.90$ in the calculator.
Forgetting Context on FRQs: Your final answer must be a sentence that answers the question in the context of the problem. Simply writing "x = 39,243" is not enough. You must state, "The warranty mileage should be 39,243 miles."
Showing No Work for Calculations: On an FRQ, you will not get full credit by just writing down the answer from normalcdf. You must show the z-score formula and your substitutions, and it is highly recommended to draw a shaded sketch of the curve. This demonstrates your understanding of the process.
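The first two mistakes above can be made concrete with a short sketch. It uses Python's standard-library `statistics.NormalDist` and the z = 1.5 value mentioned in the list; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# A z-score is a location on the horizontal axis.
z = 1.5

# An area is a probability between 0 and 1.
area_left = std_normal.cdf(z)

# The Empirical Rule has no entry for 1.5 standard deviations,
# so the area within 1.5 SD of the mean must be computed.
area_within = std_normal.cdf(z) - std_normal.cdf(-z)

print(f"z-score (location): {z}")
print(f"P(Z < 1.5) (area): {area_left:.4f}")
print(f"P(-1.5 < Z < 1.5): {area_within:.4f}")
```

The area within 1.5 standard deviations is about 0.8664, which sits between the 68% and 95% benchmarks but cannot be read off the rule itself.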
