Combining Random Variables | AP Stats Unit 4 Study Guide

Quick Summary

This guide covers the rules for combining two or more independent random variables. You will learn how to calculate the resulting mean and standard deviation when you add or subtract random variables. Mastering this topic will allow you to determine the parameters of a new distribution created from others and calculate probabilities associated with it, a key skill for understanding variability in real-world scenarios.

Key Concepts

When we have two or more random variables, we can create a new random variable by performing arithmetic operations on them, most commonly addition and subtraction. The key is to understand how the mean (center) and standard deviation (spread) of the new variable are derived from the original ones.

1. Rules for Means (μ)

The rules for combining means are straightforward and intuitive: the mean of a sum is the sum of the means, and the mean of a difference is the difference of the means. Unlike the rules for spread, these hold whether or not the variables are independent.

Sum of Random Variables: The mean of the sum of two random variables is the sum of their individual means.
- Formula: E(X + Y) = E(X) + E(Y)
- Notation: μ_(X+Y) = μ_X + μ_Y
Difference of Random Variables: The mean of the difference of two random variables is the difference of their individual means.
- Formula: E(X - Y) = E(X) - E(Y)
- Notation: μ_(X-Y) = μ_X - μ_Y

Example: Let X be the time it takes a student to complete the multiple-choice section of a test, with μ_X = 55 minutes. Let Y be the time to complete the free-response section, with μ_Y = 30 minutes. The mean time for the total test (T = X + Y) is:

μ_T = μ_X + μ_Y = 55 + 30 = 85 minutes.
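The mean rules can be checked exactly on small discrete distributions. The sketch below is illustrative only: the two distributions (a fair six-sided die X and a fair four-sided die Y) and the helper names `mean` and `combine` are assumptions chosen for the check, not part of the test-time example. Independence is used here only to build the joint distribution; the mean rules themselves do not require it.

```python
from fractions import Fraction
from itertools import product

# Hypothetical distributions for illustration: value -> probability.
X = {v: Fraction(1, 6) for v in range(1, 7)}   # fair six-sided die
Y = {v: Fraction(1, 4) for v in range(1, 5)}   # fair four-sided die

def mean(dist):
    """Expected value of a discrete distribution given as {value: probability}."""
    return sum(v * p for v, p in dist.items())

def combine(dx, dy, op):
    """Distribution of op(X, Y), built by assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        key = op(x, y)
        out[key] = out.get(key, 0) + px * py
    return out

total = combine(X, Y, lambda x, y: x + y)
diff = combine(X, Y, lambda x, y: x - y)

print(mean(X), mean(Y))          # 7/2 5/2
print(mean(total), mean(diff))   # 6 1
```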

2. Rules for Variances (σ^2) and Standard Deviations (σ)

The rules for combining measures of spread are less intuitive and require a critical condition.

The Independence Condition: These rules only apply if the random variables are independent. This means the outcome of one variable does not influence the outcome of the other. In practice problems, you must always confirm that this condition is met or can be reasonably assumed.
Rule for Variances: If X and Y are independent random variables, the variance of their sum OR difference is the sum of their individual variances.
- Formula (Sum): Var(X + Y) = Var(X) + Var(Y)
- Formula (Difference): Var(X - Y) = Var(X) + Var(Y)
- Notation (Sum): σ^2_(X+Y) = σ^2_X + σ^2_Y
- Notation (Difference): σ^2_(X-Y) = σ^2_X + σ^2_Y
Crucial Point: Notice that variances always add. Whether you are combining variables with addition or subtraction, the new variance is always the sum of the old variances. This is because both variables contribute to the overall uncertainty or variability of the outcome. Subtracting variables does not cancel out their variability.
Rule for Standard Deviations: You cannot add or subtract standard deviations directly. You must first convert them to variances, apply the addition rule, and then convert the result back to standard deviation by taking the square root.
- Formula: σ_(X±Y) = √[Var(X) + Var(Y)] = √[σ^2_X + σ^2_Y]
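A quick way to convince yourself that variances add for both sums and differences is to enumerate a small case exactly. The following is a minimal sketch, assuming two independent fair six-sided dice (a hypothetical example, not one from this guide); the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product
from math import sqrt
from operator import add, sub

# Hypothetical example: a fair six-sided die, value -> probability.
die = {v: Fraction(1, 6) for v in range(1, 7)}

def mean(dist):
    return sum(v * p for v, p in dist.items())

def variance(dist):
    """Probability-weighted average of squared deviations from the mean."""
    m = mean(dist)
    return sum((v - m) ** 2 * p for v, p in dist.items())

def combine(dx, dy, op):
    """Distribution of op(X, Y), built by assuming X and Y are independent."""
    out = {}
    for (x, px), (y, py) in product(dx.items(), dy.items()):
        key = op(x, y)
        out[key] = out.get(key, 0) + px * py
    return out

var_x = variance(die)
var_sum = variance(combine(die, die, add))
var_diff = variance(combine(die, die, sub))
print(var_x, var_sum, var_diff)   # 35/12 35/6 35/6

# Standard deviations do NOT add: compare the correct value with the naive one.
sd_sum = sqrt(var_sum)
naive = 2 * sqrt(var_x)
print(round(sd_sum, 3), round(naive, 3))   # 2.415 3.416
```

The sum and the difference have the same variance, and the correct standard deviation is noticeably smaller than the one obtained by adding standard deviations directly.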

Example (continued): Suppose the standard deviation for the multiple-choice section is σ_X = 5 minutes, and for the free-response is σ_Y = 4 minutes. Assume the times are independent. To find the standard deviation of the total test time (T = X + Y):

Find the variances:
- σ^2_X = (5)^2 = 25
- σ^2_Y = (4)^2 = 16
Add the variances:
- σ^2_T = σ^2_X + σ^2_Y = 25 + 16 = 41
Take the square root:
- σ_T = √41 ≈ 6.403 minutes
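The three steps above (square, add, take the square root) can be wrapped in a small helper. This is a sketch only; the function name `combined_sd` is made up for illustration, and it assumes the variables are independent.

```python
from math import sqrt

def combined_sd(*sds):
    """SD of a sum or difference of independent random variables.

    Squares each standard deviation to get a variance, adds the
    variances, then takes the square root.
    """
    return sqrt(sum(sd ** 2 for sd in sds))

sigma_t = combined_sd(5, 4)   # multiple-choice and free-response SDs, in minutes
print(round(sigma_t, 3))      # 6.403
print(5 + 4)                  # 9, the incorrect result of adding SDs directly
```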

3. Combining Normal Random Variables

A powerful extension of these rules applies when the original variables are Normally distributed.

The Resulting Distribution: If X and Y are independent Normal random variables, then their sum (X + Y) and their difference (X - Y) are also Normally distributed.
Putting It All Together: If X ~ N(μ_X, σ_X) and Y ~ N(μ_Y, σ_Y) are independent, then:
- Sum (T = X + Y): The total T is Normally distributed with:
  - Mean: μ_T = μ_X + μ_Y
  - Standard Deviation: σ_T = √[σ^2_X + σ^2_Y]
  - So, T ~ N(μ_X + μ_Y, √[σ^2_X + σ^2_Y])
- Difference (D = X - Y): The difference D is Normally distributed with:
  - Mean: μ_D = μ_X - μ_Y
  - Standard Deviation: σ_D = √[σ^2_X + σ^2_Y]
  - So, D ~ N(μ_X - μ_Y, √[σ^2_X + σ^2_Y])

[Image: A diagram showing two separate Normal distribution curves, N(10, 2) and N(15, 3). An arrow points from them to a new, wider Normal distribution curve labeled "Sum: N(25, 3.61)", illustrating that the new mean is the sum of the old means and the new curve is more spread out.]

This property is extremely useful because it allows us to calculate probabilities for the combined variable using standard Normal distribution techniques (like normalcdf on a calculator).
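Outside the calculator, Python's standard library offers `statistics.NormalDist`, whose `+` and `-` operators treat the two operands as independent Normal variables. The sketch below reuses the N(10, 2) and N(15, 3) values from the diagram described above; the variable names are illustrative.

```python
from statistics import NormalDist

X = NormalDist(mu=10, sigma=2)
Y = NormalDist(mu=15, sigma=3)

# NormalDist addition and subtraction assume X and Y are independent.
total = X + Y
diff = X - Y

print(total.mean, round(total.stdev, 2))   # 25.0 3.61
print(diff.mean, round(diff.stdev, 2))     # -5.0 3.61
```

The means add or subtract, while the standard deviation is √(2² + 3²) = √13 in both cases.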

Key Vocabulary

Random Variable: A variable whose numerical value is determined by the outcome of a random event.
Mean of a Random Variable (Expected Value): The theoretical long-run average value of a random variable, denoted μ or E(X).
Variance of a Random Variable: The probability-weighted average of the squared deviations from the mean. It measures the spread of the distribution, denoted σ^2 or Var(X).
Standard Deviation of a Random Variable: The square root of the variance, representing the typical distance of an outcome from the mean, denoted σ.
Independent Random Variables: Two random variables for which knowing the value of one does not help predict the value of the other.
Normal Distribution: A continuous probability distribution that is symmetric and bell-shaped, described by its mean (μ) and standard deviation (σ).

Calculator Tech (TI-84)

While the rules for combining means and variances are applied by hand, the TI-84 is essential for finding probabilities once you have determined the parameters of the new, combined Normal distribution.

To find the probability for a combined Normal random variable:

Use the normalcdf() function.

2nd -> VARS [DISTR] -> 2: normalcdf()

Syntax: normalcdf(lower, upper, mean, standard_dev)

lower: The lower bound of the interval for which you are finding the probability. Use -1E99 (a very large negative number) to stand in for negative infinity.
upper: The upper bound of the interval. Use 1E99 (a very large positive number) to stand in for positive infinity.
mean: The new mean you calculated for the sum or difference (e.g., μ_(X+Y) or μ_(X-Y)).
standard_dev: The new standard deviation you calculated for the sum or difference (e.g., σ_(X+Y) or σ_(X-Y)).

Example: Suppose the total test time T = X + Y is Normally distributed with a mean of 85 minutes and a standard deviation of 6.403 minutes. To find the probability a student takes between 80 and 90 minutes:

lower: 80
upper: 90
mean: 85
standard_dev: 6.403
Keystrokes: normalcdf(80, 90, 85, 6.403), which gives approximately 0.565.
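If no calculator is at hand, the same area can be computed in Python. The function below is a hypothetical stand-in written to mirror the TI-84 argument order; it is not the calculator's implementation, only a sketch built on `statistics.NormalDist.cdf` from the standard library.

```python
from statistics import NormalDist

def normalcdf(lower, upper, mean, sd):
    """P(lower < X < upper) for X ~ N(mean, sd), in TI-84 argument order."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)

# Probability that total test time falls between 80 and 90 minutes.
p = normalcdf(80, 90, 85, 6.403)
print(round(p, 3))

# As on the calculator, -1e99 and 1e99 stand in for the infinities.
print(normalcdf(-1e99, 1e99, 85, 6.403))
```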

How to Show Work on the FRQ

To receive full credit on Free Response Questions involving combining random variables, you must clearly communicate your process. Use the following structure for calculations: four steps to find the new mean and standard deviation, plus a fifth step when a probability is requested.

Step 1: Define Variables and the Combination

Clearly define each random variable in context (e.g., "Let A = the weight of a randomly selected apple...").
Define the new combined random variable (e.g., "Let T = A + B be the total weight of two apples.").

Step 2: State and Check Conditions

Independence: State that the calculations require the variables to be independent. Justify this assumption based on the problem context (e.g., "The weights of two randomly selected apples can be assumed to be independent.").

Step 3: Calculate the New Mean

State the appropriate formula for the mean (sum or difference).
Substitute the values and calculate the result, including units.
Example: μ_T = μ_A + μ_B = 8 + 8 = 16 ounces.

Step 4: Calculate the New Standard Deviation

Show the variance calculation first. This is a critical step for scoring. State the formula for variance.
Substitute the variances (or squared standard deviations) and calculate the new variance.
Example: σ^2_T = σ^2_A + σ^2_B = (0.5)^2 + (0.5)^2 = 0.25 + 0.25 = 0.50.
Take the square root of the new variance to find the new standard deviation, including units.
Example: σ_T = √0.50 ≈ 0.707 ounces.

Step 5 (If asked for a probability): Describe the Distribution and Calculate

If the original variables were Normal, state that the combined variable is also Normal and specify its parameters.
Example: "The total weight T is Normally distributed with a mean of 16 oz and a standard deviation of 0.707 oz, or T ~ N(16, 0.707)."
Write down the probability statement and the calculator command with labeled inputs.
Example: P(T > 17) = normalcdf(lower: 17, upper: 1E99, μ: 16, σ: 0.707).
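The apple example used in the steps above can be traced end to end in a few lines. This sketch assumes, as Step 5 does, that both apple weights are Normal with mean 8 oz and standard deviation 0.5 oz and that they are independent; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Steps 1-2: A and B are the weights of two apples, assumed independent.
mu_a, sd_a = 8.0, 0.5
mu_b, sd_b = 8.0, 0.5

# Step 3: means add.
mu_t = mu_a + mu_b

# Step 4: variances add, then take the square root.
var_t = sd_a ** 2 + sd_b ** 2
sd_t = sqrt(var_t)

# Step 5: T is Normal, so P(T > 17) is an upper-tail area.
T = NormalDist(mu_t, sd_t)
p = 1 - T.cdf(17)

print(mu_t, var_t, round(sd_t, 3), round(p, 3))   # 16.0 0.5 0.707 0.079
```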

Practice Problems

Problem 1:

A company produces bags of sugar. The weight of sugar in a single bag, S, is Normally distributed with a mean of 5.1 pounds and a standard deviation of 0.1 pounds. The bags are packed in boxes that each contain two bags of sugar. The weight of an empty box, B, is Normally distributed with a mean of 1.2 pounds and a standard deviation of 0.2 pounds. Assume the weights of the bags and the box are independent.

What is the probability that a randomly selected, fully packed box weighs more than 11.5 pounds?

Solution:

Step 1: Define Variables and the Combination

Let S1 be the weight of the first bag of sugar, with S1 ~ N(5.1, 0.1).
Let S2 be the weight of the second bag of sugar, with S2 ~ N(5.1, 0.1).
Let B be the weight of the empty box, with B ~ N(1.2, 0.2).
We are interested in the total weight, T = S1 + S2 + B.

Step 2: State and Check Conditions

Independence: The problem states that the weights are independent. This allows us to add their variances.
Normality: All variables are stated to be Normally distributed, so their sum will also be Normally distributed.

Step 3: Calculate the New Mean

The mean of the total weight is the sum of the individual means.
μ_T = μ_S1 + μ_S2 + μ_B = 5.1 + 5.1 + 1.2 = 11.4 pounds.

Step 4: Calculate the New Standard Deviation

First, we find the new variance by adding the individual variances.
σ^2_T = σ^2_S1 + σ^2_S2 + σ^2_B
σ^2_T = (0.1)^2 + (0.1)^2 + (0.2)^2 = 0.01 + 0.01 + 0.04 = 0.06.
Now, we take the square root to find the standard deviation.
σ_T = √0.06 ≈ 0.245 pounds.

Step 5: Describe the Distribution and Calculate Probability

The total weight T is Normally distributed with a mean of 11.4 pounds and a standard deviation of 0.245 pounds. So, T ~ N(11.4, 0.245).
We need to find the probability that the total weight is more than 11.5 pounds, P(T > 11.5).
P(T > 11.5) = normalcdf(lower: 11.5, upper: 1E99, μ: 11.4, σ: 0.245)
P(T > 11.5) ≈ 0.342.
There is approximately a 34.2% chance that a randomly selected full box weighs more than 11.5 pounds.
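As a cross-check on the hand calculation, the three weights can be added as `statistics.NormalDist` objects, which treats them as independent. This is a sketch for verification only; small differences in the last decimal place can arise from rounding σ_T to 0.245 before computing the probability.

```python
from statistics import NormalDist

S1 = NormalDist(5.1, 0.1)   # first bag of sugar
S2 = NormalDist(5.1, 0.1)   # second bag of sugar
B = NormalDist(1.2, 0.2)    # empty box

# Adding NormalDist objects assumes the weights are independent.
T = S1 + S2 + B

p = 1 - T.cdf(11.5)
print(round(T.mean, 1), round(T.stdev, 3))   # 11.4 0.245
print(round(p, 4))
```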

Problem 2:

At a local high school, the heights of male students (M) are approximately Normally distributed with a mean of 69 inches and a standard deviation of 3 inches. The heights of female students (F) are approximately Normally distributed with a mean of 64 inches and a standard deviation of 2.5 inches. If one male and one female student are chosen at random, what is the probability that the male student is taller than the female student?

Solution:

Step 1: Define Variables and the Combination

Let M be the height of a randomly selected male student, with M ~ N(69, 3).
Let F be the height of a randomly selected female student, with F ~ N(64, 2.5).
We want to find the probability that the male is taller than the female, which is P(M > F). This is equivalent to finding the probability that the difference in their heights is positive: P(M - F > 0).
Let D = M - F be the difference in their heights.

Step 2: State and Check Conditions

Independence: Since the students are chosen at random, their heights can be assumed to be independent.
Normality: Both M and F are stated to be Normally distributed, so their difference, D, will also be Normally distributed.

Step 3: Calculate the New Mean

The mean of the difference is the difference of the means.
μ_D = μ_M - μ_F = 69 - 64 = 5 inches.

Step 4: Calculate the New Standard Deviation

First, we find the new variance by adding the individual variances (variances always add!).
σ^2_D = σ^2_M + σ^2_F
σ^2_D = (3)^2 + (2.5)^2 = 9 + 6.25 = 15.25.
Now, we take the square root to find the standard deviation.
σ_D = √15.25 ≈ 3.905 inches.

Step 5: Describe the Distribution and Calculate Probability

The difference in heights D is Normally distributed with a mean of 5 inches and a standard deviation of 3.905 inches. So, D ~ N(5, 3.905).
We need to find the probability that the difference is greater than 0, P(D > 0).
P(D > 0) = normalcdf(lower: 0, upper: 1E99, μ: 5, σ: 3.905)
P(D > 0) ≈ 0.900.
There is approximately a 90.0% chance that a randomly selected male student is taller than a randomly selected female student.
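A simulation offers an independent check on this result: draw many male/female pairs and count how often the male is taller. The sketch below compares that estimate with the value from the Normal difference. The seed and the number of trials are arbitrary choices made for reproducibility, not part of the problem.

```python
import random
from statistics import NormalDist

M = NormalDist(69, 3)     # male heights, inches
F = NormalDist(64, 2.5)   # female heights, inches

# Exact approach: D = M - F is Normal when M and F are independent.
D = M - F
p_exact = 1 - D.cdf(0)

# Simulation approach: independent random draws for each pair.
rng = random.Random(0)    # arbitrary fixed seed
n = 100_000               # arbitrary number of simulated pairs
hits = sum(rng.gauss(69, 3) > rng.gauss(64, 2.5) for _ in range(n))
p_sim = hits / n

print(round(D.mean, 1), round(D.stdev, 3))   # 5.0 3.905
print(round(p_exact, 4), round(p_sim, 4))
```

The two probabilities should agree to about two decimal places; the simulated value varies slightly with the seed.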

Common Mistakes to Avoid

Adding/Subtracting Standard Deviations Directly: This is the most common error. You cannot add or subtract standard deviations (e.g., σ_(X+Y) ≠ σ_X + σ_Y). You must always convert to variances (square them), add the variances, and then take the square root of the result.
Subtracting Variances for a Difference: When calculating the variance of a difference (X - Y), many students are tempted to subtract the variances (σ^2_X - σ^2_Y). This is incorrect. Each independent variable contributes its own variability to the result, so variances always add: σ^2_(X-Y) = σ^2_X + σ^2_Y.
Forgetting the Independence Condition: The rules for combining variances and standard deviations are only valid if the variables are independent. On an FRQ, you must explicitly state and check (or assume) this condition to get full credit.
Forgetting to Take the Square Root: A simple arithmetic error. After correctly adding the variances to get σ^2_new, students sometimes forget the final step of taking the square root to find the standard deviation, σ_new. Always double-check your final step.
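To see why the independence condition matters, consider the extreme case where the second variable is the same as the first (Y = X), so the two are completely dependent. The sketch below uses a fair six-sided die as a hypothetical example; the addition rule would predict a variance of 2·Var(X) for both X + Y and X - Y, and neither prediction holds.

```python
from fractions import Fraction

# Hypothetical example: a fair six-sided die, value -> probability.
die = {v: Fraction(1, 6) for v in range(1, 7)}

def variance(dist):
    """Probability-weighted average of squared deviations from the mean."""
    m = sum(v * p for v, p in dist.items())
    return sum((v - m) ** 2 * p for v, p in dist.items())

# With Y = X, the sum is always 2X and the difference is always 0.
doubled = {2 * v: p for v, p in die.items()}   # distribution of X + X
zero = {0: Fraction(1)}                        # distribution of X - X

print(variance(die))       # 35/12
print(variance(doubled))   # 35/3, which is 4 * Var(X), not 2 * Var(X)
print(variance(zero))      # 0, not 2 * Var(X)
```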
