Carrying Out a Test for the Slope of a Regression Model | AP Stats Unit 9 Study Guide

Quick Summary

This guide will equip you to determine if a linear relationship observed in a sample of bivariate data provides convincing evidence of a true linear relationship in the population. You will learn to perform a complete t-test for the slope of a regression model, from stating hypotheses and verifying the necessary conditions using residual plots, to calculating a test statistic and p-value, and ultimately drawing a conclusion in the context of the problem.

Key Concepts

The primary goal of a t-test for the slope is to assess whether the slope of the population regression line, β (beta), is different from a hypothesized value, almost always zero. If we have strong evidence that β is not zero, we can conclude there is a statistically significant linear relationship between the two variables.

1. The Hypotheses

The null hypothesis typically states that there is no linear relationship, meaning the true population slope is zero. The alternative hypothesis can be two-sided or one-sided.

Null Hypothesis (H₀): β = 0
- In words: There is no linear relationship between the explanatory variable (x) and the response variable (y) in the population.
Alternative Hypothesis (Hₐ):
- β ≠ 0 (There is some linear relationship)
- β > 0 (There is a positive linear relationship)
- β < 0 (There is a negative linear relationship)

2. Conditions for Inference (LINER)

To trust the results of our t-test, five conditions must be met. We use the acronym LINER to remember them. The L, N, and E conditions are checked by examining the residuals: a residual plot for L and E, and a histogram or Normal probability plot of the residuals for N.

(L) Linear: The true relationship between x and y is linear.
- How to check: Examine the residual plot (residuals vs. x-values). There should be no leftover curved pattern or obvious shape. The points should be randomly scattered above and below the horizontal line at zero.
(I) Independent: Individual observations are independent of each other.
- How to check: If sampling without replacement, verify the 10% condition: the sample size n must be no more than 10% of the population size N (n ≤ 0.10N).
(N) Normal: For any given x-value, the distribution of y-values (and thus the residuals) is approximately Normal.
- How to check: Create a histogram, boxplot, or Normal probability plot of the residuals. The histogram should be roughly symmetric and unimodal. The Normal probability plot should be roughly linear. This condition is most important for small sample sizes; the test is robust if n ≥ 30.
(E) Equal Variance (Homoscedasticity): The standard deviation of the y-values (and thus the residuals) is the same for all x-values.
- How to check: Examine the residual plot. There should be no fanning or megaphone shape. The vertical spread of the points should be roughly the same across all x-values.
(R) Random: The data come from a well-designed random sample or a randomized experiment.
- How to check: This is stated in the problem description.

[Image: A set of four graphs. 1. A good residual plot showing random scatter. 2. A residual plot showing a curved pattern (violates L). 3. A residual plot showing a megaphone shape (violates E). 4. A histogram of residuals that is roughly symmetric and unimodal (supports N).]
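As a rough numeric companion to the residual plot, the sketch below fits a least-squares line and lists the residuals. The data are hypothetical (made up for illustration), and the helper names `fit_line` and `residuals` are assumptions, not part of any particular library. Comparing the residual spread for the lower and upper halves of the x-values is only an informal stand-in for looking at the plot itself.

```python
import statistics

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar = statistics.fmean(xs)
    y_bar = statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b

def residuals(xs, ys):
    """Residual = observed y - predicted y, one per data pair."""
    a, b = fit_line(xs, ys)
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data, invented for this illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]

a, b = fit_line(xs, ys)
res = residuals(xs, ys)

# Informal look at Equal Variance: compare residual spread for
# the smaller x-values with the spread for the larger x-values.
pairs = sorted(zip(xs, res))
half = len(pairs) // 2
spread_low = statistics.stdev(r for _, r in pairs[:half])
spread_high = statistics.stdev(r for _, r in pairs[half:])

print(f"y-hat = {a:.3f} + {b:.3f}x")
for x, r in zip(xs, res):
    print(f"x = {x:>2}  residual = {r:+.3f}")
print(f"residual SD, lower half of x: {spread_low:.3f}")
print(f"residual SD, upper half of x: {spread_high:.3f}")
```

Residuals that alternate in sign with no run of same-signed values support the Linear condition, and similar spreads in the two halves are consistent with Equal Variance. On the exam, the residual plot itself is still what you should cite.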

3. The Test Statistic and P-value

If the conditions are met, we can calculate a test statistic. This statistic measures how many standard errors the sample slope $b$ is from the hypothesized slope β₀ (which is usually 0).

Formula for the Test Statistic:
$t = (\text{statistic} - \text{parameter}) / (\text{standard error of statistic})$
$t = (b - \beta_0) / SE_b$
When testing H₀: β = 0, this simplifies to: $t = b / SE_b$
Key Components:
- b: The slope calculated from the sample data.
- SE_b: The standard error of the slope. This value estimates the variability of the sample slope $b$ in repeated sampling. It is almost always provided in computer output.
- Degrees of Freedom (df): For inference about regression, df = n - 2, where n is the number of data pairs. We lose one degree of freedom for estimating the intercept and another for estimating the slope.
Finding the P-value: The p-value is the probability of getting a test statistic as extreme or more extreme than the one observed, assuming the null hypothesis is true. We use the t-distribution with n-2 degrees of freedom to find this probability.
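The calculation in this section can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for `tcdf` or computer output: the function names are assumptions, the t-distribution tail area is approximated with Simpson's rule using only the standard library, and the inputs (b = 0.550, SE_b = 0.125, n = 20) are illustrative values chosen for the example.

```python
import math

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T >= t) using Simpson's rule (steps must be even)."""
    t_abs = abs(t)
    h = t_abs / steps
    total = t_pdf(0.0, df) + t_pdf(t_abs, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3          # area between 0 and |t|
    upper = max(0.0, 0.5 - area_0_to_t)  # area beyond |t| in one tail
    return upper if t >= 0 else 1.0 - upper

def slope_t_test(b, se_b, n, alternative="two-sided", beta0=0.0):
    """Return (t, df, p) for a test of H0: beta = beta0."""
    t = (b - beta0) / se_b
    df = n - 2
    if alternative == "greater":      # Ha: beta > beta0
        p = t_upper_tail(t, df)
    elif alternative == "less":       # Ha: beta < beta0
        p = 1.0 - t_upper_tail(t, df)
    elif alternative == "two-sided":  # Ha: beta != beta0
        p = 2 * t_upper_tail(abs(t), df)
    else:
        raise ValueError("alternative must be 'two-sided', 'greater', or 'less'")
    return t, df, p

# Illustrative inputs only (assumed for this example).
t, df, p = slope_t_test(b=0.550, se_b=0.125, n=20)
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.5f}")
```

Note how the one-sided p-value in the direction of the observed slope is exactly half the two-sided p-value, which is why a two-sided p-value from computer output can be halved when Hₐ is one-sided and the sign of t agrees with it.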

4. Reading Computer Output

On the AP exam, you will almost never calculate $SE_b$ by hand. Instead, you will be given computer regression output. You must be able to locate the key values.

[Image: Sample computer regression output table.]

Sample Computer Output:

```
Predictor     Coef     SE Coef    T       P
Constant      2.100    0.861      2.44    0.026
X-Variable    0.550    0.125      4.40    0.000

S = 1.998   R-Sq = 56.4%
```

* **`b` (slope):** The "Coef" for the "X-Variable" row. Here, **b = 0.550**.
* **`SE_b` (standard error of slope):** The "SE Coef" for the "X-Variable" row. Here, **SE_b = 0.125**.
* **`t` (test statistic):** The "T" for the "X-Variable" row. Here, **t = 4.40**. Note that $b / SE_b = 0.550 / 0.125 = 4.40$.
* **`P` (p-value):** The "P" for the "X-Variable" row. Here, **P ≈ 0.000** (that is, less than 0.0005). This is the two-sided p-value for testing H₀: β = 0.

## Key Vocabulary

- **True Slope (β):** The parameter representing the actual rate of change in the mean response variable for a one-unit increase in the explanatory variable for the entire population.
- **Sample Slope (b):** The statistic calculated from sample data that estimates the true slope (β).
- **Standard Error of the Slope (SE_b):** An estimate of the standard deviation of the sampling distribution of the sample slope, $b$. It measures the typical amount that $b$ varies from the true slope $\beta$.
- **t-test for Slope:** A significance test used to determine if there is convincing statistical evidence of a linear relationship between two quantitative variables.
- **Residual:** The vertical distance between an observed data point and the regression line; calculated as residual = observed y - predicted y, or $e = y - \hat{y}$.
- **Residual Plot:** A scatterplot of the residuals against the explanatory variable (or predicted values). It is the primary tool for checking the Linear and Equal Variance conditions; the Normal condition is checked with a histogram or Normal probability plot of the residuals.
- **Degrees of Freedom (df):** For a t-test for the slope of a regression line, the degrees of freedom are $n - 2$, where $n$ is the sample size.

Calculator Tech (TI-84)

The `LinRegTTest` function performs a complete t-test for the slope from raw data.

**Scenario:** You have your explanatory variable data in L1 and your response variable data in L2.

**Keystrokes:**

1. Enter your data: `STAT -> 1:Edit...` and type your x-values into L1 and y-values into L2.
2. Access the test: `STAT -> TESTS -> F:LinRegTTest...`
3. Fill in the inputs:
   * **Xlist:** L1
   * **Ylist:** L2
   * **Freq:** 1 (unless you have a frequency list)
   * **β & ρ:** Select the form of your alternative hypothesis (`≠ 0`, `< 0`, or `> 0`). This is a crucial step.
   * **RegEQ:** (Optional) To store the regression equation into Y1, press `VARS -> Y-VARS -> 1:Function... -> 1:Y1`.
   * **Calculate:** Highlight and press `ENTER`.

Output Screen:

The calculator will display:

* The model and the alternative hypothesis you selected (`y = a + bx`, `β & ρ ≠ 0`)
* **t =** [your test statistic]
* **p =** [your p-value]
* **df =** [your degrees of freedom, n - 2]
* **a =** [the y-intercept of the sample regression line]
* **b =** [the slope of the sample regression line]
* **s =** [the standard deviation of the residuals]
* **r² =** [the coefficient of determination]
* **r =** [the correlation coefficient]

## How to Show Work on the FRQ

To receive full credit for an inference question on the AP exam, you must use the four-step **State-Plan-Do-Conclude** process.

**Template for a t-test for Slope:**

**STATE:**

* **Parameter:** Define β in context. "Let β be the true slope of the population regression line relating [response variable, y] to [explanatory variable, x]."
* **Hypotheses:** State the null and alternative hypotheses using symbols and words.
  * H₀: β = 0 (There is no linear relationship between [y] and [x].)
  * Hₐ: β ≠ 0 (There is a linear relationship between [y] and [x].)
* **Significance Level:** State the alpha level, α. "We will use a significance level of α = 0.05." (If not given, choose 0.05.)

**PLAN:**

* **Procedure:** Name the test. "We will perform a t-test for the slope of a regression line."
* **Conditions:** Check the **LINER** conditions.
  * **Linear:** "The residual plot shows a random scatter of points with no leftover pattern, so the linear condition is met."
  * **Independent:** "Assuming the sample of [n subjects] is less than 10% of all possible [subjects], the observations are independent."
  * **Normal:** "The histogram of residuals is roughly symmetric and unimodal (or the Normal probability plot of residuals is roughly linear), so the Normal condition is met."
  * **Equal Variance:** "The residual plot shows a similar amount of vertical scatter for all x-values (no fanning), so the equal variance condition is met."
  * **Random:** "The problem states the data came from a random sample."
**DO:**

* **Identify Values:** List the key statistics from the computer output or calculator.
  * Sample slope: `b = ...`
  * Standard error of the slope: `SE_b = ...`
* **Calculate Test Statistic:** Write the formula and substitute the values.
  * `t = b / SE_b = [value] / [value] = ...`
* **Find P-value:**
  * Degrees of freedom: `df = n - 2 = ...`
  * P-value: `p = ...` (from computer output or `tcdf()` on the calculator)

**CONCLUDE:**

* **Decision:** Compare the p-value to α. "Because the p-value of [p] is less than (or greater than) α = [α], we reject (or fail to reject) the null hypothesis."
* **Conclusion in Context:** State your conclusion in the context of the problem, addressing the alternative hypothesis. "We have (or do not have) convincing evidence of a significant linear relationship between [response variable] and [explanatory variable] for the population of [population in context]."

## Practice Problems

**Problem 1:** A real estate agent wants to investigate the relationship between the size of a house (in square feet) and its selling price (in thousands of dollars). She collects data from a random sample of 15 recently sold houses in a large suburb and produces the following computer output. A residual plot showed no obvious pattern. A histogram of the residuals was roughly symmetric and unimodal.

```
Predictor       Coef     SE Coef    T       P
Constant        90.25    10.50      8.60    0.000
Size (sq ft)    0.125    0.024      5.21    0.000

S = 12.45   R-Sq = 67.3%
```

Is there convincing evidence at the α = 0.05 level of a positive linear relationship between the size of a house and its selling price in this suburb?

**Solution:**

**STATE:**

* **Parameter:** Let β be the true slope of the population regression line relating selling price (in thousands of dollars) to the size of a house (in square feet) for all recently sold houses in this suburb.
* **Hypotheses:**
  * H₀: β = 0 (There is no linear relationship between size and selling price.)
  * Hₐ: β > 0 (There is a positive linear relationship between size and selling price.)
* **Significance Level:** We will use α = 0.05.

**PLAN:**

* **Procedure:** We will perform a t-test for the slope of a regression line.
* **Conditions:**
  * **Linear:** The problem states the residual plot showed no obvious pattern.
  * **Independent:** The sample of 15 houses is likely less than 10% of all recently sold houses in a large suburb.
  * **Normal:** The problem states the histogram of residuals was roughly symmetric and unimodal.
  * **Equal Variance:** We assume this condition is met as no fanning was mentioned in the residual plot description.
  * **Random:** The problem states the data came from a random sample of houses.

**DO:**

* **Identify Values:** From the computer output for the "Size (sq ft)" predictor:
  * Sample slope: `b = 0.125`
  * Standard error of the slope: `SE_b = 0.024`
* **Calculate Test Statistic:**
  * `t = b / SE_b = 0.125 / 0.024 = 5.21`
* **Find P-value:**
  * Degrees of freedom: `df = n - 2 = 15 - 2 = 13`
  * The computer output provides a two-sided p-value of 0.000. Since our test is one-sided (Hₐ: β > 0) and our t-statistic is positive, the one-sided p-value is half of the two-sided p-value: `p = 0.000 / 2 ≈ 0`.

**CONCLUDE:**

* **Decision:** Because the p-value of ≈ 0 is less than α = 0.05, we reject the null hypothesis.
* **Conclusion in Context:** We have convincing evidence of a positive linear relationship between the size of a house in square feet and its selling price for the population of all recently sold houses in this suburb.

---

**Problem 2:** A student wonders if there is a linear relationship between the number of hours a student sleeps the night before a test and their score on the test. They collect data from 8 randomly selected classmates.

| Sleep (hours) | 5 | 7 | 8 | 6.5 | 9 | 4 | 7.5 | 6 |
|---------------|----|----|----|-----|----|----|-----|----|
| Score (%) | 65 | 82 | 88 | 78 | 94 | 58 | 85 | 71 |

Perform a significance test to determine if there is evidence of a linear relationship between sleep and test score.

**Solution:**

**STATE:**

* **Parameter:** Let β be the true slope of the population regression line relating test score (%) to the number of hours of sleep.
* **Hypotheses:**
  * H₀: β = 0 (There is no linear relationship between hours of sleep and test score.)
  * Hₐ: β ≠ 0 (There is a linear relationship between hours of sleep and test score.)
* **Significance Level:** We will use α = 0.05.

**PLAN:**

* **Procedure:** We will perform a t-test for the slope of a regression line.
* **Conditions:** We will enter the data into a calculator (Sleep in L1, Score in L2) and check the conditions using a residual plot and a histogram of the residuals.
  * **Linear:** A residual plot generated on the calculator shows a random scatter of points.
  * **Independent:** It is reasonable to assume these 8 students are less than 10% of all students.
  * **Normal:** With n = 8, we must be cautious. A histogram of the residuals shows no strong skew or outliers.
  * **Equal Variance:** The residual plot shows a similar vertical spread across all x-values.
  * **Random:** The problem states the students were randomly selected.

**DO:**

* Using the TI-84 `LinRegTTest` function with L1 and L2 and `β & ρ ≠ 0`:
  * **Test Statistic:** `t ≈ 23.85`
  * **P-value:** `p < 0.0001`
  * **Degrees of Freedom:** `df = n - 2 = 8 - 2 = 6`
  * **Sample Slope:** `b ≈ 7.48`

**CONCLUDE:**

* **Decision:** Because the p-value (less than 0.0001) is less than α = 0.05, we reject the null hypothesis.
* **Conclusion in Context:** We have convincing evidence of a linear relationship between the number of hours a student sleeps and their score on the test for the population of students from which the sample was drawn.

## Common Mistakes to Avoid

- **Hypotheses about `b` instead of `β`:** Hypotheses are always about population parameters (like β), not sample statistics (like b). Never write H₀: b = 0.
- **Checking Conditions on Raw Data:** The Linear, Normal, and Equal Variance conditions must be checked using the residuals, not the original x or y data. Always create and reference a residual plot.
- **Using Incorrect Degrees of Freedom:** For regression inference, the degrees of freedom are always df = n - 2. A common mistake is to use n - 1, which is for inference on a single mean.
- **Misinterpreting the P-value:** The p-value is calculated assuming H₀ is true. A small p-value means a sample slope at least as extreme as the one observed would be unlikely to occur by chance if there were truly no linear relationship. It does not mean there is a small probability that H₀ is true.
- **Claiming Causation:** Rejecting H₀ only provides evidence of a linear association. Unless the data came from a randomized, controlled experiment, you cannot conclude that changes in the explanatory variable cause changes in the response variable.
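To double-check a `LinRegTTest` result away from the calculator, the quantities it reports can be recomputed from raw data. The sketch below does this for the sleep and test-score data from Problem 2. It is an illustration under stated assumptions: the function names are hypothetical, only the Python standard library is used, and the p-value comes from a Simpson's rule approximation of the t-distribution tail rather than from the calculator's own routine.

```python
import math

def t_upper_tail(t, df, steps=20000):
    """Approximate P(T >= t) for t >= 0 with Simpson's rule (steps even)."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    h = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 0.5 - total * h / 3)

def lin_reg_t_test(xs, ys):
    """Two-sided t-test for the slope (H0: beta = 0) from raw data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

    b = sxy / sxx                      # sample slope
    a = y_bar - b * x_bar              # sample y-intercept
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    df = n - 2
    s = math.sqrt(sse / df)            # standard deviation of the residuals
    se_b = s / math.sqrt(sxx)          # standard error of the slope
    t = b / se_b
    p = 2 * t_upper_tail(abs(t), df)   # two-sided p-value
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    return {"t": t, "p": p, "df": df, "a": a, "b": b,
            "s": s, "se_b": se_b, "r2": r * r, "r": r}

# Data from Problem 2.
sleep = [5, 7, 8, 6.5, 9, 4, 7.5, 6]
score = [65, 82, 88, 78, 94, 58, 85, 71]

out = lin_reg_t_test(sleep, score)
for name in ("t", "p", "df", "a", "b", "s", "r2", "r"):
    print(f"{name} = {out[name]:.6g}")
```

The standard error line uses the identity SE_b = s / √Σ(x - x̄)², which is equivalent to the formula-sheet form s / (s_x √(n - 1)).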
