Quick Summary
This guide will equip you to construct and interpret a confidence interval for the true slope of a regression model. You will learn how to use this interval to make a statistically sound judgment about whether a significant linear relationship exists between two quantitative variables. By the end of this lesson, you will be able to move beyond simply describing a relationship in a sample to making a formal inference about the relationship in the entire population.
Key Concepts
When we perform a least-squares regression on a sample of data, we get a sample slope, b. This is our best estimate for the true slope, β, of the population regression line. However, a different sample would likely produce a different sample slope. A confidence interval provides a range of plausible values for the true slope, β, based on our single sample.
The Formula for a Confidence Interval for the Slope
The general formula for any confidence interval is:
Statistic ± (Critical Value) × (Standard Error of Statistic)
For the slope of a regression line, this becomes:
b ± t*(SE_b)
Let's break down each component:
b (Sample Slope): This is the slope of the least-squares regression line calculated from your sample data. It's your point estimate for the true slope, β.
t* (Critical Value): This is a value from a t-distribution. Unlike with means or proportions, for regression inference we use degrees of freedom (df) = n - 2, where n is the sample size. We use n - 2 because we are estimating two parameters from the data: the slope (β) and the y-intercept (α). You find t* for a given confidence level using a t-table or your calculator's `invT` function.
SE_b (Standard Error of the Slope): This value estimates the standard deviation of the sampling distribution of the slope. In simpler terms, it measures how much the sample slope, b, typically varies from the true slope, β, if we were to take many different samples. On the AP Exam, you will almost never calculate SE_b by hand. It will be provided in computer regression output.
[Image: A standard computer regression output table. The row for the explanatory variable is highlighted, showing the columns for "Coefficient" (which contains b) and "SE Coef" (which contains SE_b).]
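The pieces of the formula can be put together in a short Python sketch. This is only an illustration, not part of the AP toolkit: the function names (`t_critical`, `slope_confidence_interval`) and the example numbers are made up for this guide, and t* is found numerically (Simpson's rule plus bisection) instead of from a table.

```python
import math

def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] plus the 0.5 below zero."""
    # Normalizing constant of the t density, via log-gamma to avoid overflow.
    coef = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)
    pdf = lambda v: coef * (1 + v * v / df) ** (-(df + 1) / 2)
    h = x / steps
    total = pdf(0.0) + pdf(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return 0.5 + total * h / 3

def t_critical(conf_level: float, df: int) -> float:
    """t* such that the middle conf_level of the t-distribution lies between -t* and t*."""
    target = 1 - (1 - conf_level) / 2   # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 1000.0
    for _ in range(60):                 # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def slope_confidence_interval(b: float, se_b: float, n: int, conf_level: float = 0.95):
    """Return (lower, upper) for b ± t*(SE_b) with df = n - 2."""
    t_star = t_critical(conf_level, n - 2)
    margin = t_star * se_b
    return b - margin, b + margin

# Hypothetical values, chosen only to show the call: b = 2.0, SE_b = 0.5, n = 20.
lower, upper = slope_confidence_interval(2.0, 0.5, 20)
print(f"95% CI for the slope: ({lower:.4f}, {upper:.4f})")
```

In practice, t* comes from a table or the calculator; the numerical routine here just keeps the sketch free of outside libraries.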
Conditions for Inference (LINER)
To ensure our confidence interval is valid, we must check five conditions. Use the acronym LINER:
L - Linear: The true relationship between the explanatory variable (x) and the response variable (y) is linear.
- How to Check: Examine a scatterplot of the data for a linear form. Also, check that the residual plot shows no clear pattern (e.g., a curve or a U-shape).
I - Independent: Individual observations are independent of each other.
- How to Check: If sampling without replacement, verify the 10% condition: the sample size n must be no more than 10% of the population size N (i.e., n ≤ 0.10N).
N - Normal: For any fixed value of x, the corresponding values of y are normally distributed.
- How to Check: Examine a histogram or a boxplot of the residuals. It should be roughly symmetric with no strong skew or major outliers. A normal probability plot of the residuals is even better; it should be roughly linear.
E - Equal Variance (or Equal Standard Deviation): The standard deviation of the residuals is the same for all values of x. This property is also called homoscedasticity.
- How to Check: Examine the residual plot. The vertical spread of the points should be roughly the same across all x-values. A fanning or cone shape in the residual plot indicates a violation of this condition.
R - Random: The data come from a well-designed random sample or a randomized experiment.
- How to Check: Read the problem description to confirm random sampling or assignment.
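Three of the checks above (Linear, Normal, Equal Variance) rely on the residuals. As a rough sketch, the residuals could be computed like this before plotting them; the function name `least_squares_residuals` and the tiny data set are invented for illustration.

```python
def least_squares_residuals(x, y):
    """Fit y-hat = a + b*x by least squares and return the residuals y - y-hat."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx              # sample slope
    a = y_bar - b * x_bar      # sample y-intercept
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Made-up data, only to show the call.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = least_squares_residuals(x, y)
for xi, ri in zip(x, res):
    print(f"x = {xi}: residual = {ri:+.2f}")
```

Plotting these residuals against x gives the residual plot; a histogram of the same values is what the Normal check looks at.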
Justifying a Claim: The Connection to Hypothesis Testing
Beyond estimating β, a confidence interval for the slope lets us judge whether there is a statistically significant linear relationship between the two variables.
The null hypothesis for a test on the slope is H₀: β = 0. This hypothesis claims there is no linear relationship between the variables in the population. If the slope is zero, the line is horizontal, and changes in x do not predict changes in y.
The alternative hypothesis is Hₐ: β ≠ 0. This hypothesis claims there is a linear relationship between the variables.
The confidence interval gives us a simple way to test this claim:
If the interval for β contains 0: Zero is a plausible value for the true slope. We do not have convincing statistical evidence to reject the null hypothesis. We would conclude there is not a significant linear relationship between the variables.
If the interval for β does NOT contain 0: Zero is not a plausible value for the true slope. We have convincing statistical evidence to reject the null hypothesis. We would conclude there is a significant linear relationship between the variables.
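The decision rule above is simple enough to write as a small Python sketch. The function name `slope_conclusion` is hypothetical. Note that a C% interval corresponds to a two-sided test at significance level 1 - C (for example, a 95% interval matches α = 0.05).

```python
def slope_conclusion(lower: float, upper: float) -> str:
    """Apply the 'is 0 in the interval?' rule to a confidence interval for β."""
    if lower <= 0 <= upper:
        # 0 is a plausible value for the true slope.
        return "0 is in the interval: we do not have convincing evidence of a linear relationship."
    # 0 is not a plausible value for the true slope.
    return "0 is not in the interval: we have convincing evidence of a linear relationship."

# Two made-up intervals to show both outcomes.
print(slope_conclusion(-0.2, 0.5))
print(slope_conclusion(0.3, 0.9))
```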
Key Vocabulary
True Slope (β): The parameter representing the actual rate of change in the mean response variable for a one-unit increase in the explanatory variable for the entire population.
Sample Slope (b): The statistic calculated from sample data that serves as the point estimate for the true slope, β.
Standard Error of the Slope (SE_b): An estimate of the standard deviation of the sampling distribution of the sample slope. It quantifies the typical amount of error in using b to estimate β.
Confidence Interval for the Slope: An interval of plausible values for the true population slope, β, calculated from sample data.
Degrees of Freedom (df) for Regression: For inference about the slope of a regression line, the degrees of freedom are df = n - 2, where n is the sample size.
Residual Plot: A scatterplot of the residuals against the explanatory variable (or predicted values). It is a critical diagnostic tool for checking the Linear and Equal Variance conditions.
Calculator Tech (TI-84)
The TI-84 has a built-in function to create a confidence interval for the slope directly from raw data.
Function: `LinRegTInt` (Linear Regression T-Interval)
**Steps:**
1. Enter your explanatory variable data into a list (e.g., L1) and your response variable data into another list (e.g., L2). `STAT -> 1:Edit...`
2. Go to the testing menu: `STAT -> TESTS`
3. Scroll down to `G:LinRegTInt...` and press `ENTER`.
4. Fill in the required inputs:
* **Xlist:** The list containing your explanatory data (e.g., L1).
* **Ylist:** The list containing your response data (e.g., L2).
* **Freq:** Almost always 1.
* **C-Level:** The desired confidence level as a decimal (e.g., 0.95 for 95%).
* **RegEQ:** (Optional) You can store the regression equation in a function like Y1.
5. Select `Calculate` and press `ENTER`.
Output Screen:
The calculator will display the confidence interval (lower, upper), the degrees of freedom (df), the sample slope (b), the standard error of the residuals (s), and the coefficients of determination and correlation (r² and r).
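For readers curious about what the calculator does with the lists, here is a rough Python sketch of the same computation. It is not the TI-84's actual code: the function name `lin_reg_t_int` and the data are invented, and t* is passed in by hand (from a table or `invT`). It uses the standard formula SE_b = s / sqrt(Σ(x - x̄)²).

```python
import math

def lin_reg_t_int(x, y, t_star):
    """Confidence interval for the slope from raw data, given the critical value t*."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx                      # sample slope
    a = y_bar - b * x_bar              # sample y-intercept
    df = n - 2
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / df)            # standard deviation of the residuals
    se_b = s / math.sqrt(sxx)          # standard error of the slope
    margin = t_star * se_b
    r = sxy / math.sqrt(sxx * syy)     # correlation
    return {"interval": (b - margin, b + margin), "df": df, "b": b,
            "s": s, "se_b": se_b, "r2": r * r, "r": r}

# Made-up data; t* = 3.182 is the 95% critical value for df = 3.
result = lin_reg_t_int([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], t_star=3.182)
for name, value in result.items():
    print(name, value)
```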
How to Show Work on the FRQ
For a Free Response Question asking you to construct and interpret a confidence interval for the slope, use the four-step State-Plan-Do-Conclude process to ensure you earn full credit.
State:
- Define the parameter of interest in context. "We want to estimate β, the true slope of the population least-squares regression line relating [y-variable in context] to [x-variable in context] at a [C]% confidence level."
Plan:
Name the inference procedure. "We will construct a t-interval for the slope."
Check the conditions for inference (LINER).
Linear: "The scatterplot of [y-variable] vs. [x-variable] appears roughly linear, and the residual plot shows no leftover pattern." (If plots are provided).
Independent: "The [subjects] were randomly sampled. Assuming the population of [subjects] is at least 10 * [n], the 10% condition is met."
Normal: "The histogram of residuals is roughly symmetric (or the normal probability plot of residuals is roughly linear), so the normality condition is met."
Equal Variance: "The residual plot shows a similar amount of vertical scatter for all x-values, so the equal variance condition is met."
Random: "The problem states that the data come from a random sample."
Do:
If given raw data, use your calculator's `LinRegTInt` function and report the interval.
If given computer output, use the formula.
Identify values: b, SE_b, n.
Find df: df = n - 2.
Find t*: "Using a t-distribution with [df] degrees of freedom, the critical value for a [C]% confidence interval is t* = [value]." (Found using `invT` or a table).
Formula: Write the formula: b ± t*(SE_b).
Substitute: Plug in the values: [b] ± [t*]([SE_b]).
Calculate: Show the final interval: ([lower bound], [upper bound]).
Conclude:
This is a two-part conclusion.
Interpret the interval: "We are [C]% confident that the interval from [lower bound] to [upper bound] captures the true slope of the population regression line relating [y-variable] to [x-variable]."
Justify the claim: "Because 0 is not in this interval, we have convincing evidence of a significant linear relationship between [y-variable] and [x-variable]. (OR) Because 0 is in this interval, we do not have convincing evidence of a significant linear relationship between [y-variable] and [x-variable]."
Practice Problems
Problem 1:
A real estate agent wants to understand the relationship between the size of a house (in square feet) and its selling price (in thousands of dollars). She collects data from a random sample of 12 recent sales in a suburb. Computer output from a least-squares regression analysis is shown below.
| Predictor | Coef | SE Coef | T | P |
|---|---|---|---|---|
| Constant | 15.52 | 24.18 | 0.64 | 0.536 |
| Size (sq ft) | 0.185 | 0.025 | 7.40 | 0.000 |
Construct and interpret a 95% confidence interval for the slope of the regression line. Based on your interval, is there a significant linear relationship between house size and selling price?
Solution:
State: We want to estimate β, the true slope of the population least-squares regression line relating selling price (in thousands of dollars) to the size of a house (in square feet) with 95% confidence.
Plan: We will construct a t-interval for the slope. We assume the conditions for inference have been met. (Note: On an FRQ, you would need to check LINER based on provided plots or problem descriptions).
Do:
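Identify values from output:
Sample slope b = 0.185
Standard error of the slope SE_b = 0.025
Sample size n = 12
Find df: df = n - 2 = 12 - 2 = 10.
Find t*: For a 95% confidence interval with df = 10, the critical value is t* = 2.228. (This can be found using `invT(0.975, 10)` on a TI-84 or from a t-table).
Formula: b ± t*(SE_b)
Substitute: 0.185 ± 2.228(0.025)
Calculate:
Margin of Error: 2.228 × 0.025 = 0.0557
Interval: 0.185 ± 0.0557
Final Interval: (0.1293, 0.2407)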
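The arithmetic in this Do step can be double-checked with a few lines of Python. The slope and standard error come from the regression output for Problem 1; t* = 2.228 is the table value for 95% confidence with df = 10. The variable names are just for this sketch.

```python
# Values read from the regression output in Problem 1.
b = 0.185        # Coef for Size (sq ft)
se_b = 0.025     # SE Coef for Size (sq ft)
n = 12           # sample size

df = n - 2       # degrees of freedom for slope inference
t_star = 2.228   # 95% critical value for df = 10 (from a t-table)

margin = t_star * se_b
lower, upper = b - margin, b + margin

print(f"df = {df}")
print(f"margin of error = {margin:.4f}")
print(f"interval = ({lower:.4f}, {upper:.4f})")
```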
Conclude:
We are 95% confident that the interval from 0.1293 to 0.2407 captures the true slope of the population regression line relating selling price to house size. This means that for each additional square foot of size, the true mean increase in selling price is plausibly between $129.30 and $240.70. Because 0 is not in this interval, we have convincing evidence of a significant linear relationship between the size of a house and its selling price.
Problem 2:
A biologist studies the relationship between the number of trees in a forest plot and the number of bird species found in that plot. After collecting data from 30 randomly selected plots, she calculates a 90% confidence interval for the slope of the regression line to be (0.25, 1.45).
(a) Interpret the confidence interval in context.
(b) Based on the interval, what conclusion should the biologist make about the relationship between the number of trees and the number of bird species?
(c) If the biologist had calculated a 99% confidence interval instead, would it have been wider or narrower? Explain.
Solution:
(a) Interpretation: We are 90% confident that the interval from 0.25 to 1.45 captures the true slope of the population regression line relating the number of bird species to the number of trees in a forest plot. In other words, we are 90% confident that for each additional tree in a plot, the true mean number of bird species increases by an amount between 0.25 and 1.45.
(b) Conclusion: Because the entire interval is positive and does not contain 0, the biologist has convincing evidence of a significant positive linear relationship between the number of trees in a plot and the number of bird species.
(c) Width: A 99% confidence interval would be wider. To be more confident (99% vs. 90%) that we have captured the true population slope, we need to include a wider range of plausible values. This is reflected mathematically by a larger t* critical value for a 99% confidence level, which in turn creates a larger margin of error and a wider interval.
Common Mistakes to Avoid
Incorrect Degrees of Freedom: A very common mistake is to use df = n - 1. For all inference related to the slope of a regression line (both confidence intervals and significance tests), you must use df = n - 2. Remember, you are estimating two parameters (slope and intercept).
Misinterpreting the Confidence Level: Do not say, "There is a 95% chance that the true slope β is in the interval (0.13, 0.24)." The true slope either is or is not in the interval; it doesn't change. The 95% confidence is in the method used to construct the interval, not in any single interval. Stick to the "We are 95% confident that..." template.
Confusing b with β: In your State and Conclude steps, be precise with your notation. β is the parameter (the true population slope) you are trying to estimate. b is the statistic (the sample slope) you calculated from the data. Your conclusion is always about β.
Forgetting the Final Conclusion: Many students correctly construct and interpret the interval but fail to answer the second part of the question: "Is there a significant relationship?" Always finish your conclusion by checking if 0 is in the interval and explicitly stating whether you have convincing evidence of a linear relationship.
"Accepting the Null": If 0 is in your interval, you conclude that you do not have convincing evidence of a linear relationship. You never "accept" the null hypothesis or claim that the slope is zero. You simply lack the evidence to prove it isn't.