Quick Summary
This guide will equip you to identify and describe the potential flaws that can make a sample unrepresentative of its population. You will learn to distinguish between bias (a systematic error in data collection) and sampling variability (the natural, chance variation in samples), and you will be able to explain how issues like undercoverage, nonresponse, and poor question wording can lead to inaccurate conclusions. Ultimately, you will understand that a large sample size cannot save a poorly designed study.
Key Concepts
The goal of sampling is to get a snapshot of a population. However, flaws in our method can make that snapshot misleading. These flaws are sources of bias.
Bias vs. Variability
Bias is a systematic error that consistently pushes our sample statistic in a certain direction, away from the true population parameter. It is a measure of the accuracy of our method. A method with high bias is like a rifle whose sights are misaligned—it consistently misses the target in the same direction.
Sampling Variability (also called sampling error) is the natural, expected, random variation between different samples. It is not a "mistake." It describes how spread out our estimates are likely to be. It is a measure of the precision of our method. A method with high variability is like a rifle that is not held steady—the shots are scattered all over the place.
The Ideal: Our goal is a sampling method with low bias and low variability.
Key Relationship: Increasing the sample size reduces sampling variability (makes our estimate more precise) but has no effect on bias. A large, biased sample is just a very precise estimate of the wrong value.
[Image: Four targets illustrating the combinations of bias and variability. 1) Low Bias, Low Variability (shots clustered at center). 2) High Bias, Low Variability (shots clustered away from center). 3) Low Bias, High Variability (shots scattered around center). 4) High Bias, High Variability (shots scattered away from center).]
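The key relationship above can be checked with a small simulation. The sketch below is illustrative only: the true proportion of 0.50, the `bias_shift` of 0.10, and the function name `simulate` are assumptions made up for this example, not values from any real study. It draws many samples at two sample sizes, with and without a systematic shift, and reports the center and spread of the resulting sample proportions.

```python
import random
import statistics

def simulate(sample_size, bias_shift, n_samples=400, true_p=0.50, seed=0):
    """Return (center, spread) of sample proportions over repeated samples.

    bias_shift is a hypothetical systematic error: each respondent says "yes"
    with probability true_p + bias_shift instead of true_p.
    """
    rng = random.Random(seed)
    p = true_p + bias_shift
    estimates = []
    for _ in range(n_samples):
        yes = sum(rng.random() < p for _ in range(sample_size))
        estimates.append(yes / sample_size)
    # Center shows accuracy (bias); spread shows precision (variability).
    return statistics.mean(estimates), statistics.stdev(estimates)

for n in (50, 1000):
    for shift in (0.0, 0.10):
        center, spread = simulate(n, shift)
        print(f"n={n:5d}  bias_shift={shift:.2f}  center={center:.3f}  spread={spread:.3f}")
```

Under these assumptions, raising the sample size from 50 to 1000 should shrink the spread noticeably, while the center of the biased method stays near 0.60 rather than moving toward the true value of 0.50.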
Sources of Bias (Non-Sampling Errors)
These are errors that arise from the data collection process itself, not from the act of taking a sample.
Undercoverage Bias: This occurs when some members of the population have a lower chance of being selected, or are entirely left out of the sampling frame (the list from which the sample is drawn).
- Example: A survey of city residents that uses a telephone directory as its sampling frame. This would systematically exclude people with unlisted numbers, people who only use cell phones, and those without a phone. The sample would not be representative of the entire city's population.
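To see how an incomplete sampling frame distorts an estimate, here is a rough sketch of the directory example. All numbers are hypothetical and chosen only for illustration: 60% of residents are assumed to be listed, with 40% of listed residents and 65% of unlisted residents supporting some proposal.

```python
import random

rng = random.Random(1)

# Hypothetical city: 60% of residents are listed in the directory.
# Assumed support rates: 40% among listed residents, 65% among unlisted ones.
population = []
for _ in range(100_000):
    listed = rng.random() < 0.60
    supports = rng.random() < (0.40 if listed else 0.65)
    population.append((listed, supports))

def proportion_supporting(people):
    """Fraction of (listed, supports) records where supports is True."""
    return sum(supports for _, supports in people) / len(people)

# The sampling frame contains only residents who appear in the directory.
sampling_frame = [person for person in population if person[0]]

true_p = proportion_supporting(population)
full_sample = rng.sample(population, 1000)        # random sample, complete frame
frame_sample = rng.sample(sampling_frame, 1000)   # random sample, directory only

print(f"true proportion:            {true_p:.3f}")
print(f"SRS from whole population:  {proportion_supporting(full_sample):.3f}")
print(f"SRS from directory frame:   {proportion_supporting(frame_sample):.3f}")
```

Both samples are selected at random and have the same size. Only the frame differs, so the gap between the directory-based estimate and the true proportion comes from undercoverage, not from chance.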
Nonresponse Bias: This occurs when individuals chosen for the sample cannot be contacted or refuse to participate. This becomes a source of bias if the people who do not respond differ in a meaningful way from those who do.
- Example: A university sends a survey to alumni asking about their annual income. Alumni with very high or very low incomes may be less likely to respond than those with average incomes, so the responses would cluster near the middle and understate how much alumni incomes actually vary. The key is not just that people didn't respond, but that the reason for their nonresponse is related to the variable being studied.
Response Bias: This occurs when there is a systematic pattern of inaccurate answers to a survey question. The design of the question or the interaction with the interviewer influences the responses.
- Wording of Questions: Leading, confusing, or emotionally charged questions can push respondents toward a particular answer.
  - Bad Question: "Given the importance of protecting our national parks for future generations, do you support a small increase in the park entrance fee?" (This leads the respondent to say "yes.")
  - Better Question: "Do you support or oppose an increase in the park entrance fee to fund park maintenance?"
- Interviewer Effect: The interviewer's characteristics (e.g., age, race, gender, tone of voice) or actions can influence how people respond.
- Social Desirability: People may lie or modify their answers to seem more socially acceptable. For example, when asked "Did you vote in the last election?", some non-voters may say "yes" to avoid embarrassment.
Voluntary Response Bias: This is closely related to nonresponse bias, because in both cases people decide for themselves whether to be counted. It occurs when a sample is composed of people who volunteer to be in the sample by responding to a general appeal (e.g., online polls, call-in shows). These samples are almost always biased because people with strong opinions (often negative ones) are more likely to participate.
Key Vocabulary
Bias: A systematic failure of a sampling method to represent its population. A biased method consistently overestimates or consistently underestimates the value you want to know.
Undercoverage: A type of bias that occurs when some groups in the population are inadequately represented or left out of the process of choosing the sample.
Nonresponse Bias: A type of bias that occurs when individuals selected for a sample do not participate, and these non-responders differ in meaningful ways from the responders.
Response Bias: A type of bias that occurs when a systematic pattern of inaccurate answers is given to a survey question, often due to question wording, interviewer effects, or social desirability.
Wording of Questions: A source of response bias where the phrasing of a question influences the responses in a particular direction.
Sampling Error (or Sampling Variability): The natural, expected variation between a sample statistic and the population parameter that occurs simply due to chance. It is not a mistake.
Non-sampling Error: An error that arises from the data collection process, such as bias from undercoverage, nonresponse, or response bias. A large sample size does not fix these errors.
Calculator Tech (TI-84)
No major calculator functions are required for this topic. The focus is on conceptual understanding and clear communication.
How to Show Work on the FRQ
On Free Response Questions, you will be asked to identify a potential source of bias in a study and explain its impact. Simply naming the bias is not enough. You must provide a complete, context-specific explanation.
Template for Describing Bias on the FRQ:
Identify the Type of Bias: State the specific name of the potential bias (e.g., undercoverage, nonresponse bias, response bias due to question wording).
Describe How the Bias Occurs: Explain who is being systematically favored or left out by the sampling method described in the problem. Be specific and connect your explanation directly to the scenario.
Explain the Likely Direction of the Bias: State whether the sample statistic will likely be an overestimate or an underestimate of the true population parameter. Crucially, you must provide a clear reason why the group that was over/under-represented would respond differently than the rest of the population.
Example Application of the Template:
Scenario: A high school principal wants to know the proportion of students who feel the homework load is too heavy. He has a student aide stand at the library entrance after school and survey the first 100 students who enter.
FRQ Response:
"This study suffers from undercoverage bias (or convenience sampling leading to undercoverage). The sample is taken from students who visit the library after school. This method systematically leaves out students who do not go to the library, such as those who participate in sports, have after-school jobs, or go straight home. It is likely that students who go to the library are more academically focused and may feel differently about homework than the general student population. Therefore, the sample proportion of students who feel the homework load is too heavy will likely be an underestimate of the true proportion for all students, because the students who are not in the library (e.g., athletes, those with jobs) may have less time for homework and feel more burdened by it."
Practice Problems
Problem 1:
A local environmental group wants to estimate the proportion of residents in a large city who support a ban on single-use plastic bags. They obtain a list of all 20,000 residential addresses in the city, randomly select 500 of them, and mail a survey to these addresses. The survey includes a stamped, pre-addressed return envelope. After two weeks, 150 of the surveys have been returned.
Identify a potential source of bias that might affect the results of this survey and describe how it could lead to an inaccurate estimate.
Solution:
This study design is susceptible to nonresponse bias. The researchers selected a random sample of 500 addresses, but only 150 residents completed and returned the survey, resulting in a 30% response rate. It is very likely that the residents who chose to respond are different from those who did not. People who feel strongly about environmental issues, and thus are more likely to support the ban, may be more motivated to take the time to complete and mail back the survey. Therefore, the sample proportion of residents who support the ban on plastic bags from this survey is likely to be an overestimate of the true proportion for all city residents.
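The arithmetic behind this solution can be sketched in a few lines. The response rate comes straight from the problem (150 of 500). Everything else is a hypothetical assumption chosen only to illustrate the direction of the bias: a true support level of 50%, with supporters twice as likely as opponents to mail the survey back.

```python
sampled = 500
returned = 150
response_rate = returned / sampled  # 150 / 500 = 0.30, as stated in the solution

# Hypothetical values, chosen only for illustration (not given in the problem):
true_support = 0.50          # assumed true proportion supporting the ban
respond_if_support = 0.40    # assumed chance a supporter returns the survey
respond_if_oppose = 0.20     # assumed chance an opponent returns the survey

# Share of the selected sample that both holds each view and responds.
responding_supporters = true_support * respond_if_support
responding_opponents = (1 - true_support) * respond_if_oppose

expected_response_rate = responding_supporters + responding_opponents
support_among_responders = responding_supporters / expected_response_rate

print(f"observed response rate:       {response_rate:.2f}")
print(f"expected response rate:       {expected_response_rate:.2f}")
print(f"support among all residents:  {true_support:.2f}")
print(f"support among responders:     {support_among_responders:.2f}")
```

Under these made-up assumptions the expected response rate matches the observed 30%, yet about two thirds of the returned surveys would come from supporters even though only half of residents support the ban. That is the overestimate described in the solution.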
Problem 2:
The management of a large corporation wants to gauge employee satisfaction with a new "work from home" policy. They decide to conduct a survey. The first question on the survey is: "Considering the company's generous investment in technology to support remote work and its commitment to employee flexibility, how satisfied are you with the new 'work from home' policy?"
Identify a potential source of bias that might affect the results of this survey and describe how it could lead to an inaccurate estimate.
Solution:
This survey question is likely to produce response bias due to the wording of the question. The question is leading because it prefaces the inquiry with positive statements about the company's "generous investment" and "commitment to employee flexibility." This phrasing pressures employees to respond favorably and may make them feel that expressing dissatisfaction would be ungrateful or inappropriate. As a result, employees who may be neutral or dissatisfied might be inclined to report a higher level of satisfaction than they truly feel. Therefore, the sample result for employee satisfaction is likely to be an overestimate of the true level of employee satisfaction with the new policy.
Common Mistakes to Avoid
Confusing Nonresponse Bias with Voluntary Response Bias: Nonresponse bias occurs when individuals who were selected for a sample fail to respond. Voluntary response bias occurs when the sample consists only of people who chose to participate in the first place (like an online poll). In both cases the people who end up in the data selected themselves, but nonresponse bias starts from a sample the researchers chose, while a voluntary response sample was never selected by the researchers at all.
Just Naming the Bias: Simply writing "This is undercoverage" on an FRQ will earn minimal credit. You must follow the three-step process: identify the bias, describe who is left out/overrepresented in the context of the problem, and explain the likely direction of the bias (overestimate or underestimate).
Believing a Large Sample Fixes Everything: This is the most critical conceptual error. A larger sample size reduces sampling variability (random chance error), making your estimate more precise. It does nothing to fix bias (systematic error). A large, biased sample will just give you a very precise, but very wrong, answer.
Confusing "Bias" with "Unfair": In statistics, bias isn't about personal prejudice. It is a technical term for a study design that systematically favors certain outcomes. Avoid using emotional or non-statistical language.
Vague Descriptions of Direction: Don't just say the results will be "inaccurate" or "skewed." You must commit to a likely direction—overestimate or underestimate—and provide a plausible justification based on the group that was over- or under-sampled.