Assessment for Unit 8: Inference for Categorical Data: Chi-Square
Select the one best answer for each question.
1. A six-sided die is suspected of being unfair. To test this, a student rolls the die 60 times. If the die is fair, the expected count for each face (1 through 6) is 10. The student observes that the face "6" appears 15 times, while the face "1" appears only 5 times. Which of the following best describes the fundamental question the student is trying to answer based on the variation between these observed and expected counts?
2. A candy company claims that 20% of its candies are red, 30% are blue, and 50% are yellow. A customer buys a large bag containing 100 candies to test this claim. The customer finds there are 25 red, 25 blue, and 50 yellow candies. The expected counts, based on the company's claim, are 20 red, 30 blue, and 50 yellow. Which of the following statements is the most appropriate interpretation of the variation between the observed and expected counts for the red candies?
3. A sociologist is investigating whether the distribution of birth months for professional athletes is uniform across the four quarters of the year. The sociologist collects data on a random sample of 200 athletes. The table below shows the Observed counts and the Expected counts (assuming a uniform distribution).

| Quarter | Observed | Expected |
|:---:|:---:|:---:|
| Q1 (Jan-Mar) | 70 | 50 |
| Q2 (Apr-Jun) | 45 | 50 |
| Q3 (Jul-Sep) | 45 | 50 |
| Q4 (Oct-Dec) | 40 | 50 |

Based on the data, which quarter contributes the most to the question of whether the variation is non-random?
4. A city planner believes that traffic accidents occur with equal frequency on all 7 days of the week. To investigate, they record the day of the week for the last 700 accidents. The expected count for each day is 100. The observed count for Tuesday is 102. Which of the following best characterizes the variation between the observed count of 102 and the expected count of 100 for Tuesday?
5. A six-sided die is suspected of being weighted. To test this, a student rolls the die 120 times and records the frequency of each face. If the die is fair, the probability of rolling any specific face is 1/6. What is the expected count for the face showing a '5'?
6. A researcher wishes to perform a chi-square goodness-of-fit test to see if a sample of data fits a distribution with four categories: A, B, C, and D. The hypothesized proportions are p_A = 0.05, p_B = 0.15, p_C = 0.30, and p_D = 0.50. The sample size is n = 80. Which of the following correctly evaluates the Large Counts condition for this test?
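The Large Counts check in this question can be sketched in a few lines of Python. This is only an illustration: the function names and the threshold of 5 (the common textbook convention that every expected count be at least 5) are assumptions, not part of the assessment.

```python
def expected_counts(n, proportions):
    """Expected count for each category: sample size times hypothesized proportion."""
    return [n * p for p in proportions]


def large_counts_met(expected, threshold=5):
    """True only if every expected count reaches the threshold (assumed to be 5)."""
    return all(e >= threshold for e in expected)


# Sample size and hypothesized proportions for categories A, B, C, D.
expected = expected_counts(80, [0.05, 0.15, 0.30, 0.50])
condition_ok = large_counts_met(expected)
```

The check is applied to the expected counts, not the observed ones, which is why only `n` and the hypothesized proportions are needed.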
7. Which of the following best describes the properties of a chi-square distribution used in a goodness-of-fit test?
8. A store manager claims that customer visits are equally distributed across the 5 weekdays (Monday through Friday). A sample of 100 customer visits is recorded. On Wednesday, the observed count is 25. What is the contribution of the Wednesday data to the chi-square test statistic?
9. A biologist is studying the inheritance of traits in a specific breed of flower. According to a genetic model, the flowers should appear in the ratio 9:3:3:1 for phenotypes Red-Tall, Red-Short, White-Tall, and White-Short, respectively. In a random sample of 160 flowers, the observed counts are 85, 35, 25, and 15, respectively. What is the value of the chi-square test statistic for a goodness of fit test?
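The goodness-of-fit statistic is the sum of (observed - expected)^2 / expected over all categories. A minimal Python sketch of that arithmetic follows; the helper names `expected_from_ratio` and `chi_square_statistic` are hypothetical and introduced only for illustration.

```python
def expected_from_ratio(n, ratio):
    """Turn a ratio such as 9:3:3:1 into expected counts for a sample of size n."""
    total = sum(ratio)
    return [n * r / total for r in ratio]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Counts for Red-Tall, Red-Short, White-Tall, White-Short.
observed = [85, 35, 25, 15]
expected = expected_from_ratio(160, [9, 3, 3, 1])   # 90, 30, 30, 10
statistic = chi_square_statistic(observed, expected)
```

Each term of the sum is one category's contribution, so the same function applied to a single observed and expected pair gives the contribution of that category alone.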
10. A store manager wants to test if customers show an equal preference for four different styles of winter coats. She selects a random sample of 16 customers and records their choice of coat style. The observed counts are: Style A: 5, Style B: 3, Style C: 6, Style D: 2. Which of the following explains why a chi-square goodness of fit test would NOT be appropriate for these data?
11. A market researcher performs a chi-square goodness of fit test to investigate if a specific brand of cereal contains the claimed distribution of marshmallow colors: 30% Pink, 30% Yellow, 20% Blue, and 20% Green. The calculated chi-square test statistic is 14.2 with 3 degrees of freedom, resulting in a p-value of 0.0026. Which of the following is the correct interpretation of this p-value?
12. A city official claims that the distribution of vehicle types (Sedan, SUV, Truck, Van) passing through a specific intersection matches the national average. A chi-square goodness of fit test is conducted with a significance level of alpha = 0.05. The degrees of freedom for the test are 3, and the critical value from the chi-square table is 7.81. The calculated test statistic is X^2 = 8.45. Which of the following is the correct justification for the conclusion?
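Comparing a test statistic with a critical value is equivalent to comparing its p-value with alpha. For the special case of 3 degrees of freedom, the upper-tail probability has a closed form that can be evaluated with the standard library alone. The sketch below assumes df = 3 throughout; the function name `chi2_sf_df3` is hypothetical, and for other degrees of freedom a library routine such as SciPy's `scipy.stats.chi2.sf` would be the more usual choice.

```python
import math


def chi2_sf_df3(x):
    """Upper-tail probability P(X >= x) for a chi-square variable with 3 df.

    Closed form valid only for 3 degrees of freedom:
    erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)
    """
    if x < 0:
        raise ValueError("chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


alpha = 0.05
p_value = chi2_sf_df3(8.45)      # test statistic with 3 degrees of freedom
reject_null = p_value < alpha    # same decision as comparing 8.45 with 7.81
```

Because the tail probability decreases as the statistic grows, a statistic above the critical value always yields a p-value below alpha, so the two decision rules agree.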
13. A university registrar is analyzing the relationship between student classification (Freshman, Sophomore, Junior, Senior) and their primary mode of transportation to campus (Car, Bus, Walk/Bike). The contingency table below shows the observed counts for a random sample of 500 students.

| | Car | Bus | Walk/Bike | Total |
|---|---|---|---|---|
| Freshman | 40 | 60 | 50 | 150 |
| Sophomore | 55 | 45 | 25 | 125 |
| Junior | 70 | 20 | 10 | 100 |
| Senior | 85 | 15 | 25 | 125 |
| Total | 250 | 140 | 110 | 500 |

If a chi-square test for independence is conducted, what is the expected count for the cell corresponding to Sophomores who take the Bus?
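Under the null hypothesis of independence, each expected count in a two-way table is (row total × column total) / grand total. The following Python sketch applies that formula to every cell of the table above; `expected_table` is a hypothetical helper name, and the observed counts are entered without the Total row and column.

```python
def expected_table(observed):
    """Expected counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Freshman, Sophomore, Junior, Senior. Columns: Car, Bus, Walk/Bike.
observed = [
    [40, 60, 50],
    [55, 45, 25],
    [70, 20, 10],
    [85, 15, 25],
]
expected = expected_table(observed)
sophomore_bus = expected[1][1]   # row index 1 = Sophomore, column index 1 = Bus
```

The same helper works for a test of homogeneity, since the expected counts are computed identically; taking the minimum over all cells is one way to check the Large Counts condition.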
14. A market researcher is investigating the association between age group and preferred social media platform. The following incomplete table summarizes the data collected from a survey.

| | Platform A | Platform B | Platform C | Total |
|---|---|---|---|---|
| 18-24 | 45 | 30 | 25 | 100 |
| 25-34 | 20 | 50 | 30 | 100 |
| 35+ | 15 | 20 | 65 | 100 |
| Total | 80 | 100 | 120 | 300 |

Which of the following expressions correctly calculates the expected number of respondents in the 18-24 age group who prefer Platform C, assuming there is no association between age group and platform preference?
15. A biologist is studying the distribution of tree species across three different elevations. The data are organized into a 3x3 matrix with the rows representing Elevation (Low, Medium, High) and the columns representing Species (Oak, Pine, Maple). The row totals are 80, 120, and 200 respectively. The column totals are 100, 150, and 150 respectively. The total sample size is 400. To determine if a chi-square test for homogeneity is appropriate, the biologist checks the large counts condition. What is the value of the smallest expected count in the table?
16. In a study regarding the relationship between sleep duration (< 6 hours, 6-8 hours, > 8 hours) and academic performance (High, Average, Low), a researcher calculates the expected counts for a chi-square test for independence. The study included 200 total participants. If the null hypothesis is true, and 25% of all participants reported getting > 8 hours of sleep, while 40% of all participants achieved High academic performance, what is the expected count for the cell representing students with > 8 hours of sleep and High academic performance?
17. A marketing firm wants to determine if there is an association between age group (18–25, 26–40, 41–60, 60+) and preferred social media platform (Platform A, Platform B, Platform C). The firm selects a single simple random sample of 1,200 social media users and asks each person to state their age group and their preferred platform. Which of the following is the most appropriate test for this situation?
Refer to the figure below.
18. A statistics student is analyzing data to see if there is a relationship between a student's college major and their preferred music genre. The student collects data from a random sample of 200 university students. The observed counts are displayed in the partially completed two-way table below. What is the expected count for the cell corresponding to Science majors who prefer Pop music?
19. A city planner wants to determine if the distribution of commute modes (Car, Public Transit, Bike/Walk) differs between residents of the North District and residents of the South District. The planner selects a random sample of 200 residents from the North District and a separate random sample of 200 residents from the South District. Which of the following statements correctly identifies the appropriate inference procedure and the justification for its use?
20. A market researcher wants to determine if there is a difference in preference for three different packaging designs (Design A, Design B, Design C) between two different age groups (Under 30, 30 and Over). The researcher selects random samples from each age group and records their preferences. The data are shown in the incomplete table below.

| | Design A | Design B | Design C | Total |
|---|---|---|---|---|
| **Under 30** | 40 | 35 | 25 | 100 |
| **30 and Over** | 20 | 45 | 35 | 100 |
| **Total** | 60 | 80 | 60 | 200 |

Which of the following expressions represents the contribution of the 'Under 30, Design A' cell to the chi-square test statistic?
21. A study is conducted to compare the distribution of highest education level achieved (High School, Bachelor's, Master's, Doctorate) across three different regions of the country (Northeast, South, West). Independent random samples are taken from each region. Which of the following describes the correct method for determining the p-value for the appropriate chi-square test?
22. A biology student performs a chi-square test for homogeneity to see if the distribution of eye color (Blue, Brown, Green) is the same for two different populations of fruit flies. The resulting p-value is 0.04. Which of the following is the best interpretation of this p-value?
23. A biologist claims that a specific species of beetle has a color distribution of 50% green, 30% brown, and 20% black based on a genetic model. To test this claim, a researcher collects a single random sample of 200 beetles from the wild and records the color of each beetle. Which of the following is the most appropriate inference procedure for this situation?
24. A marketing firm wants to determine if there is an association between a consumer's age group (18–29, 30–49, 50+) and their preferred method of online shopping (Mobile App, Website, Social Media Link). The firm selects a simple random sample of 1,000 online shoppers and classifies each person according to their age group and preferred shopping method. Which of the following tests should be used to analyze the data?
25. A school administrator wants to compare the distribution of transportation methods (Car, Bus, Walk/Bike) used by students in three different high schools in the district. The administrator selects a random sample of 50 students from High School A, a random sample of 50 students from High School B, and a random sample of 50 students from High School C. The students are then categorized by their transportation method. Which inference procedure is most appropriate?