PrepGo

AP Computer Science A Practice Quiz: Ethical and Social Issues Around Data Collection

Written by AP Content Team, Verified for 2026 AP Exams, Last updated: May 2026

Test your understanding with short quizzes. This quiz has 16 questions to check your progress.

All Questions (16)

A company collects user birthdates and email addresses for account creation. Which of the following best describes a primary risk associated with storing this personal data on their computer systems?

A) The data quality might be poor, containing typos.

B) A breach of the system could expose users' private information.

C) The dataset might be inappropriate for answering other questions.

D) The data could be used to create algorithmic bias.

Correct Answer: B

Content point 1 explains the 'risks to privacy from collecting and storing personal data on computer systems.' A security breach that exposes personal information is a primary example of such a risk.

When a software developer creates a new social media application that collects users' locations, what is their ethical responsibility regarding this data according to the provided text?

A) To sell the data to the highest bidder to fund the application.

B) To collect as much data as possible for potential future use.

C) To attempt to safeguard the personal privacy of the user.

D) To ensure the data is only used to answer one specific question.

Correct Answer: C

Content point 4 explicitly states, 'When developing new programs, programmers should attempt to safeguard the personal privacy of the user.'

A computer program designed to screen job applicants consistently favors candidates from a specific university, even when candidates from other universities are equally qualified. This situation is best described as:

A) A data privacy breach.

B) A data quality issue.

C) Algorithmic bias.

D) An inappropriate dataset selection.

Correct Answer: C

Content point 5 defines algorithmic bias as 'systemic and repeated errors in a program that create unfair outcomes for a specific group of users.' That matches this scenario: the program repeatedly favors candidates from one university over equally qualified candidates from others.
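One way to notice the kind of pattern described here is to compare how often each group is selected. The following is a minimal sketch, not a full fairness audit: the class `SelectionRateAudit`, the `Decision` record, and the group labels are all hypothetical names introduced for illustration, and the sketch assumes the candidates being compared are equally qualified.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical audit helper; all names here are illustrative assumptions.
class SelectionRateAudit {
    // One screening decision for one (equally qualified) candidate.
    static class Decision {
        final String group;
        final boolean selected;

        Decision(String group, boolean selected) {
            this.group = group;
            this.selected = selected;
        }
    }

    // Returns, for each group, the fraction of its candidates that were selected.
    static Map<String, Double> selectionRates(List<Decision> decisions) {
        // counts[0] = number selected, counts[1] = total candidates in the group
        Map<String, int[]> counts = new LinkedHashMap<>();
        for (Decision d : decisions) {
            int[] c = counts.computeIfAbsent(d.group, k -> new int[2]);
            if (d.selected) {
                c[0]++;
            }
            c[1]++;
        }
        Map<String, Double> rates = new LinkedHashMap<>();
        for (Map.Entry<String, int[]> e : counts.entrySet()) {
            rates.put(e.getKey(), (double) e.getValue()[0] / e.getValue()[1]);
        }
        return rates;
    }

    public static void main(String[] args) {
        List<Decision> sample = Arrays.asList(
            new Decision("University A", true),
            new Decision("University A", true),
            new Decision("University A", true),
            new Decision("University A", false),
            new Decision("Other", true),
            new Decision("Other", false),
            new Decision("Other", false),
            new Decision("Other", false));
        System.out.println(selectionRates(sample));
    }
}
```

A large, persistent gap between the rates for equally qualified groups is a signal worth investigating. It does not by itself prove bias, since the sample may be small or the groups may differ in ways the data does not capture.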

A researcher has a dataset detailing the average annual rainfall in every county in the United States. Why would this dataset be inappropriate for answering a question about the most popular car models in the state of Florida?

A) The dataset contains personal information, creating a privacy risk.

B) The dataset likely has poor data quality and missing values.

C) The contents of the dataset are not related to the question being asked.

D) The dataset could introduce algorithmic bias into the car model analysis.

Correct Answer: C

Content point 6 explains that the 'contents of a data set might be related to a specific question or topic and might not be appropriate to give correct answers or extrapolate information for a different question or topic.' Rainfall data is unrelated to car sales.

Why is it crucial to recognize the quality of a dataset before using it for analysis?

A) High-quality data is always free from algorithmic bias.

B) Poor data quality can lead to inaccurate conclusions or solutions.

C) All datasets are created to answer one specific question.

D) Recognizing data quality is the programmer's only privacy safeguard.

Correct Answer: B

Content point 2 emphasizes the 'importance of recognizing data quality and potential issues when using a data set.' Errors, gaps, or inconsistencies in the data carry through to any analysis built on it, producing flawed results and inaccurate conclusions.

A city council wants to determine the best locations to build new public parks to serve the most children. Which of the following datasets would be most appropriate to answer this specific question?

A) A dataset of property tax records for all homeowners in the city.

B) A dataset showing the locations of all existing businesses.

C) A dataset containing census information on the number of households with children by neighborhood.

D) A dataset of traffic flow patterns on major city roads.

Correct Answer: C

Content point 3 discusses the need to 'identify an appropriate data set to use in order to solve a problem or answer a specific question.' The census data directly addresses where children live, which is most relevant to the council's question.

According to the provided text, what is a defining characteristic of algorithmic bias?

A) It is a one-time error that affects a single user.

B) It is caused by users providing incorrect information.

C) It involves systemic and repeated errors creating unfair outcomes.

D) It only occurs when personal data is stored on insecure systems.

Correct Answer: C

Content point 5 explicitly defines algorithmic bias as 'systemic and repeated errors in a program that create unfair outcomes for a specific group of users.'

A hospital digitizes all its patient records, including names, addresses, and medical histories, and stores them on a central server. Which of the following is a direct privacy risk associated with this action?

A) The data might be used to answer a question it wasn't intended for.

B) The records may contain typos, representing a data quality issue.

C) Unauthorized access to the server could expose sensitive patient information.

D) An algorithm analyzing the data might unfairly prioritize certain patients.

Correct Answer: C

Content point 1 explains the 'risks to privacy from collecting and storing personal data on computer systems.' Unauthorized access to a centralized server of sensitive data is a key example of this risk.

An analyst is using a customer survey dataset where over 50% of the respondents left the 'age' field blank. This is an example of:

A) A necessary privacy safeguard implemented by the programmer.

B) A potential data quality issue that could affect analysis.

C) An appropriate dataset for determining customer income levels.

D) Algorithmic bias against a specific age group.

Correct Answer: B

Content point 2 highlights the importance of recognizing 'potential issues when using a data set.' A large amount of missing data is a significant quality issue that can skew the results of any analysis performed on it.
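A quick check for this kind of issue is to measure how much of a field is blank before analyzing it. The sketch below assumes blank survey answers are stored as `null`; the class name `MissingDataCheck`, the method `missingRate`, and the sample values are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical data quality helper; names and sample data are illustrative only.
class MissingDataCheck {
    // Returns the fraction of entries that are null (left blank), from 0.0 to 1.0.
    // An empty list is reported as 0.0 to avoid dividing by zero.
    static double missingRate(List<Integer> ages) {
        if (ages.isEmpty()) {
            return 0.0;
        }
        int missing = 0;
        for (Integer age : ages) {
            if (age == null) {
                missing++;
            }
        }
        return (double) missing / ages.size();
    }

    public static void main(String[] args) {
        // Five of the eight respondents left the age field blank.
        List<Integer> ages = Arrays.asList(34, null, null, 51, null, 27, null, null);
        double rate = missingRate(ages);
        System.out.println("Missing age rate: " + rate);
        if (rate > 0.5) {
            System.out.println("More than half of the ages are missing; "
                + "conclusions about age may be unreliable.");
        }
    }
}
```

The 0.5 threshold simply mirrors the 50% figure in the question. What counts as too much missing data depends on the analysis being done.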

A programmer is building an e-commerce website. Which of the following actions best demonstrates an attempt to safeguard user privacy?

A) Storing user passwords in plain, unencrypted text for easy recovery.

B) Requiring users to provide their social security number to create an account.

C) Automatically sharing user purchase history with third-party advertisers.

D) Implementing encryption for stored credit card information.

Correct Answer: D

Content point 4 states that 'programmers should attempt to safeguard the personal privacy of the user.' Encrypting sensitive financial data like credit card numbers is a fundamental method for safeguarding that information from unauthorized access.
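As an illustration of what encrypting stored data can look like, here is a minimal sketch using AES in GCM mode from Java's standard `javax.crypto` package. The class `CardEncryptionSketch` and its method names are hypothetical. The sketch leaves out the hard part of a real system, which is storing and managing the key securely, and real payment systems must also meet industry rules that are outside the scope of this quiz.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch; key storage and management are deliberately left out.
class CardEncryptionSketch {
    private static final int IV_BYTES = 12;   // nonce length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length

    // Generates a fresh random AES key (assumption: 128-bit key for this example).
    static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts the text and returns the random IV followed by the ciphertext and tag.
    static byte[] encrypt(String plaintext, SecretKey key) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv); // a new IV for every encryption
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plaintext.getBytes(StandardCharsets.UTF_8));
            byte[] stored = new byte[IV_BYTES + ciphertext.length];
            System.arraycopy(iv, 0, stored, 0, IV_BYTES);
            System.arraycopy(ciphertext, 0, stored, IV_BYTES, ciphertext.length);
            return stored;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the IV from the front of the stored bytes and decrypts the rest.
    static String decrypt(byte[] stored, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        byte[] stored = encrypt("4111111111111111", key); // sample test number, not a real card
        System.out.println("Stored bytes: " + stored.length);
        System.out.println("Round trip matches: "
            + decrypt(stored, key).equals("4111111111111111"));
    }
}
```

Someone who steals only the stored bytes cannot read the card number without the key, which is why encryption counts as a safeguard while plain-text storage (option A) does not.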

A marketing team uses a dataset of national smartphone sales from 2015 to predict the most popular social media apps in 2024. What is the primary issue with this approach?

A) The data collection creates a significant privacy risk for smartphone users.

B) The dataset's topic and time frame are not appropriate for the question being asked.

C) The dataset is likely to be systematically biased against certain phone brands.

D) The quality of the sales data from 2015 is guaranteed to be poor.

Correct Answer: B

Content point 6 states that a dataset related to one question or topic 'might not be appropriate to give correct answers or extrapolate information for a different question or topic.' Here the dataset fails on two counts: it describes hardware sales rather than app usage, and it is nine years older than the period being predicted.

A ride-sharing app's pricing algorithm consistently charges higher fares for trips starting in low-income neighborhoods, even when the distance and time are identical to trips from other areas. This is a potential example of:

A) A data quality problem due to incorrect GPS coordinates.

B) A privacy risk from collecting location data.

C) Algorithmic bias creating an unfair outcome for a specific group.

D) Using an inappropriate dataset to calculate fares.

Correct Answer: C

This scenario fits the definition in content point 5, where 'systemic and repeated errors in a program...create unfair outcomes for a specific group of users' (in this case, residents of low-income neighborhoods).

To answer the question, 'What is the relationship between hours of sleep and student test scores at a specific high school?', which dataset is the most appropriate?

A) A national survey of sleep patterns among adults.

B) The academic transcripts of all students from the high school.

C) Anonymous survey data from students at that school detailing their sleep hours and their corresponding test scores.

D) A dataset of the school's budget and teacher salaries.

Correct Answer: C

Following the principle in content point 3, one must 'identify an appropriate data set to use in order to solve a problem or answer a specific question.' The dataset in C directly contains the two variables (sleep hours, test scores) for the specific population (students at that school) needed to answer the question.
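A first, rough test of whether a dataset fits a question is to check that it contains the variables the question needs. The sketch below assumes each dataset can be described by a set of column names; the class `DatasetFitCheck` and the column names such as `sleep_hours` are hypothetical. It does not check the other requirement mentioned above, that the data come from the right population.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper; the column names are invented for illustration.
class DatasetFitCheck {
    // Returns the required variables that the dataset does not contain, in sorted order.
    static Set<String> missingVariables(Set<String> datasetColumns, Set<String> required) {
        Set<String> missing = new TreeSet<>(required);
        missing.removeAll(datasetColumns);
        return missing;
    }

    public static void main(String[] args) {
        Set<String> required = new HashSet<>(Arrays.asList("sleep_hours", "test_score"));

        Set<String> transcripts = new HashSet<>(Arrays.asList("student_id", "course", "grade"));
        Set<String> survey = new HashSet<>(Arrays.asList("sleep_hours", "test_score"));

        System.out.println("Transcripts are missing: " + missingVariables(transcripts, required));
        System.out.println("Survey is missing: " + missingVariables(survey, required));
    }
}
```

An empty result means only that the needed variables are present. The dataset can still be inappropriate if it describes a different group or time period.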

Why does storing personal data on computer systems, as opposed to paper records, introduce unique risks to privacy?

A) Computer systems are always connected to the internet and are easily accessible.

B) Data on computer systems can be copied and distributed to millions of people almost instantly.

C) Data stored on computers is more likely to contain systemic bias.

D) Computer systems cannot store data for long periods.

Correct Answer: B

Content point 1 discusses the 'risks to privacy from collecting and storing personal data on computer systems.' The ability to rapidly copy and distribute vast amounts of data is a key risk unique to computer systems, as a single breach can have a massive and immediate impact.

A developer creates a facial recognition program using a dataset composed primarily of images of one ethnic group. When the program is used in a diverse community, it has a high error rate for other ethnic groups. This outcome is a result of:

A) A data privacy breach during the collection of the images.

B) A data quality issue where the images were low resolution.

C) The dataset being inappropriate for its intended broad use, leading to algorithmic bias.

D) A programmer intentionally writing code to misidentify people.

Correct Answer: C

This question combines multiple concepts. The dataset was not appropriate for the diverse population it was applied to (Content point 6), which resulted in 'systemic and repeated errors...that create unfair outcomes for a specific group' (Content point 5, algorithmic bias).
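Before training on a dataset like this one, a developer could check how each group is represented in it. The following is a minimal sketch under the assumption that every image carries a group label; the class `TrainingSetComposition`, its methods, and the labels "Group A", "Group B", and "Group C" are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; labels and the threshold are illustrative assumptions.
class TrainingSetComposition {
    // Returns each group's share of the dataset as a fraction from 0.0 to 1.0.
    static Map<String, Double> groupShares(List<String> groupLabels) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String label : groupLabels) {
            counts.merge(label, 1, Integer::sum);
        }
        Map<String, Double> shares = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            shares.put(e.getKey(), (double) e.getValue() / groupLabels.size());
        }
        return shares;
    }

    // Lists the groups whose share falls below the chosen minimum.
    static List<String> underrepresented(Map<String, Double> shares, double minShare) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : shares.entrySet()) {
            if (e.getValue() < minShare) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> labels = Arrays.asList(
            "Group A", "Group A", "Group A", "Group A",
            "Group A", "Group A", "Group A", "Group A",
            "Group B", "Group C");
        Map<String, Double> shares = groupShares(labels);
        System.out.println("Shares: " + shares);
        System.out.println("Below 20%: " + underrepresented(shares, 0.2));
    }
}
```

The 20% threshold is arbitrary. Balanced representation does not guarantee equal accuracy across groups, so error rates for each group should still be measured after training.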

A company develops a hiring algorithm using a dataset of its past successful employees. The dataset has missing educational information for many older employees (a data quality issue). The algorithm subsequently shows a preference for younger candidates, creating an unfair outcome. The programmer did not encrypt the applicant data, which was later stolen. Which concept is NOT represented in this scenario?

A) Risks to privacy from storing personal data.

B) Potential issues with data quality.

C) Algorithmic bias.

D) Using a dataset to answer a completely unrelated question.

Correct Answer: D

The scenario illustrates privacy risks (point 1, stolen data), data quality issues (point 2, missing information), and algorithmic bias (point 5, unfair outcome). However, the dataset of past employees is directly related to the task of hiring new employees. The issue is not that the dataset is for a 'different question or topic' (point 6), but that it is a biased and flawed sample for the same topic.