A data revolution is transforming the workplace. Employers are increasingly relying on algorithms to decide who gets interviewed, hired, or promoted. Although data algorithms can help to avoid biased human decision-making, they also risk introducing new sources of bias. Algorithms built on inaccurate, biased, or unrepresentative data can produce outcomes biased along lines of race, sex, or other protected characteristics. Data mining techniques may cause employment decisions to be based on correlations rather than causal relationships; they may obscure the basis on which employment decisions are made; and they may further exacerbate inequality because error detection is limited and feedback effects compound the bias. Given these risks, I argue for a legal response to classification bias--a term that describes the use of classification schemes, such as data algorithms, to sort or score workers in ways that worsen inequality or disadvantage along the lines of race, sex, or other protected characteristics.
Addressing classification bias requires fundamentally rethinking antidiscrimination doctrine. When decision-making algorithms produce biased outcomes, they may seem to resemble familiar disparate impact cases; however, mechanical application of existing doctrine will fail to address the real sources of bias when discrimination is data-driven. A close reading of the statutory text suggests that Title VII directly prohibits classification bias. Framing the problem in terms of classification bias leads to some quite different conclusions about how to apply the antidiscrimination norm to algorithms, suggesting both the possibilities and limits of Title VII's liability-focused model.
TABLE OF CONTENTS

INTRODUCTION
I. THE IMPACT OF DATA ANALYTICS ON WORKPLACE EQUALITY
   A. The Promise of Workforce Analytics
   B. The Risks of Workforce Analytics
   C. Types of Harm
      1. Intentional Discrimination
      2. Record Errors
      3. Statistical Bias
      4. Structural Disadvantage
   D. Classification Bias
II. ALTERNATIVE SYSTEMS OF REGULATION
   A. The Market Response
   B. Privacy Rights
III. THE ANTIDISCRIMINATION RESPONSE
   A. The Conventional Account of Title VII
   B. A Closer Reading
   C. Addressing Classification Bias
      1. Data on Protected Class Characteristics
      2. Relevant Labor Market Statistics
      3. Employer Justifications
      4. The Bottom-Line Defense
   D. A Note on Ricci v. DeStefano
   E. The Limits of the Liability Model
CONCLUSION

INTRODUCTION
The data revolution has come to the workplace. Just as the analysis of large datasets has transformed the businesses of baseball, advertising, medical care, and policing, it is radically altering how employers manage their workforces. Employers are increasingly relying on data analytic tools to make personnel decisions, thereby affecting who gets interviewed, hired, or promoted. (1) Using highly granular data about workers' behavior both on and off the job, entrepreneurs are building models that they claim can predict future job performance. (2) Sometimes called workforce or people analytics, these technologies aim to help employers recruit talented workers, screen for eligible candidates in an applicant pool, and predict an individual's likelihood of success at a particular job. (3)
Proponents of the new data science claim that it will not only help employers make better decisions faster, but that it is fairer as well because it can replace biased human decision makers with "neutral" data. (4) However, as many scholars have pointed out, data are not neutral, and algorithms can discriminate. (5) Large datasets often contain errors in individual records, and these errors may not be randomly distributed. Algorithms that are built on inaccurate, biased, or unrepresentative data can in turn produce outcomes biased along lines of race, sex, or other protected characteristics. When these automated decisions are used to control access to employment opportunities, the results may look very similar to the systematic patterns of disadvantage that motivated antidiscrimination laws. What is novel is that the discriminatory effects are data-driven.
Of course, employers have always done things such as recruiting, hiring, evaluating, promoting, and terminating employees, but data models do not rely on traditional indicia like formal education or on-the-job experience. Instead, they exploit the information in large datasets containing thousands of bits of information about individual attributes and behaviors. Third-party aggregators harvest information from the internet about job applicants, including detailed information about their social networking habits--how many contacts they have, who those contacts are, how often they post messages, who follows them, and what they like. (6) Similarly, monitoring devices collect data on the workplace behaviors of current employees, recording information such as where they go during the day, how often they speak with others and for how long, and who initiates the conversation and who terminates it. (7) Employers can also obtain information about their employees' off-duty behavior. As employees spend more of their personal time online, third parties can collect information on those activities, aggregate it with other data, and share it with employers. (8) Growing participation in wellness programs means that employees increasingly share information about their offline behaviors as well, reporting such things as how often they exercise or what they eat. (9) Data miners use this information to make health-related predictions, such as whether an employee is pregnant or trying to conceive. (10) Aggregating these various data sources can produce a rich and highly detailed profile of individual workers. (11)
This volume of information requires some form of automatic processing. No human brain can keep in view all of the thousands of data points about an individual. And so, algorithms are developed to make sense of it all--to screen, score, and evaluate individual workers for particular jobs. These algorithms are the tools of workforce analytics. For example, a company called Gild offers a "smart hiring platform" to help companies find "the right talent quicker." (12) Gild uses an algorithm that
crunches thousands of bits of information in calculating around 300 larger variables about an individual: the sites where a person hangs out; the types of language, positive or negative, that he or she uses to describe technology of various kinds; self-reported skills on LinkedIn; [and] the projects a person has worked on, and for how long as well as traditional criteria such as education and college major. (13)
Other services screen large pools of applicants, automating the process of selecting the most promising candidates for employers. (14) One company examines hundreds of variables about job seekers, analyzes a firm's past hiring practices, and then recommends only those applicants it believes the employer will be interested in hiring. Other firms are developing computer games that record thousands of data points about how individuals play, such as what decisions they make and how long they hesitate before deciding, in order to uncover patterns that can identify successful employees. (15) Employers can then use these tools to make hiring or promotion decisions.
The actual impact on employment opportunities is difficult to document because information about how developers construct these algorithms is considered proprietary, and personnel data is confidential. Nevertheless, some publicly available examples suggest there is reason for concern. One company seeking to identify which employees would stay longer found that the distance between home and the workplace is a strong predictor of job tenure. (16) If a hiring algorithm relied on that factor, it would likely have a racially disproportionate impact, given that discrimination has shaped residential patterns in many cities. Other studies involving internet advertising illustrate how algorithms that learn from behavioral patterns can discriminate. For example, Latanya Sweeney has shown that Google searches for African American-associated names produce more advertisements for criminal background checks than searches for Caucasian-associated names, likely reflecting past patterns in users' search behavior. (17) Amit Datta, Michael Carl Tschantz, and Anupam Datta have demonstrated gender differences in the delivery of online ads to jobseekers, with identified male users "receiv[ing] more ads for a career coaching service that promoted high pay jobs," while female users received more generic ads. (18) Similarly, a field study by Anja Lambrecht and Catherine Tucker revealed that an internet ad for STEM (science, technology, engineering and math) jobs was far less likely to be shown to women than men. (19) These examples did not necessarily result from intentional bias, but the discriminatory effects were nevertheless real.
While workforce analytics are transforming employers' personnel practices, the legal world has only just begun to take notice. Privacy law scholars have raised concerns about the growth of big data, asking what limits the law should place on the collection of particularly sensitive personal information, or whether it should regulate "data flows" or downstream uses of this information. (20) Although much of the focus has been on problems caused by inaccurate data records or unexpected and invasive uses of sensitive personal information, (21) these scholars have also sounded alarms that big data may produce biased outcomes. Of the handful of commenters who have addressed the employment context, most have simply raised questions about the discriminatory potential of data analytics, (22) without deeply theorizing the nature of the harms that these technologies threaten for workers. And to the extent that legal scholars have considered how the law might respond, they have confined their analysis to narrowly applying existing...