Author: Yanisky-Ravid, Shlomit

TABLE OF CONTENTS

Introduction
I. Data Matters: Training the AI
II. The Legal Challenges and Hurdles of Using Big Data: The Threat of Discriminatory Outcomes and Privacy Violations
    A. Discriminatory Data
        1. AI and Discriminatory Data
        2. Disparate Impact
        3. The Impact of Bad Data: Biased, Partial, or Wrong
        4. Discrimination in the Feedback
        5. Unmoored and Independent AI Systems that Autonomously Seek Data
        6. From Theory to Practice: The Inability of Existing Laws to Control AI Systems
        7. Examples and Consequences of Discriminatory Data
    B. Data and Privacy: Invasive and Pervasive Data
        1. AI and U.S. Privacy "Islands": Healthcare, Finance, and Children
            a. AI in the Healthcare Field
            b. AI for Children and Education
            c. Data, AI, and Consumer Finance
        2. General Normative Expectations of Privacy
        3. Privacy by Design
III. The AI Data Transparency Model
    A. The Need for an AI Data Transparency Model
    B. The Benefits of the AI Data Transparency Model
        1. The Benefit of Increased Transparency
        2. Value Adding
        3. Flexibility
    C. Theoretical Justifications
        1. Law and Economic Theory: Transparency, Accountability, and Efficiency
        2. The Market Structure and the Multi-Player Model
        3. Law and Economic Theory: Self-Regulating Incentive Mechanism
Conclusion

INTRODUCTION

Commentators and experts frequently herald artificial intelligence ("AI") as a technological breakthrough that will completely transform our society and economy. (1) From medicine to transportation, finance to art, legal systems to social structures, and many other sectors, AI systems hire, fire, grant loans, predict diseases, and decide who will go to jail and how long they will stay there. (2) Many decisions previously determined by humans are now made by autonomous AI systems. (3) These AI systems, embedded in computers and robots, have begun to automate workplaces and have created new applications that rely on the vast amounts of data produced by society's daily occurrences. (4) Corporations, governments, and individuals are investing in the AI sector, creating the prospect of a new Industrial Revolution. But if society comes to rely on AI too heavily and too rapidly, it risks overlooking the problems that may arise. (5) It is true that machine learning offers broad opportunities for innovation in a host of areas, such as climate; physical, transactional, and behavioral data about people; pandemics; pharmaceuticals; infrastructure; and supply chains. (6) However, as AI technologies grow in prominence and become more easily implementable, stakeholders must acknowledge that AI has the dangerous potential to violate laws and societal norms. (7)

The growing AI industry is dominated by huge firms that collect, hold, or can afford to access massive amounts of data. (8) But data can be flawed; indeed, instances abound of massive companies utilizing AI systems that produce biased outcomes. In one instance, Amazon's AI facial recognition software, Rekognition, wrongly identified twenty-eight members of Congress as individuals who had jail mugshots. (9) These results demonstrate the race and gender biases present in facial recognition AI systems. (10) Similarly, Facebook's software is known to identify the "ethnic affinities" of users' characteristics, which advertisers can then use to exclude certain users from viewing particular promotions. (11) These troubling, biased consequences are not inevitable in an era of Autonomous, Automated, and Advanced AI Systems, the so-called "3A Era." Rather, they highlight that AI technologies pose crucial challenges that policymakers must address. (12) These challenges cut across firms, developers, governments, and employees; therefore, proper legal and regulatory schemes must be established to ensure that AI development is neither held back nor allowed to go too far. (13)

The innovations in AI technology are moving too fast for Congress to effectively understand and grapple with. Inadequate regulatory schemes might be unbalanced: too permissive a scheme would give cover to and perpetuate existing discrimination in AI programs, while too restrictive a scheme would halt the development of AI technology altogether, stymieing its potential benefits. Thus, it is vital to create a framework that can help the industry, the public, and policymakers identify where problems with data occur, how they occur, and why they occur. Once these nuances are better understood, the government can more effectively regulate the AI industry. To that end, this Article proposes an AI Data Transparency Model that focuses on illuminating how AI systems utilize data. This Model differs from other commentaries on the risks of AI systems in that it does not oppose the use or expansion of AI systems. Rather, this Model recognizes that regulatory schemes must focus on the source of threats and hazards in AI systems: the data itself.

The Transparency Model recommends an auditing and certification regime that will encourage transparency and help developers and individuals learn about the potential threats of AI, discrimination, and the continued weakening of societal expectations of privacy. If firms choose to utilize non-infringing data from beginning to end, from the very first steps of developing and training AI systems through the actual operation of those systems, the likelihood of discriminatory outcomes and privacy violations will be greatly reduced.

The proposed Transparency Model takes into account the nature of how AI systems work and the prevalence of multiple stakeholders, each of whom is responsible for developing and operating AI systems (the "Multi-Player Model"). (14) These stakeholders may include software programmers, data providers, users, sellers and distributors of AI systems, manufacturers, and others, such as the public and the shareholders of firms. (15) As part of the regulatory scheme of the Model, we first contend that each of these stakeholders, especially the data providers, should concern themselves with the potential adverse outcomes that AI systems might create. Stakeholders must consider the possibility that AI systems will misinterpret data and produce discriminatory outcomes or otherwise violate human rights. The Model includes a certification process, whereby stakeholders can align, assert, and publicize their efforts to produce AI systems that conform with a transparency industry standard. This certification can be determined internally or conducted by a third-party auditing agency; either way, the purpose is to encourage the development of a certifiable, uniform industry standard. This Article argues that cultivation of a strong certification process is soundly justified by law and economics and would spur public demand for the ethical use of AI. Finally, the Model will raise awareness about the dangers that may arise when stakeholders overlook the possibility that certain compositions of data can have discriminatory effects.

Just as technology has exploded in the 3A Era, so has the literature concerning the legal implications of AI's proliferation. (16) However, the literature to date has tended to focus on the operation of AI systems, rather than on the data used to train them. (17) This leaves technology firms without guidelines, which increases the risk of societal harm and leaves policymakers and judges without a regulatory regime to turn to when addressing the novel and unpredictable outcomes of AI systems. This Article tries to fill that void with the Transparency Model, which focuses on data rather than on software programmers or algorithms. It is important to mention that some scholars, notably Professor Joel Reidenberg, have challenged the prevailing position, arguing that transparency of software and algorithms alone will not completely resolve existing issues of bias and prejudice in AI. (18) This work goes a step further, suggesting that great emphasis and focus must be placed on the data itself. Focusing on the data is vital and can usher in a newfound understanding of how, and to what extent, AI systems should be integrated into nearly any aspect of society.

A short review of other important works regarding AI systems demonstrates that a thorough discussion of the data itself is missing from the literature at large. In a landmark article, Solon Barocas and Andrew D. Selbst, two scholars who have pioneered the study of the effects of big data and the advent of the internet on individuals' privacy and civil rights, noted that algorithms and the use of big data compromise the spirit of decades-old anti-discrimination statutes. (19) They warned of the need to pass new statutes that counteract the dangers that algorithms and big data pose to society. (20) There has been a steady creep of AI into consumer finance, and it remains an unresolved question which parties should bear the burden of ensuring that AI applications do not discriminate against, target, or fail to provide services to protected demographics. (21) Current laws are insufficient to address these risks, but overregulation could hinder the development of the technology. (22) Many others have raised concerns about the new challenges that AI systems pose for criminal justice. For example, scholars have discussed the inadequacy of current legal doctrines in protecting citizens from automated suspicion algorithms, which identify suspects and suspicious activity that would ordinarily be identified by a human police officer. (23)

There are numerous problematic features of machine learning algorithms that make regulation difficult. (24) First, there is discreetness: machine learning applications can be developed with limited visible infrastructure. (25) Next, because so many different entities develop machine learning applications, diffuseness makes it difficult to identify who should be regulated. (26) Further, the opacity of the...