TABLE OF CONTENTS
I. INTRODUCTION
II. THE PROMISE OF BLACK-BOX MEDICINE
   A. Advancing Medical Knowledge
   B. Automating the Routine
   C. Democratizing Expertise
      1. Diagnostics and Treatment Recommendations
         a. Diagnostics
         b. Treatment Recommendations
      2. Contexts of Application
III. WHERE MEDICAL AI IS DEVELOPED--AND WHY
   A. That's Where the Data Are
   B. Reputational Effects
   C. Legal Influences
      1. FDA Approval
      2. Tort Liability
      3. Insurer Reimbursement
   D. Caveats
IV. TRANSLATIONAL CHALLENGES
   A. Treatment Quality
      1. Patient Population Differences
      2. Resource Capacity Differences
   B. Cost
V. ISN'T ALL MEDICINE CONTEXTUAL?
VI. SOLUTIONS
   A. Provider Safeguards and Humans-in-the-Loop
      1. Present Provider Ignorance
      2. Reliance on Algorithms
      3. Future Provider Ignorance
      4. Provider Absence
   B. Labeling
   C. Representative Datasets
   D. FDA Regulation and Concordance
   E. Incorporating Cost
   F. Traps to Avoid
VII. CONCLUSION

I. INTRODUCTION
Artificial intelligence is entering medical practice. The combination of medical big data and machine learning techniques allows developers to create AI usable in medical contexts--also called "black-box medicine" due to its inherent opacity--that can help improve human health and health care. Only a few years ago, black-box medicine seemed far from real-world use. Today, there are already FDA-approved devices that use AI to diagnose diabetic retinopathy or to flag radiologic images for further study. (1) Hospitals have used AI to help develop care pathways for increasingly specified groups of patients. Future uses are multiplying.
But there is a problem lurking in the development of AI in medicine. (2) A key promise of medical AI is its ability to democratize medical expertise, allowing providers of all sorts to give care that otherwise might be beyond their capacity. (3) Medical AI is typically trained in high-resource settings: academic medical centers or state-of-the-art hospitals or hospital systems. (4) These sites typically have well-trained, experienced practitioners and are most likely to have high-quality data collection systems; training medical AI in these systems makes intuitive sense. Democratizing medical expertise, though, requires deploying that medical AI in low-resource settings like community hospitals, community health centers, practitioners' offices, or rural health centers in less-developed countries. (5) This translation runs into a problem: low-resource contexts have different patient populations and different resources available for treatment than high-resource contexts, and disparities in available data make it hard for AI to account for those differences.
The translational disconnect between high-resource training environments and low-resource deployment environments will likely result in predictable decreases in the quality of algorithmic recommendations for care, limiting the promise of medical AI to actually democratize excellence. To take a simple example: at Memorial Sloan Kettering, one of the best cancer centers in the world, it may well make sense to give a patient a cocktail of powerful chemotherapeutics with potentially fatal side effects, since trained oncology nurses and other specialists are available to monitor problems and intervene if things go wrong. In a community hospital without those safeguards, though, it may be a better call to administer less drastic remedies, avoiding the chance of catastrophic failure. That danger is even more pronounced in even lower-resource settings, such as rural areas of less-developed countries. But medical AI trained only on data from Memorial Sloan Kettering would have no way of taking that resource constraint into account and would provide a poor recommendation to providers in those lower-resource settings. (6)
Contextual bias is an under-addressed kind of bias in the legal AI literature. (7) Rather than arising from problems in the underlying data, such as when policing algorithms end up silently replicating racial bias in underlying arrest patterns and the data they generate (8) or when health algorithms accurately mirror racial or gender biases already present in health care, (9) this bias arises in the process of translating algorithms from one context to another. The care provided in high-resource contexts may be superb and untinged by problematic human bias of any kind, and this bias would still arise. (10)
I do not mean to suggest that AI developers are unaware of the challenges of translating AI from one context to another, or the differences between high- and low-resource contexts. The technique of "transfer learning," for instance, focuses on taking insights from one environment and using them in another. (11) And some work, especially nonprofit work in the global health space, focuses intently on developing robust AI especially for deployment in low-resource contexts in less-developed countries. (12) But this Article places the dynamics of cross-context translation into a legal context where, particularly in the United States, incentives actively promote problematic development patterns; it also suggests why the data most useful to address problems of contextual bias are least likely to be available.
This Article analyzes how medical AI can run into problems through an otherwise reasonable process of development and deployment. It proceeds in four Parts. Part II briefly describes the promise of artificial intelligence in medicine, focusing on the idea of democratizing medical expertise. Part III explores the incentives for developing medical AI in high-resource medical contexts. It explores how technological factors around data availability are buttressed by legal and economic incentives to focus AI training on high-resource contexts.
Part IV, the heart of the paper, lays out the different types of errors that can arise when medical AI trained in high-resource contexts is deployed in low-resource contexts. It notes problematic differences in patient populations, differences in recommended treatments based on the available resources of the medical environment, and systematic influences on cost.
Part V addresses a question of scope: isn't all medicine contextual? Treatments are developed and doctors are trained in one set of contexts--often high-resource--and then care occurs in a wide array of different contexts. In one sense, medical AI embodies the same type of contextual bias. But medical AI carries the illusory promise of being different because it can theoretically take into account exactly those contextual differences to tailor care and can learn from its own performance. However, this safeguard fails if medical AI lacks data from different contexts to adjust its recommendations. The resulting contextual bias is especially insidious because medical AI is typically opaque, hiding the negative effects that may result.
Part VI discusses potential solutions. It begins with two obvious but flawed solutions. First, could we rely on human doctors "in the loop" to provide common-sense checks on medical AI contextual bias errors? Unfortunately, even assuming that doctors have the knowledge, incentive, and willingness to correct AI errors--assumptions that may not be merited--in many low-resource situations where AI can bring the most benefit, well-trained human providers will simply not be present. Second, could we simply rely on labeling to inform users of medical AI's limitations? I argue that labeling is unlikely to solve the problem, since training-based labels are difficult to design, likely to be ignored, and, if followed, would eviscerate much of the promise of democratizing expertise. This Part suggests instead that a better solution requires a combination of public investment in data infrastructure and regulatory mandates of data showing that AI performs well across different contexts. This combination would ameliorate the problem of contextual translation and help ensure that medical AI actually does provide benefits more broadly, rather than just to those who can already access high-resource care.
Part VI also notes that while the problem of contextual bias needs addressing, policymakers should not be misled by the Nirvana fallacy. (13) Some forms of even imperfect medical AI promise substantial benefit to underserved patients, and the field's growth should not be strangled while we await perfection.
Before proceeding, one caveat is in order. Medical AI is on the cusp of entering practice, and a few specific examples of medical AI are already available. But it is early yet, and some of the features key to this discussion are largely in development, especially AI that recommends a particular treatment for a particular patient. The predicates of the argument made here--medical AI, training in high-resource contexts, differences in patient population and resources, and impact of resources on treatment plans--are all already present. I argue that their combination is likely to lead to problems in the process of contextual translation, barring action specifically taken to avoid those problems. But I cannot yet point to instances where such problems have happened, and it is possible that careful developers and regulators will ensure that they never do, even without explicit policy intervention. (14) Nevertheless, the risk needs to be identified and brought to the fore now. Medical AI is developing rapidly and will become increasingly embedded in medical practice; the problem of pervasively biased treatment will be easier to avoid than to fix.
II. THE PROMISE OF BLACK-BOX MEDICINE
Medical AI promises big things. Big data and machine learning can help health-care providers explore new biological relationships and new methods of treatment, automate many low-level tasks that fill providers' days, and raise the general level of care by allowing many types of providers to access expertise...