Machine Learning and Law

Publication year: 2021

Harry Surden(fn*)

INTRODUCTION

What impact might artificial intelligence (AI) have upon the practice of law? According to one view, AI should have little bearing upon legal practice barring significant technical advances.(fn1) The reason is that legal practice is thought to require advanced cognitive abilities, but such higher-order cognition remains outside the capability of current AI technology.(fn2) Attorneys, for example, routinely combine abstract reasoning and problem-solving skills in environments of legal and factual uncertainty.(fn3) Modern AI algorithms, by contrast, have been unable to replicate most human intellectual abilities, falling far short in advanced cognitive processes, such as analogical reasoning, that are basic to legal practice.(fn4) Given these and other limitations in current AI technology, one might conclude that until computers can replicate the higher-order cognition routinely displayed by trained attorneys, AI would have little impact in a domain as full of abstraction and uncertainty as law.(fn5)

Although there is some truth to that view, its conclusion is overly broad. It misses a class of legal tasks for which current AI technology can still have an impact even given the technological inability to match human-level reasoning. Consider that outside of law, non-cognitive AI techniques have been successfully applied to tasks that were once thought to necessitate human intelligence, such as language translation.(fn6) While the results of these automated efforts are sometimes imperfect, the interesting point is that such computer-generated results have often proven useful for particular tasks where strong approximations are acceptable.(fn7) In a similar vein, this Article will suggest that there may be a limited, but not insignificant, subset of legal tasks that are capable of being partially automated using current AI techniques despite their limitations relative to human cognition.

In particular, this Article focuses upon a class of AI methods known as "machine learning" techniques and their potential impact upon legal practice. Broadly speaking, machine learning involves computer algorithms that have the ability to "learn" or improve in performance over time on some task.(fn8) Given that there are multiple AI approaches, why highlight machine learning in particular? In the last few decades, researchers have successfully used machine learning to automate a variety of sophisticated tasks that were previously presumed to require human cognition. These applications range from autonomous (i.e., self-driving) cars, to automated language translation, prediction, speech recognition, and computer vision.(fn9) Researchers have also begun to apply these techniques in the context of law.(fn10)

To be clear, I am not suggesting that all, or even most, of the tasks routinely performed by attorneys are automatable given the current state of AI technology. To the contrary, many of the tasks performed by attorneys do appear to require the type of higher-order intellectual skills that are beyond the capability of current techniques. Rather, I am suggesting that there are subsets of legal tasks that are likely automatable under the current state of the art, provided that the technologies are appropriately matched to relevant tasks, and that accuracy limitations are understood and accounted for. In other words, even given current limitations in AI technology as compared to human cognition, such computational approaches to automation may produce results that are "good enough" in certain legal contexts.

Part I of this Article explains the basic concepts underlying machine learning. Part II will convey a more general principle: non-intelligent computer algorithms can sometimes produce intelligent results in complex tasks through the use of suitable proxies detected in data. Part III will explore how certain legal tasks might be amenable to partial automation under this principle by employing machine learning techniques. This Part will also emphasize the significant limitations of these automated methods as compared to the capabilities of similarly situated attorneys.

I. OVERVIEW OF MACHINE LEARNING

A. What Is Machine Learning?

"Machine learning" refers to a subfield of computer science concerned with computer programs that are able to learn from experience and thus improve their performance over time.(fn11) As will be discussed, the idea that computers are "learning" is largely a metaphor and does not imply that computer systems are artificially replicating the advanced cognitive systems thought to be involved in human learning.(fn12) Rather, we can consider these algorithms to be learning in a functional sense: they are capable of changing their behavior to enhance their performance on some task through experience.(fn13)

Commonly, machine learning algorithms are used to detect patterns in data in order to automate complex tasks or make predictions.(fn14) Today, such algorithms are used in a variety of real-world commercial applications including Internet search results, facial recognition, fraud detection, and data mining.(fn15) Machine learning is closely associated with the larger enterprise of "predictive analytics" as researchers often employ machine learning methods to analyze existing data to predict the likelihood of uncertain outcomes.(fn16) If performing well, machine learning algorithms may produce automated results that approximate those that would have been made by a similarly situated person. Machine learning is thus often considered a branch of artificial intelligence, since a well-performing algorithm may produce automated results that appear "intelligent."(fn17)

The goal of this Part is to convey some basic principles of machine learning in a manner accessible to non-technical audiences in order to express a larger point about the potential applicability of these techniques to tasks within the law.

1. Email Spam Filters as an Example of Machine Learning

Consider a familiar example, email "spam" filters, that will illustrate some basic features common to machine learning techniques. "Spam" emails are unsolicited, unwanted commercial emails that can interfere with a user accessing more important communications.(fn18) In principle, an email user could manage spam manually by reading each email, identifying whether a given email is spam, and deleting those determined to be spam. However, given that this task is labor-intensive, it would be desirable to automate spam identification. To perform such automated filtering of spam, email software programs frequently use machine learning algorithms.(fn19)

How do machine learning algorithms automatically identify spam? Such algorithms are designed to detect patterns among data. In a typical process, a machine learning algorithm is "trained" to recognize spam emails by providing the algorithm with known examples of spam for pattern analysis. For instance, imagine that a person determines that a particular email is spam and flags it as such using her email reading software. We can think of this act of flagging as an indication to the computer algorithm that this is a verified example of a spam email that should be assessed for patterns.(fn20)

In analyzing the spam email, the machine learning algorithm will attempt to detect the telltale characteristics that indicate that a given email is more likely than not to be spam. After analyzing several such examples, the algorithm may detect a pattern and infer a general "rule"(fn21): for instance, that emails with the phrase "Earn Extra Cash" tend to be statistically more likely to be spam emails than wanted emails. It can then use such learned indicia to make automated assessments about the likelihood that a new incoming email is or is not spam.(fn22)
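The inference described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the method of any particular email program: the function name `phrase_spam_rate` and the five example emails are hypothetical, and the "rule" is reduced to a single observed frequency.

```python
def phrase_spam_rate(phrase, emails):
    """Return the fraction of emails containing `phrase` that were flagged as spam.

    `emails` is a list of (text, is_spam) pairs. Returns None when the
    phrase never appears, since there is then no evidence either way.
    """
    phrase = phrase.lower()
    labels = [is_spam for text, is_spam in emails if phrase in text.lower()]
    if not labels:
        return None
    return sum(labels) / len(labels)


# Hypothetical training examples: (email text, did the user flag it as spam?)
flagged_emails = [
    ("Earn Extra Cash working from home", True),
    ("Earn extra cash now, limited offer", True),
    ("Meeting moved to 3pm", False),
    ("Newsletter: ways to earn extra cash as a student", False),
    ("Draft brief attached for review", False),
]

# Of the three emails containing the phrase, two were flagged as spam.
rate = phrase_spam_rate("Earn Extra Cash", flagged_emails)
```

In this toy data the phrase appears in three emails, two of which were flagged, so the estimated rate is two-thirds. The one wanted email containing the phrase shows why such a rule is a statistical tendency and not a certainty.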

In general, machine learning algorithms are able to automatically build such heuristics by inferring information through pattern detection in data. If these heuristics are correct, they will allow the algorithm to make predictions or automated decisions involving future data.(fn23) Here, the algorithm has detected a pattern within the data provided (i.e., the set of example spam emails): many of the emails flagged as spam contained the phrase "Earn Extra Cash." From this pattern, it then inferred a heuristic: that emails with the text "Earn Extra Cash" were more likely to be spam. Such a generalization can thus be applied going forward to automatically categorize new incoming emails containing "Earn Extra Cash" as spam. The algorithm will attempt to detect other similar patterns that are common among spam emails that can be used as heuristics for distinguishing spam from wanted emails.

Importantly, machine learning algorithms are designed to improve in performance over time on a particular task as they receive more data. The goal of such an algorithm is to build an internal computer model of some complex phenomenon (here, spam emails) that will ultimately allow the computer to make automated, accurate classification decisions. In this case, the internal model would include multiple rules of thumb about the likely characteristics of spam induced over time (in addition to the "Earn Extra Cash" heuristic just described) that the computer can subsequently follow to classify new, incoming emails.
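The following Python sketch shows one way such an internal model could accumulate many word-level rules of thumb and revise them as more flagged examples arrive. The Article does not name a specific algorithm. As an assumption made here purely for illustration, the sketch uses a simple word-count approach in the style of a naive Bayes classifier with add-one smoothing, and the class name `SpamModel` and all example emails are hypothetical.

```python
import math
from collections import Counter


class SpamModel:
    """Toy word-count model (naive Bayes style, add-one smoothing)."""

    def __init__(self):
        # Separate word tallies and email tallies for spam (True) and wanted (False).
        self.word_counts = {True: Counter(), False: Counter()}
        self.email_counts = {True: 0, False: 0}

    def train(self, text, is_spam):
        """Update the model with one user-labeled email."""
        self.email_counts[is_spam] += 1
        self.word_counts[is_spam].update(text.lower().split())

    def spam_score(self, text):
        """Return log-odds that `text` is spam; a positive value leans spam."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_spam = sum(self.word_counts[True].values())
        total_ham = sum(self.word_counts[False].values())
        denom_spam = total_spam + len(vocab) + 1  # +1 reserves mass for unseen words
        denom_ham = total_ham + len(vocab) + 1
        # Smoothed prior odds from how many emails of each kind were seen.
        score = math.log((self.email_counts[True] + 1) / (self.email_counts[False] + 1))
        for word in text.lower().split():
            p_spam = (self.word_counts[True][word] + 1) / denom_spam
            p_ham = (self.word_counts[False][word] + 1) / denom_ham
            score += math.log(p_spam / p_ham)
        return score


model = SpamModel()
model.train("earn extra cash now", True)
before = model.spam_score("win a free prize")  # words not yet seen by the model

# More labeled examples arrive over time.
model.train("win a free prize today", True)
model.train("meeting agenda for monday", False)
model.train("please review the draft brief", False)
after = model.spam_score("win a free prize")
```

With only one training example, the model has no evidence about "win a free prize". After the additional examples, the score for that text rises above zero, while a text such as "review the draft agenda" scores below zero. Every word seen in training acts as a small heuristic, and each new labeled email adjusts their weights.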

For instance, such an algorithm might infer from additional spam examples that emails that...
