Towards a synthesis of judicial perspectives on technology-assisted review.

Author: Brickell, Julia L.

DOCUMENT review accounts for an estimated 70% of all e-discovery costs. This means that document review also represents the greatest area of potential cost savings in e-discovery. Over the past ten years, the number of technological approaches to document review for litigation has increased as methods long used in the field of information retrieval have been applied to it. These advances call upon lawyers practicing civil litigation to gain familiarity with the various technological options available, lest an opportunity be missed or the opponent, court, or client catch the unwary lawyer by surprise.

Technology-assisted review ("TAR") is not a luxury available only to large firms handling very large cases. Properly chosen and deployed, technological methods can enable a smaller firm to handle larger cases, and thereby compete with larger firms. In the end, the goal is to make litigation more cost-effective, allowing more cases to stay in the judicial system rather than having litigants settle because the discovery costs outweigh the value of the matter. In addition to gaining an understanding of the technology behind the available techniques, lawyers should appreciate that the match (or mismatch) between the problem to be solved, the technology chosen, and the expertise of the user is as important as the technology itself in determining whether the results will be both satisfactory and defensible.

Part I of this article explains TAR and the variety of approaches available. Part II synthesizes the judicial decisions that have addressed TAR substantively. Consideration of those opinions demonstrates the importance of understanding available technology. If there is an overall theme in the cases on this topic, it is that courts increasingly expect counsel to consider technological approaches and to be competent to discuss what is warranted for a case. While the producing party still has the power to decide how to handle its production, if parties do reach agreement on an approach, they will be held to it, no matter how ill-advised it might turn out to be. Counsel must bring to the conversation sufficient expertise to understand the nuances and import of any proposed discovery protocol applying technology.

  I. Technology-Assisted Review

    Before 2000, review was done with large teams of document reviewers. Between 2000 and 2010, online review platforms entered the market, hosting whole cases on a single platform and reducing the transactional costs involved in having to manually pull or deliver documents. As court rules and courtroom lawyers began to focus on discovery of electronic information, research turned to understanding how search could be applied to the review process. Research conducted from 2006 to 2011 under the auspices of the National Institute of Standards and Technology showed that TAR could significantly outperform or significantly underperform human review, depending on the tools used and the expertise of the users. An analysis of some of those results published in the Richmond Journal of Law and Technology ("JOLT") concluded: "Overall, the myth that exhaustive manual review is the most effective--and therefore, the most defensible--approach to document review is strongly refuted. Technology-assisted review can (and does) yield more accurate results than exhaustive manual review, with much lower effort." (1) The methods used by participants that returned top results ranged from sophisticated Boolean search to machine learning algorithms; some of the methods that performed less well fell into the same categories.

    With the help of cases like United States v. O'Keefe and Victor Stanley v. Creative Pipe in 2008, the conversation about the process and expertise needed to navigate document discovery successfully in the increasingly electronic world of both individual and corporate clients reached the broader bar. Those foundational cases, followed by the JOLT article and the article "Search, Forward" (2) in Law Technology News by SDNY Magistrate Judge Andrew Peck, increased awareness of the uses of search in this context. In parallel, amendments to ABA Model Rule 1.1 and, increasingly, to state rules of professional conduct have focused on knowledge of technology (either hired or acquired) as an element of "competence" for lawyers.

    With the safety of acquiescence (and at times encouragement) from the courts, the use of TAR has become more commonplace. TAR is sometimes called computer-assisted review ("CAR") or content-based advanced analytics ("CBAA") and is often referred to as predictive coding. It is important for lawyers to become familiar with the technology behind these techniques. The various forms of TAR in the market today have their origin in information retrieval techniques that have been the purview of doctoral theses and research for the past 50 years. In today's market, two types of TAR predominate: approaches based on search terms and machine learning approaches.

    Search based on search terms is the easiest approach to understand: it is based on words you can read. It can range from simple keyword searches to complex search strings. User input comes in the form of building search terms that are tailored to the language the client uses in its discussion of the relevant subject matter; this language will likely differ by department and from client to client. Search terms can be tested and adjusted, with important words added and over-broad terms anchored or narrowed to home in on relevant content. The impact of these changes can be evaluated by running iterative searches on sample data, allowing the terms to be evaluated and refined. The effectiveness of the search will depend on the methodology followed in designing and refining the search terms, the expertise and know-how of the individuals designing them, and the capability of the search tool to handle multiple or complex searches.
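To make the test-and-refine loop concrete, the following is a minimal Python sketch, assuming a small hypothetical sample that a reviewer has already coded as responsive or nonresponsive. The helper names `matches` and `evaluate`, the `all_of` parameter (used here to model "anchoring" a broad term by requiring a co-occurring word), and the sample documents are illustrative assumptions, not features of any review platform. Each search is scored by recall (the share of responsive sample documents it retrieves) and precision (the share of retrieved documents that are responsive).

```python
import re

# Hypothetical coded sample: (document text, reviewer's responsiveness call).
SAMPLE = [
    ("Q3 pricing agreement with Acme attached", True),
    ("Acme pricing discussion moved to Friday", True),
    ("Lunch order for Friday", False),
    ("Agreement on parking spaces for the office", False),
    ("Draft contract terms for the Acme deal", True),
]

def matches(text, any_of, all_of=()):
    """True if the text contains at least one any_of term and every all_of
    term, comparing whole words case-insensitively (a simplified Boolean search)."""
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    return any(t in words for t in any_of) and all(t in words for t in all_of)

def evaluate(any_of, all_of=()):
    """Return (recall, precision) of a search measured against the coded sample."""
    hits = [resp for text, resp in SAMPLE if matches(text, any_of, all_of)]
    relevant = sum(resp for _, resp in SAMPLE)  # responsive docs in the sample
    true_hits = sum(hits)                       # responsive docs retrieved
    recall = true_hits / relevant if relevant else 0.0
    precision = true_hits / len(hits) if hits else 0.0
    return recall, precision

# Iteration 1: broad terms.
broad = evaluate(any_of=("agreement", "pricing"))
# Iteration 2: add a missed term and anchor the search on the counterparty name.
refined = evaluate(any_of=("agreement", "pricing", "contract"), all_of=("acme",))
```

On this toy sample, the broad search misses the draft contract and picks up the parking memo, so recall and precision are both two-thirds; adding `contract` and anchoring on `acme` retrieves all three responsive documents and nothing else. Real collections are far larger, so the same comparison would be run on a statistically meaningful sample.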

    Machine learning approaches fall into several categories. What is common to the approaches is the use of an algorithm (a set of mathematical instructions) that builds a model of a class of documents based on various document features (chiefly words). Each technology vendor's algorithm is different, counting and weighting the words and features in the documents in a particular way (perhaps ignoring some words and weighting others more heavily, and possibly taking into account the order of the words). In practice, counsel neither sees nor adjusts the algorithm.
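As an illustration of one common way to count and weight words, the sketch below uses TF-IDF (term frequency multiplied by inverse document frequency), a standard information retrieval weighting. It is an assumption chosen for illustration, not any particular vendor's algorithm; the stop-word list, the function names, and the documents are hypothetical. The representation is a "bag of words," so it discards word order, ignores the listed stop words, and gives more weight to words that appear in fewer documents.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf_vectors(docs, stopwords=frozenset({"the", "a", "of", "to"})):
    """Map each document to {term: weight}, where weight = term count in the
    document x log(number of docs / number of docs containing the term).
    Stop words are ignored; a term found in every document gets weight 0."""
    counts = [Counter(t for t in tokenize(d) if t not in stopwords) for d in docs]
    n = len(docs)
    df = Counter(term for c in counts for term in c)  # document frequency
    return [{term: tf * math.log(n / df[term]) for term, tf in c.items()}
            for c in counts]

docs = ["Acme pricing agreement", "Agreement pricing Acme", "The lunch order"]
vectors = tfidf_vectors(docs)
```

Because word order is discarded, the first two documents receive identical vectors, and "lunch," which appears in only one document, outweighs "acme," which appears in two. Vendors differ precisely in choices like these, which is why two tools can treat the same documents differently.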

    Machine learning approaches can be supervised or unsupervised. In a supervised approach, the user supplies the algorithm with a pre-labeled training set of documents from which the algorithm builds a model for classifying unseen documents. For example, a user might supply the algorithm with a sample of documents manually pre-coded as responsive and nonresponsive; from that sample the algorithm would build a model for classifying documents not yet seen. The exact impact of the user input cannot necessarily be predicted or discerned, as it depends on the manner in which the input affects the model that the algorithm is building. The algorithm may rank the results based on a mathematical assessment of their similarity to the algorithm's model of the original input. Predictive coding, which has become shorthand for any machine learning technique used for document review but is in fact just one particular variation, is of this type. In an unsupervised approach, the algorithm is not supplied with training data and instead tries to discover salient patterns in the document population. Many clustering tools (tools that attempt to categorize a document population by...
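A minimal sketch of the supervised workflow follows, assuming a hypothetical handful of reviewer-coded training documents. It uses a nearest-centroid classifier, a deliberately simple stand-in chosen for readability rather than the method any predictive coding product actually uses: the model is the average word-count vector of each class, and each unseen document is scored by how much closer (by cosine similarity) it is to the responsive centroid than to the nonresponsive one. Sorting by that score yields ranked results of the kind described above; all names and data are illustrative.

```python
import math
import re
from collections import Counter

def vectorize(text):
    """Bag-of-words term counts (word order is ignored)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def centroid(vectors):
    """Average term-count vector of one class of training documents."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0

def train(labeled):
    """Build the model from (text, is_responsive) pairs; both classes
    need at least one example."""
    resp = [vectorize(t) for t, r in labeled if r]
    nonresp = [vectorize(t) for t, r in labeled if not r]
    return centroid(resp), centroid(nonresp)

def rank(model, unseen):
    """Score each unseen document (positive leans responsive) and sort
    highest first."""
    resp_c, nonresp_c = model
    scored = [(cosine(vectorize(t), resp_c) - cosine(vectorize(t), nonresp_c), t)
              for t in unseen]
    return sorted(scored, reverse=True)

# Hypothetical training set coded by a reviewer.
TRAINING = [
    ("Acme pricing agreement for Q3", True),
    ("revised Acme contract pricing", True),
    ("team lunch on Friday", False),
    ("office parking on Friday", False),
]
UNSEEN = ["Acme contract signed", "parking for Friday lunch"]

model = train(TRAINING)
ranking = rank(model, UNSEEN)
```

Here "Acme contract signed" ranks first with a positive score, while the Friday parking note scores negative. Changing even one training label shifts the centroids and therefore every score, which illustrates why the effect of a particular coding decision on the model is hard to predict.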
