TABLE OF CONTENTS
I. INTRODUCTION
II. DISPUTES SURROUNDING PRIVILEGED DOCUMENTS PRESENT A CUMBERSOME TRUST PROBLEM FOR ALL PARTIES INVOLVED
III. ZERO-KNOWLEDGE PROOF ENABLES VALIDATION OF A STATEMENT WITHOUT REVEALING ANY OTHER INFORMATION
IV. SOLVING THE TRUST PROBLEM IN PRIVILEGE LOG DISPUTES WITH ZERO-KNOWLEDGE PROOF
A. The Millionaire Model
1. A Review of Machine Learning in Technology-Assisted Review in e-Discovery
2. Case-Specific Machine Learning Algorithms
3. Generic Machine Learning Algorithms
B. The Sudoku Model
V. CONCLUSION

I. INTRODUCTION
In recent years, as e-discovery of electronically stored information ("ESI") has become widely adopted, the number of disputes over privileged documents has also exploded. Resolving these disputes in large civil cases often involves lengthy court adjudications, in camera reviews, and sometimes even the appointment of special masters to oversee the process. (1) As one judge put it, "such a situation is detrimental to the litigants, the courts, and our system of justice." (2) In addition to the sheer amount of work involved, judges must strike a delicate balance between imposing high financial costs on the privilege-claiming party by demanding detailed descriptions of the withheld documents in privilege logs, (3) and risking that non-privileged documents will be unfairly withheld. (4) As a result, privilege disputes have become a vexing legal problem in need of better solutions.
At the core of disputes over privileged documents is a simple trust problem: the privilege-claiming party holds secret documents that it is unwilling to show the requesting party, who doubts the validity of the privilege claim. In other words, the privilege-claiming party wants to prove that the documents are indeed privileged without disclosing their contents. This is, in fact, a classic problem that a cryptographic concept called zero-knowledge proof can solve.
Zero-knowledge proof has a seemingly contradictory definition: to succeed, a protocol must convince the verifier that a statement is true without revealing the information supporting that statement. For example, suppose two children, Alice and Bob, want to find out whether they received the same number of Halloween candies without showing each other their candy collections. They can use the following zero-knowledge proof implementation. Bob labels each of four locked boxes with a different number, one for each candy count either child could plausibly have. Only one box is labeled with the number of candies that Bob has; he keeps the key to that box and throws away the keys to all the other boxes. Alice then slips an identical piece of paper into each box. When she comes to the box labeled with the number of candies she holds, she places a special mark on the paper she puts in that box. If Bob then opens the only box he has a key to and sees the special mark, Alice and Bob know they have the same number of candies; otherwise, they know their amounts differ. (5) See Figure 1 for a visual representation of this scenario. Zero-knowledge proofs also serve a role in business and industry, for example by taking the place of escrow agents in financial transactions, or by calculating whether a salesperson has remitted the appropriate taxes on her sales, to be paid by a counterparty, without revealing the precise price at which she sold an item. (6)
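The locked-box exchange can be modeled as a short Python simulation that tracks only who learns what. This is a toy sketch under stated assumptions, not a cryptographic implementation: the physical locks are represented by a hypothetical `bob_has_key` flag, the helper names (`bob_prepares_boxes`, `alice_fills_boxes`, `bob_opens_his_box`) are illustrative, and the box labels are assumed to run from one to four so that both children's counts appear on some box.

```python
from dataclasses import dataclass


@dataclass
class Box:
    label: int          # number Bob writes on the outside of the box
    bob_has_key: bool   # stands in for the physical lock: Bob keeps only one key
    paper: str = ""     # slip Alice drops in: "marked" or "blank"


def bob_prepares_boxes(bob_count: int, labels: range) -> list[Box]:
    """Bob labels one box per possible count and keeps only the key matching his own."""
    return [Box(label=n, bob_has_key=(n == bob_count)) for n in labels]


def alice_fills_boxes(boxes: list[Box], alice_count: int) -> None:
    """Alice puts an identical slip in every box, marking only the one with her count."""
    for box in boxes:
        box.paper = "marked" if box.label == alice_count else "blank"


def bob_opens_his_box(boxes: list[Box]) -> bool:
    """Bob can open exactly one box, so he sees one slip and learns nothing else."""
    (his_box,) = [b for b in boxes if b.bob_has_key]
    return his_box.paper == "marked"


def same_number_of_candies(alice_count: int, bob_count: int,
                           labels: range = range(1, 5)) -> bool:
    """Run the whole exchange; assumes both counts appear among the labels."""
    boxes = bob_prepares_boxes(bob_count, labels)
    alice_fills_boxes(boxes, alice_count)
    return bob_opens_his_box(boxes)


print(same_number_of_candies(3, 3))  # True
print(same_number_of_candies(2, 4))  # False
```

Because Bob can open only one box, the single fact he learns is whether Alice's mark landed in it; the other three slips stay sealed. A real protocol would replace the boxes and keys with cryptographic primitives such as encryption or commitments.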
Zero-knowledge proof is an active research area. Its applications in law have only recently begun to attract attention. Joshua Kroll contemplated applying zero-knowledge protocols to ensure that decision-makers or machine learning algorithms apply policies consistently across all decision subjects. (8) These policies could concern voting, approving loan and credit card applications, targeting citizens or neighborhoods for police scrutiny, setting bail or parole, selecting taxpayers for IRS audits, and granting or denying immigration visas. (9)
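One building block often used in accountability proposals of this kind is a cryptographic commitment: a decision-maker publishes a fingerprint of its policy before making decisions and later reveals the policy so that anyone can check it was not swapped midstream. The Python sketch below assumes a simple salted SHA-256 hash commitment; it is not Kroll's construction and is not itself a zero-knowledge proof, only an illustration of how consistent application of a policy can be made checkable. The function names `commit` and `verify` and the sample policy strings are hypothetical.

```python
import hashlib
import secrets


def commit(policy: str) -> tuple[str, bytes]:
    """Return a public commitment to `policy` plus the secret nonce needed to open it."""
    nonce = secrets.token_bytes(16)  # random salt so the policy cannot be guessed from the hash
    digest = hashlib.sha256(nonce + policy.encode("utf-8")).hexdigest()
    return digest, nonce


def verify(commitment: str, policy: str, nonce: bytes) -> bool:
    """Check that a revealed policy and nonce match the commitment published earlier."""
    recomputed = hashlib.sha256(nonce + policy.encode("utf-8")).hexdigest()
    return secrets.compare_digest(recomputed, commitment)


# Before deciding any cases, the decision-maker publishes only `published`.
published, nonce = commit("approve loan if credit score >= 700")

# Later, an auditor receives the policy and nonce and checks them against the record.
print(verify(published, "approve loan if credit score >= 700", nonce))  # True
print(verify(published, "approve loan if credit score >= 650", nonce))  # False
```

The random nonce matters: without it, anyone could hash a list of likely policies and compare each result with the published value.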
Besides Kroll's proposal, there are no other prominent applications of zero-knowledge proof in the legal context. This Note focuses on the concept of zero-knowledge proof, describes the parallels between the problems it solves and those raised by disputes over privileged documents, and illustrates that there are opportunities to apply zero-knowledge proof in the broader legal context.
Part II of this Note discusses a specific legal issue that is prevalent in civil litigation: disputes over privileged documents. In the age of e-discovery, these disputes have become numerous and burdensome for all parties involved. (10) One problem that arises in these disputes is the lack of trust between the parties regarding claimed privileges. (11) In cryptography, such distrust problems can be solved with zero-knowledge proof. Part III explains this concept using a few examples. Part IV proposes two solutions to disputes over privileged documents, modeled on examples of zero-knowledge proof. The first solution involves applying a machine learning algorithm to identify privileged documents. The algorithm can be trained either on case-specific documents or on privileged documents of a specific type drawn from a vast pool of cases. Under this solution, special care must be taken to ensure transparency and build trust between opposing parties. The second solution involves masking keywords and concepts to mitigate the risk of disclosing potentially sensitive content in privilege challenges. Part V concludes the Note.
II. DISPUTES SURROUNDING PRIVILEGED DOCUMENTS PRESENT A CUMBERSOME TRUST PROBLEM FOR ALL PARTIES INVOLVED
During discovery for civil litigation in federal court, a party is under a legal duty to disclose certain information requested by the opposing party. (12) A party may withhold responsive information from a production request on the basis of privilege, (13) but that party typically should create a "privilege log" identifying what privileged information is being withheld. (14)
Although Rule 26 of the Federal Rules of Civil Procedure has long governed the discovery process in general, privilege logs were governed by local rules or by judges' orders on a case-by-case basis before Rule 26(b)(5) was enacted in 1993. (15) Today, when "information produced in discovery is subject to a claim of privilege or of protection as trial-preparation material," Rule 26(b)(5) requires a party to "notify any party that received the information of the claim and the basis for it." (16) Since the rule's enactment, it has become customary for the privilege-claiming party to comply with Rule 26(b)(5) by producing a privilege log that describes each withheld document in enough detail for the court or the opposing party to assess the claim of privilege. (17)
Rule 26(b)(5) deliberately omits details on how to make a claim of privilege or work-product protection and what information is needed to justify such claims. (18) This is because claims of privilege often come in different forms, and appropriate justifications therefore vary with case-specific circumstances. (19) But "the absence of explicit guidance as to the nature of the required [information] enlarges the vacuum in which strategic manipulation of the discovery process... may flourish." (20) Courts must therefore balance the need for information to establish privilege claims against the burden that demands for such information place on the privilege-claiming party: although blanket assertions of privilege do not satisfy the claiming party's burden, (21) requiring too much description risks giving away privileged information and increases the cost and burden of preparing privilege logs. (22)
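Because Rule 26(b)(5) leaves the contents of a log unspecified, practice varies. The Python sketch below models one common layout of a document-by-document entry as a dataclass. The field names, the hypothetical `BLANKET_DESCRIPTIONS` set, the `is_blanket_assertion` check, and the sample entries are illustrative assumptions, not requirements drawn from the rule or from any court's order; the check catches only the crudest blanket assertion, a description that merely restates the privilege claimed.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical labels too generic to let anyone assess a claim on their own.
BLANKET_DESCRIPTIONS = {"privileged", "confidential", "attorney-client privilege", "work product"}


@dataclass(frozen=True)
class PrivilegeLogEntry:
    doc_id: str                   # identifier for the withheld document
    doc_date: date
    author: str
    recipients: tuple[str, ...]
    privilege: str                # e.g., "attorney-client" or "work product"
    description: str              # subject matter, without revealing the privileged content

    def is_blanket_assertion(self) -> bool:
        """Flag entries whose description is empty or merely restates a privilege label."""
        text = self.description.strip().lower()
        return (text == ""
                or text in BLANKET_DESCRIPTIONS
                or text == self.privilege.strip().lower())


# Hypothetical entries for illustration only.
entries = [
    PrivilegeLogEntry("DOC-0001", date(2001, 3, 2), "In-house counsel", ("Regulatory VP",),
                      "attorney-client", "Legal advice regarding draft product label language"),
    PrivilegeLogEntry("DOC-0002", date(2001, 3, 5), "Sales manager", ("Sales team",),
                      "attorney-client", "Privileged"),
]
flagged = [e.doc_id for e in entries if e.is_blanket_assertion()]
print(flagged)  # ['DOC-0002']
```

A check like this addresses only one side of the balance the courts strike: it can spot descriptions that say too little, but it cannot tell whether a fuller description gives away privileged content.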
Even when a party meticulously prepares a document-by-document privilege log, few litigators are willing to trust "an opponent's understanding of the law and willingness to be forthcoming" in deciding which documents are privileged. (23) When disputes arise in this context, courts turn to in camera review or appoint special masters to examine the withheld materials, and both approaches are costly and time-consuming. (24)
The advent of e-discovery (i.e., electronic discovery of ESI) further exacerbated the already complicated issues with privilege logs. The primary challenge e-discovery brings is the sheer volume of privileged documents that must be logged and described. One case, In re Vioxx Products Liability Litigation, (25) illustrates this challenge: a large volume of documents coupled with ambiguous guidance on the preparation of privilege logs. (26) In Vioxx, defendant Merck produced over two million documents, amounting to eighteen million pages, in response to a discovery request. (27) Merck claimed privilege for one percent of the documents. (28) The district court ordered Merck to submit for in camera review all documents for which it claimed privilege. (29) In response, Merck delivered eighty-one boxes "containing approximately 30,000 documents, amounting to nearly 500,000 pages." (30) The district judge "undertook the herculean task of personally reviewing 30,000 documents over a two-week period," but reached inconsistent results, concluding that one copy of a document was privileged while exact duplicates of the same document were not. (31) While commending the district judge's efforts, the Fifth Circuit suggested that the district court instead review only a sample of representative documents. (32) Merck duly provided the district court with ten more boxes containing 2,000 documents. (33) The district court appointed two special masters to review these documents along with 600 additional documents from the privilege log offered by the plaintiffs. (34) The second round of court review took three months and cost $400,000. (35) Eventually, the court ordered Merck to produce documents in accordance with the guidelines set out in the special masters' report. (36)...
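The figures reported in Vioxx can be turned into rough per-document rates that convey the scale of the review. The short Python calculation below uses only the numbers stated in the text, which are themselves approximate ("approximately 30,000 documents," "nearly 500,000 pages"), and assumes that the "two-week period" means fourteen calendar days; the variable names are illustrative.

```python
# First round: the district judge's personal in camera review (figures from the text).
docs_round1 = 30_000
pages_round1 = 500_000
days_round1 = 14  # assumption: "two-week period" read as 14 calendar days

# Second round: review by the two special masters (figures from the text).
docs_round2 = 2_000 + 600  # Merck's sample plus the 600 documents offered by the plaintiffs
cost_round2 = 400_000      # dollars

pages_per_doc = pages_round1 / docs_round1
docs_per_day = docs_round1 / days_round1
pages_per_day = pages_round1 / days_round1
cost_per_doc = cost_round2 / docs_round2

print(f"{pages_per_doc:.1f} pages per document")          # 16.7 pages per document
print(f"{docs_per_day:,.0f} documents per day")           # 2,143 documents per day
print(f"{pages_per_day:,.0f} pages per day")              # 35,714 pages per day
print(f"${cost_per_doc:,.2f} per document in round two")  # $153.85 per document in round two
```

Even on these rough numbers, the first round required reading more than two thousand documents a day, and the sampled second round, while far smaller, still cost roughly $150 per document reviewed.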