Abstract
I. Introduction
II. DATA ON THE CIVIL JUSTICE SYSTEM AND DEMOGRAPHICS OF COURT USERS
III. Data Mining Technology
IV. Textual data analysis of court cases
V. Textual analysis of Victorian and New South Wales cases: a pre-cursor for justice system triage?
VI. Reflecting on the results of textual analysis of cases that go to full hearing
A. The results of the ACJI exploratory study
B. Table 1. Highest frequency terms
VII. Conclusion

I. INTRODUCTION
Richard Susskind in Tomorrow's Lawyers predicts fundamental and irreversible changes in the world of law. (1) For Susskind, the future of legal service will resemble neither the characteristics suggested by Grisham nor Rumpole. (2) Instead, it will be a world of virtual courts, internet-based global legal businesses, online document production, commoditized service, legal process outsourcing, and web-based simulated practice. (3) Legal markets will be liberalized, with new jobs for lawyers and new employers too. These future predictions are partly based on developments in the use of information technology in legal practice over the last two decades, which can be tracked from initial limited use by legal professionals (through supportive technologies such as word processing and other office automations) through to more advanced forms of electronic discovery ("e-discovery") and legal decision support (used in courts). (5) In addition, the more advanced forms of Knowledge Discovery from Databases ("KDD"), which is sometimes also referred to as data mining, "can help improve access to justice and provide support for alternative dispute resolution." (6) The objective of improving access to justice may be achievable through a multitude of mechanisms which are supported by information technology as well as more sophisticated data mining techniques. (7)
A major use of data mining in law is seen in the increasing use of electronic filing ("e-filing"), electronic case management, and e-discovery. (8) Each of these arguably contributes to access to justice by improving efficiencies and supporting more timely justice. Developments in e-discovery may, however, do more to promote access to justice by creating tools and methodologies that can be used to explore justice questions and issues across the justice sector (not only in individual cases). The utility of such processes to assist with broader knowledge discovery is enhanced by their continued evolution, as experts develop systems to further improve their effectiveness and efficiency. (9) In individual cases, an example of this is the use of statistical sampling, which prioritises documents for review and eliminates irrelevant data or duplicates, in turn reducing the costs of discovery. (10)
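The duplicate elimination and sampling steps mentioned above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not a description of any particular e-discovery product: the function names, the normalisation rule and the sample documents are all hypothetical, and simple random sampling stands in for the more elaborate statistical prioritisation used in practice.

```python
import hashlib
import random
import re


def normalise(text):
    """Lower-case the text and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", text.strip().lower())


def drop_duplicates(documents):
    """Keep the first copy of each document, comparing normalised text by hash."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


def sample_for_review(documents, k, seed=0):
    """Draw a simple random sample of at most k documents for human review."""
    rng = random.Random(seed)
    return rng.sample(documents, min(k, len(documents)))


# Hypothetical documents, invented for illustration only.
documents = [
    "Please find the signed contract attached.",
    "please find the  signed contract attached.",
    "Lunch on Friday?",
    "The delivery was late and the goods were damaged.",
]
unique_documents = drop_duplicates(documents)
review_sample = sample_for_review(unique_documents, k=2)
```

Here the second document differs from the first only in capitalisation and spacing, so it is treated as a duplicate and dropped before the review sample is drawn. Reviewing a sample rather than the full set is what reduces cost in this sketch.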
The processes described relate to the use of data within the legal system, in that their broad goal is to distil and discover relevant information either from disputing parties, or from cases, in order to support speedier and perhaps better systematised resolution of disputes. (11) Such processes may also have a broader utility and can be applied to explore questions about legal systems and processes. This article explores the use of data mining processes in this broader justice context, through an exploration of a recent project undertaken by the Australian Centre for Justice Innovation ("ACJI") at Monash University in Australia, which used artificial intelligence, via data mining and textual analysis, to decipher dispute characteristics, with the goal of developing process referral indicia or 'dispute resolution triage' for disputants. (12) The impetus behind this pursuit arose from the reality that
[c]ivil disputes that end up in a final hearing before a judge in the superior courts of Australia often have many complex characteristics. In some situations, these characteristics are the very factors which cause an initial intractability and the escalation of the dispute into litigation. In the higher civil courts, judges often note that the disputes that end up being finally litigated will often have high levels of task complexity (ie [sic] they may involve large amounts of information, complex transactions and multiple participants) [as well as] high levels of behavioural complexity, that is, those that are involved in the dispute may be... more likely to engage in excessively adversarial behaviour. (13)
In addition, "[m]any commentators have noted that very few civil disputes end up in a final hearing before a judge (14) and figures from" two Australian jurisdictions "suggest that only a very small percentage of matters that are commenced in the courts result in a final hearing before a judge." (15) "As late as 1936, on the eve of the promulgation of the Federal Rules of Civil Procedure, a fifth of all civil cases that were filed in the federal courts were resolved at trial." (16) Commentators such as Langbein have argued that this is the result of new systems of civil procedure that have emerged in the U.S. and are centered on a "package of discovery techniques" consisting of "interrogatories, documentary discovery, and sworn depositions." (17) The Vanishing Trial research in the U.S. is also relevant. Commentators such as Galanter have suggested that the decline is due to practice geared toward settlement, and he states that "[p]lausible causes for this decline include a shift in ideology and practice among litigants, lawyers, and judges." (18) The 2015 civil procedure reforms in the U.S. (19) are likely to extend this phenomenon, and similar reforms in both the U.K. and Australia have also had a pronounced impact.
In this context, [t]here have also been a number of research projects focused on how matters can be referred to dispute resolution so that litigation is avoided wherever possible. To that end, throughout the 1980s and 1990s there was a considerable focus on what case characteristics could be used to refer disputes more effectively to alternative dispute resolution (ADR). (20) However, there has been little research "to determine whether disputes that result in final... court decisions have [particular] characteristics that could lead to the creation of indicia which might enable the earlier referral of such cases and their management" to be supported more effectively. (21) The exploratory project, undertaken by ACJI, "sought to better understand the characteristics of disputes that" eventually "result in judicial decisions by examining and exploring available case decision data." (22) "The project included the trialling of case analysis approaches in two [Australian] jurisdictions" (Supreme Court of New South Wales (23) and Supreme Court of Victoria (24)) using text analysis tools to ascertain "whether there are factors or case characteristics that may be more likely to result in final litigation in disputes." (25) It should be noted that in Australia, there are few jury trials in relation to civil disputes. (26) Most civil disputes are heard by a judge who renders a decision (rather than an opinion) that is available on a publicly accessible free database. (27) Because only a relatively small proportion of Australian cases involve juries, judicial decision data is representative of most higher court matters that progress to a final civil hearing. (28)
II. DATA ON THE CIVIL JUSTICE SYSTEM AND DEMOGRAPHICS OF COURT USERS
First, it is important to note that data from the legal domain is usually stored in relatively disorganised text-based document formats, in contrast with commercial and scientific data which is usually stored in a "more structured manner." (29) In addition, many data sources relating to a court are paper-based, or where decisions are stored electronically, these are in narrative form rather than in structured databases, making it more challenging to retrieve fact values and case data. (30) As Stranieri and Zeleznikow have highlighted, the operating systems used by search engines such as LexisNexis or Westlaw can prove ineffective for retrieving cases and text that are relevant to particular cases. (31) In fact, to be able to glean information which might enhance and support access to justice, a series of steps needs to be undertaken in order to extract useful information from repositories such as these. (32)
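To make the idea of extracting information from narrative text concrete, the following Python sketch counts term frequencies across a set of decisions. It is offered only as an illustration under stated assumptions: the stop-word list, the function names and the two sample sentences are hypothetical, and the sketch does not reproduce the steps or tools used in the ACJI study.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real study would use a fuller one.
STOP_WORDS = {
    "the", "a", "an", "of", "and", "to", "in",
    "that", "was", "for", "by", "on", "is",
}


def tokenise(text):
    """Split narrative text into lower-case alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequencies(decisions):
    """Count non-stop-word terms across a collection of decision texts."""
    counts = Counter()
    for text in decisions:
        counts.update(t for t in tokenise(text) if t not in STOP_WORDS)
    return counts


# Hypothetical decision excerpts, invented for illustration only.
decisions = [
    "The plaintiff claimed damages for breach of contract.",
    "The defendant denied the breach and disputed the contract.",
]
frequencies = term_frequencies(decisions)
top_terms = frequencies.most_common(2)
```

In this toy input, "breach" and "contract" each appear twice and every other retained term appears once. Even so simple a count turns unstructured narrative into a structured table of values that can be compared across cases.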
The sort of information which might provide insight into and assist with devising a support mechanism for those using the justice system is demographic data. Unfortunately, there has been only piecemeal research into the demographics of court users, (33) and without clear baseline data from the courts in respect of filed matters, it is difficult to devise systems which will support effective and efficient processes and durable and meaningful outcomes for those in dispute. As previously discussed, once within the litigation system, traditional trial processes account for the determination of only a small number of disputes. (34)
There are no universal standards for the information that is recorded in court files in Australia or the U.S. A typical civil or family case file in Australia or the U.S. (noting the different terms and names given to various documents) might include documents such as a writ or complaint, orders of notice, affidavits, cross complaints, third party complaints, pleadings, memorandums of decisions, opinions and judicial orders. (35) In some jurisdictions, such as the United States, there will be a greater focus on discovery and depositions. (36) In many jurisdictions, mandatory disclosure requirements have reduced the emphasis on discovery, and the deposition process is absent in some jurisdictions where the emphasis is instead on sworn statements or affidavits. (37) However, the particular contents will depend on the nature of the case and allegations, and will not systematically include demographic information such as parties' age, cultural origins, linguistic information, employment/work...