Two Views from the Data Mountain

Publication year: 2001
Citation: Vol. 36


Creighton Law Review


In 1975, when we graduated from high school together, the modern computer age was still in its infancy. Although most major businesses and institutions had some computerized records and operations, the volume of electronic (versus paper) records was still relatively low. The personal computer revolution, desktop networking, the Internet, and e-mail as a common form of business communication all had yet to occur.

Over the last quarter century, these developments have produced, for most businesses and institutions, a vast mountain of data in electronic form.(fn1) Moreover, many of the most recent developments in computer science and technology have made it even easier to store and, increasingly, to search this enormous quantity of data.(fn2)

The ability to create, maintain and use this huge volume of data raises important technical and legal issues.(fn3) In essence, for most businesses, it is technically possible to keep virtually every electronic record that comes into existence.(fn4) Indeed, there are costs(fn5) and other burdens associated with attempting to eliminate electronic records on a selective basis.(fn6) In many instances, moreover, there may be legal constraints on attempts to destroy electronic records.(fn7)

We come at this problem from a common background, but from two different professional perspectives. One of us is a computer science professional who largely views the data mountain from the perspective of the possibilities of improving the efficiency and productivity of business through effective data analysis and storage. The other is a lawyer who largely views the data mountain with trepidation, knowing that what is buried in the mountain may often be the stuff of which litigation nightmares are made.

Can these two views be reconciled? Although there is no one perfect solution to this problem, we believe that businesses and institutions can, with forethought and sufficient effort, master the basic challenge of the data mountain. The final section of this Article is an attempt to outline the most important steps involved in that process.


The data that corporations and other institutions store is growing exponentially.(fn8) Most major businesses and institutions have long since passed the tipping point, as the clear majority of records are now created and stored in electronic form.(fn9) Even small organizations may have hundreds of gigabytes of data stored and available almost immediately, not to mention the backups archived and locked away in off-site storage.(fn10) Many office workers and professionals have thousands of e-mails (sometimes sorted into folders and sometimes merely kept as a perpetual inbox), recording nearly every scrap of e-conversation.(fn11)

As data storage manufacturers continually increase storage capacities and cut costs, our electronic file cabinets are fast approaching a capacity that is effectively infinite.(fn12) The reasons for this ever-escalating volume of data are many. Nearly every business larger than a paper route uses computers as a normal part of daily operation.(fn13) Computer systems enable collection of data about sales, inventory, financials and other aspects of business.(fn14) More and more of this data is retained as firms learn to leverage their investment in information by extracting business trends and customer tendencies from vast warehouses of archived data. This tendency to retain data is accelerated by the decreasing cost of storage.(fn15) Further, more and more interpersonal and inter-company communication is done electronically by exchanging word processing documents and e-mails.(fn16) This communication medium is responsible for the massive multiplication of documents, as attachments are added to e-mails, documents are mailed to multiple recipients and long conversations carried out in e-mail "chains" are copied and responded to over time.(fn17) All of these electronic items are also being retained for long periods as storage capacities climb.(fn18)

Advances in data creation and storage are not the only reason that businesses are retaining more and more data. Highly efficient search techniques have made it possible to use vast quantities of data effectively.(fn19) Increasingly, due to technology such as the systems that Lancet(fn20) has built, it is becoming possible to "mine" these enormous quantities of material. In the past, a company might say in response to a request by business people (internal) or lawyers (outside, in a lawsuit), "this is the best we can do, given cost and time," and the results would be limited. With effective data mining, it is now possible, and will be increasingly so, to pull together far more information at far lower cost.

The same techniques that allow a computer user to search the web for a new broccoli recipe that contains garlic can allow huge numbers of documents to be loaded into a database and searched with sophisticated queries. Fuzzy logic and artificial intelligence techniques allow searches to find and rank documents containing words that are near each other, to find documents that contain some words but not others, or to construct a web of linkages between related documents that allows a user to navigate through the pile in a rational manner.(fn21)
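The two simplest query types described above (documents containing some words but not others, and documents in which two words appear near each other) can be illustrated with a small positional index. The following is only a minimal sketch: it uses a plain inverted index rather than fuzzy logic or artificial intelligence, and the function names (`build_index`, `search`, `near`) and the sample memos are hypothetical, invented for illustration.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def build_index(docs):
    """Map each word to {doc_id: [word positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def search(index, docs, include=(), exclude=()):
    """Return ids of documents containing every `include` word and no `exclude` word."""
    hits = set(docs)
    for word in include:
        hits &= set(index.get(word, {}))
    for word in exclude:
        hits -= set(index.get(word, {}))
    return sorted(hits)

def near(index, word_a, word_b, max_gap):
    """Return ids of documents where the two words occur within max_gap words of each other."""
    docs_a = index.get(word_a, {})
    docs_b = index.get(word_b, {})
    result = []
    for doc_id in sorted(set(docs_a) & set(docs_b)):
        if any(abs(a - b) <= max_gap for a in docs_a[doc_id] for b in docs_b[doc_id]):
            result.append(doc_id)
    return result

# Hypothetical sample documents, for illustration only.
docs = {
    "memo1": "Roasted broccoli with garlic and lemon",
    "memo2": "Broccoli soup recipe, no garlic required",
    "memo3": "Garlic bread for the office party; the broccoli order was cancelled",
}
index = build_index(docs)

print(search(index, docs, include=["broccoli", "garlic"], exclude=["soup"]))
print(near(index, "broccoli", "garlic", max_gap=2))
```

All three sample memos mention both words, so a plain keyword match cannot tell them apart; the exclusion and proximity queries are what narrow the pile. Commercial systems add ranking and link analysis on top of an index of roughly this kind.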

The implications are significant. What would have been a daunting, seemingly impossible research task twenty-five years ago is now quite possible and will soon become the norm.(fn22) Computer users are increasingly aware that with these sophisticated computerized search mechanisms, millions of records can be reviewed and analyzed, and records in disparate locations can be collected and compared.(fn23) The data mountain is no longer an impossible height to scale, but a vast database to be mined for secrets and insights that were previously unavailable.

Coupled with the vast expansion of electronic data and the sharp increase in the ability to search and use such data is the fact that in the modern computer environment, data tends to persist, often well beyond its intended useful life. Even if an institution has a document retention policy (or, more appropriately named, a document deletion policy), and employees apply the policy correctly by doing their housekeeping (deleting old e-mails and ridding disks of documents), making the data truly disappear is not quite that easy. So-called "deleted data" can continue to exist nearly forever in forms that range from immediately available to quite costly to recover, but the data is often recoverable nevertheless.

Discarded data can lurk in a number of spots that the average user may not even know about. Using the ubiquitous personal computer running Microsoft Windows, computer professionals can find data in a wide array of places.

By default, for most computer users, deleting a file does not truly delete it - the direction to delete simply moves the file into a special folder called the "Recycle Bin."(fn24) Just as with a real trash can, if you accidentally toss something in the Recycle Bin, you can retrieve it.(fn25) For e-mail, there is typically a similar mechanism - deleted items are not deleted, they are just moved into a special folder.(fn26) Anything in such a special folder is recoverable by moving the document back to one of the user's normal folders.(fn27)
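The "special folder" behavior described above amounts to a move rather than an erasure. The sketch below models it with ordinary folders in a temporary directory. It is a simplified illustration, not a description of how Windows implements the Recycle Bin (which also records metadata such as the original location), and the names `soft_delete`, `restore`, and `recycle_bin` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

def soft_delete(path: Path, bin_dir: Path) -> Path:
    """'Delete' a file by moving it into bin_dir; its contents are left untouched."""
    bin_dir.mkdir(exist_ok=True)
    target = bin_dir / path.name
    shutil.move(str(path), str(target))
    return target

def restore(name: str, bin_dir: Path, dest_dir: Path) -> Path:
    """Undo a soft delete by moving the file back into a normal folder."""
    target = dest_dir / name
    shutil.move(str(bin_dir / name), str(target))
    return target

def demo():
    """Create a file, soft-delete it, confirm it still exists, then restore it."""
    with tempfile.TemporaryDirectory() as tmp:
        docs = Path(tmp) / "documents"
        bin_dir = Path(tmp) / "recycle_bin"
        docs.mkdir()
        memo = docs / "memo.txt"
        memo.write_text("Q3 pricing discussion")

        soft_delete(memo, bin_dir)
        gone_from_docs = not memo.exists()
        still_in_bin = (bin_dir / "memo.txt").exists()

        restored = restore("memo.txt", bin_dir, docs)
        return gone_from_docs, still_in_bin, restored.read_text()

print(demo())
```

From the user's point of view the file has vanished from the documents folder, yet every byte of it remains on the disk until the special folder itself is emptied.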

Temporary copies of documents are often created during word processing sessions to archive a document in progress in case the computer crashes.(fn28) This means that there is a "shadow" version of the document, stored on the computer's hard drive, in a place that the user does not know about but that document recovery experts do know about.(fn29) There are many other places where such "shadow documents" could be hiding, waiting to be recovered by computer forensics experts.(fn30)

Even when a user "empties" the Recycle Bin, such that the document is no longer easily recoverable by that user, the electronic data persists.(fn31) A computer, just like a card catalog in a library, keeps track of shelf space, with a map that indicates which slots on the shelves are occupied and which are empty. If we tell a computer to "really" delete a document (rather than merely to move it to the Recycle Bin), the computer's map of shelf space is updated to indicate that a spot on the shelves is available for another document. But the document is allowed to sit on the shelf (i.e., on the computer's hard drive) until another document is added to the library and the spot is needed.(fn32) With the large amounts of space on today's computers, it could be a long time before that spot is needed. Also, the catalog "card" used to index the document is not fully destroyed, but just marked as "gone."(fn33) The ability to identify documents that are marked for deletion (by reviewing descriptive document names) may make it possible for a computer expert to reconstruct other information about the document (such as the date it was created and the last time it was modified).(fn34) This kind of information could be enough to tell an expert where to begin looking on backup tapes and other places for copies of the documents.
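The library analogy can be made concrete with a toy model. The sketch below is a deliberate simplification under stated assumptions: a fixed number of "slots" stands in for disk space, a list of flags stands in for the map of shelf space, and a dictionary stands in for the card catalog. The class `ToyDisk` and its methods are hypothetical and do not correspond to any real file system's interface.

```python
class ToyDisk:
    """A toy model of a disk: fixed slots, a free-space map, and a directory."""

    def __init__(self, n_slots):
        self.slots = [None] * n_slots     # the "shelves": raw stored content
        self.in_use = [False] * n_slots   # the "map" of occupied shelf space
        self.directory = {}               # the "catalog": name -> slot and deleted flag

    def write(self, name, content):
        """Store content in the first free slot, overwriting any stale data there."""
        slot = self.in_use.index(False)   # raises ValueError if the disk is full
        # Any deleted catalog card that pointed at this slot is now truly gone.
        stale = [n for n, e in self.directory.items()
                 if e["slot"] == slot and e["deleted"]]
        for old_name in stale:
            del self.directory[old_name]
        self.slots[slot] = content
        self.in_use[slot] = True
        self.directory[name] = {"slot": slot, "deleted": False}

    def delete(self, name):
        """'Really' delete: mark the card as gone and the slot as free, nothing more."""
        entry = self.directory[name]
        entry["deleted"] = True
        self.in_use[entry["slot"]] = False

    def list_files(self):
        """What an ordinary user sees: only entries not marked as deleted."""
        return sorted(n for n, e in self.directory.items() if not e["deleted"])

    def recover(self, name):
        """What a forensic tool might do: read the slot behind a deleted card."""
        entry = self.directory.get(name)
        if entry is None or not entry["deleted"]:
            return None
        return self.slots[entry["slot"]]

disk = ToyDisk(n_slots=2)
disk.write("memo.doc", "confidential pricing memo")
disk.write("notes.txt", "lunch order")
disk.delete("memo.doc")
visible = disk.list_files()           # the user no longer sees memo.doc
recovered = disk.recover("memo.doc")  # but its content is still on the "shelf"
disk.write("new.doc", "fresh content")
after_reuse = disk.recover("memo.doc")  # the slot has been reused; content is gone
```

In this model the deleted memo stays recoverable until a later write happens to claim its slot, which mirrors the point made above: on a disk with ample free space, that reuse may not happen for a long time.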

Most organizations perform periodic backups of their data.(fn35) Many organizations employ cyclical backups...
