Techniques for making molehills out of unstructured data mountains: visual analytics, a new technology that graphically illustrates datasets, helps users quickly identify responsive documents for electronic discovery review.

Author: Carr, Kevin

[ILLUSTRATION OMITTED]

The volume of data produced, shared, and stored by organizations today is growing faster than ever before. Managing information posed challenges in the past, but what was once considered "a lot of data" is nothing compared to what is now measured in terabytes (1,000 gigabytes) or even petabytes (1 million gigabytes).

Before most data was created and stored electronically, business professionals would create a document, use it for its intended purpose, and then periodically decide whether to file the information. Organizations archived only what they deemed truly important because they had neither the time nor the money to maintain elaborate document storage systems.

With the adoption of and increased reliance on computers, the decision to retain information no longer revolves around manually filing a document; it focuses on actively deleting it. But with the availability of petabytes of computer storage, workers may not feel the need to delete or destroy files. Predictably, organizations have amassed huge volumes of archived materials, saved on hard drives or back-up tape media.

Over time, offsite storage of archived documents has become the norm. However, with the materials now stored remotely, an "out-of-sight, out-of-mind" approach to dealing with the data also has become common. As a result, organizations often find themselves overwhelmed when required to sort through the data pool to produce responsive documents during litigation or regulatory compliance activities in preparation for electronic discovery review. Collecting all of this data takes a great deal of time and requires a number of steps, often starting with tape restoration. A series of processing activities follows, including de-duplication, keyword searches, and data filtering, each of which takes time and may add thousands of dollars in associated expenses.
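To make the processing steps concrete, the following is a minimal sketch of de-duplication followed by a keyword search. It is an illustration under stated assumptions, not a description of any particular e-discovery product: the function names, sample file names, and keywords are all hypothetical, exact duplicates are detected by hashing each document's full text, and the keyword search is a simple case-insensitive substring match.

```python
import hashlib


def deduplicate(documents):
    """Keep the first document seen for each distinct content hash.

    Assumption: two documents are duplicates only if their text is identical.
    """
    seen = set()
    unique = []
    for name, text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, text))
    return unique


def keyword_filter(documents, keywords):
    """Return documents whose text contains at least one keyword (case-insensitive)."""
    lowered = [keyword.lower() for keyword in keywords]
    return [
        (name, text)
        for name, text in documents
        if any(keyword in text.lower() for keyword in lowered)
    ]


# Hypothetical sample collection: (file name, extracted text) pairs.
documents = [
    ("memo_a.txt", "Quarterly pricing agreement with the supplier."),
    ("memo_a_copy.txt", "Quarterly pricing agreement with the supplier."),
    ("lunch.txt", "Team lunch is moved to Friday."),
]

unique_docs = deduplicate(documents)
responsive = keyword_filter(unique_docs, ["pricing", "contract"])

print(len(documents), len(unique_docs), [name for name, _ in responsive])
```

Even in this toy collection, three documents shrink to two after de-duplication and to one after the keyword search, which mirrors how small the responsive portion of a collection can be relative to the effort spent processing it.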

[FIGURE 1 OMITTED]

Adding to the frustration over these mountains of data that must be managed is the reality that only a small portion of each document collection is even responsive to the case at hand. So, after dedicating significant time and expense to collecting and processing vast amounts of data, much of the effort is inevitably for naught.

The good news is that today's technology offers some help in dealing with large sets of unstructured data, with some tools taking a very logical approach to leverage the strengths of...
