Rise of the machines: the role of text analytics in record classification and disposition

An appropriate information management strategy can be combined with recent developments in text analytics search technology to help solve the problem of large, unmanaged data repositories that increase an organization's cost and legal exposure.

Author: Santangelo, James
Position: Cover story

The disparate nature of unmanaged data repositories hinders an organization's ability to purge unneeded data because the size and nature of the repositories make it cost-prohibitive to evaluate the data for disposition.

However, if shared file repositories continue to be left unmanaged, organizations will end up with vast wastelands of unstructured electronic information. They will be unable to distinguish data that must be retained from data that need not be, and they will be locked into the ever-increasing cost of storing and maintaining all of it in order to comply with their legal obligations.

Accurate classification of electronic information, that is, identifying and associating information types with electronic data, is essential to making appropriate retention and disposition decisions. To reduce volume, organizations must be able to determine what types of data they have so they can understand what data must be retained and what data is no longer useful. Reducing the volume of data is one generally accepted approach to reducing data's primary risk: the high cost of finding, preserving, reviewing, and producing it for litigation.

Accurately classifying data allows organizations to retain it for the correct period, place legal holds on it, and make reasonable disposition decisions about it, thus helping to minimize the significant legal costs and risks associated with continuing to store it unnecessarily. But because classifying many years' worth of unmanaged data seems complex, costly, and insurmountable, little has been done to address the problem.

Human Neglect

Many organizations rely on their records management policy and retention schedule to provide employees with the guidelines they need to apply retention to their data. However, employees' primary responsibilities may leave them little time to do these administrative tasks. As a result, even trained employees may fail to accurately determine how long a file should be retained, to what record classification it belongs, or how long it must be preserved for litigation.

The approach of relying on policy and assigning employees sole responsibility for retention, even though the data is actually owned by the organization, can cost the organization a great deal of money. Cohasset Associates' Information Governance: A Core Requirement for the Global Enterprise says that regulators and courts may still hold an organization responsible for its employees' actions in this regard.

Classification Ruin

The term classification has different meanings depending on its use. In an information management context, it can be summarized from ISO 15489-1:2001 Information and Documentation--Records Management Part 1: General as a standardized way of identifying and arranging records into categories according to a logically structured classification scheme. The categories typically correspond to record classes that are indicated in a company's record retention schedule and may contain a number of different, but related, record types.

Each record class delineates specific types of business information or data, and each class is retained for a specific amount of time, ranging from days to years to indefinitely. A finite retention period implies the data's eventual destruction once all legal hold obligations are fulfilled. Yet, many times, the destruction never takes place because not enough identifying information is known about the data to make a decision on its disposition.
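
The relationship between record class, retention period, legal hold, and disposition can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the record classes, the retention periods, and the names RETENTION_SCHEDULE, Record, and disposition are hypothetical and are not taken from the article or from any particular retention schedule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Hypothetical retention schedule: record class -> retention period in days.
# None stands for "retain indefinitely". Classes and periods are invented.
RETENTION_SCHEDULE = {
    "visitor_log": 30,
    "invoice": 7 * 365,
    "board_minutes": None,
}


@dataclass
class Record:
    name: str
    record_class: Optional[str]  # None means the file was never classified
    created: date
    on_legal_hold: bool = False


def disposition(record: Record, today: date) -> str:
    """Return 'destroy', 'retain', or 'unknown' for a single record."""
    # Without a known record class, no retention period can be looked up,
    # so no disposition decision is possible.
    if record.record_class not in RETENTION_SCHEDULE:
        return "unknown"
    # A legal hold overrides the schedule until the hold is released.
    if record.on_legal_hold:
        return "retain"
    days = RETENTION_SCHEDULE[record.record_class]
    if days is None:
        return "retain"  # indefinite retention
    expires = record.created + timedelta(days=days)
    return "destroy" if today >= expires else "retain"
```

In this sketch an unclassified file always yields "unknown", which mirrors the point above: without enough identifying information, the destruction decision is never reached and the file is kept by default.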

If data exists as individual electronic files stored in formally managed shared folders according to specific business functions' file plans, its classification can be based on its content and retention can be determined.

Conversely, if the data has not been formally managed, which is the typical scenario of today's average large organization, the shared folders can store millions of different types of electronic files. These files may contain many other classes of records and...
