Classifying electronic documents: a new paradigm

The U.S. Department of Education set out to determine whether large volumes of electronic data can be indexed cost-effectively.

Author: Schewe, Donald B.
Position: Lessons Learned

At the Core

This article:

* Explains how the Department of Education (DoEd) used artificial neural network software to classify electronic documents

* Reveals the challenges of analyzing and categorizing massive amounts of electronically stored data

The introduction of the desktop computer and its widespread adoption by government and the private sector in the 1980s raised new problems for records managers. The computer increased the volume of information and made data easier to manipulate, while networks made information easier to transmit, adding further to the information mass.

When e-mail was added to the mix, quantity began to overwhelm traditional information management systems. Records managers simply could not apply the old classification and storage systems to the huge amount of information generated.

Volume was not the only problem, of course. There was also the question of which medium to store the information on and how to migrate it forward as new versions of software were developed.

The underlying assumption of the current paradigm for records and information management is that there is now too much information to manage at the document level, so the file folder has become the basic unit of control. However, computer users often do not file information in neat folders, and attempts to force them to do so have been largely unsuccessful. The time has come for a new paradigm.

The monster that created this mountain of information, the computer, can be tamed and turned into the engine that controls it, and to a far better degree than has been possible in the past. With the power of today's computers, found in virtually every office, indexing can be done down to the word level, and retrieval can be virtually instantaneous. If only there were a way to categorize the information.
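The article does not describe how word-level indexing was implemented. As a generic illustration only, the following is a minimal Python sketch of an inverted index, a standard structure that maps each word to the set of documents containing it, so a query looks up only the matching entries instead of scanning every file. The function names, document IDs, and sample text are hypothetical; this is not the department's or STG's system, and it does not attempt the categorization step the article says is still missing.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map every word to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query word (AND semantics)."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the intersection never mutates the index.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical sample documents standing in for word processing files and e-mail.
    docs = {
        "memo-001": "Budget request for records storage",
        "email-042": "Re: records retention schedule update",
        "report-7": "Annual budget report",
    }
    idx = build_index(docs)
    print(sorted(search(idx, "records")))        # ['email-042', 'memo-001']
    print(sorted(search(idx, "budget report")))  # ['report-7']
```

Indexing finds documents that contain given words; it says nothing about what kind of record a document is or how long it must be kept, which is the classification problem the rest of the article takes up.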

Enter the new paradigm.

Case in Point

The U.S. Department of Education (DoEd) had used artificial neural network technology to analyze and categorize some electronic materials at the end of the Clinton Administration. That project was successful, and the department wanted to see whether the technology could be applied to its vast electronic information holdings: word processing documents, spreadsheets in various formats, databases (both off-the-shelf and proprietary), and e-mail messages. The documents could not simply be deleted, because some were records that had to be retained for varying periods under the department's records retention schedule. The cost of storing and maintaining this material was a drain on the budget and hampered ongoing activities.

To see what could be accomplished, the department set up a demonstration project using e-mail and word processing documents from individuals who had left the agency at the end of the Clinton Administration. Approximately 4 gigabytes of e-mail and half a gigabyte of word processing documents were provided for the project. Fairfax, Virginia-based STG Inc., the same company that had done the earlier project, was hired to undertake this one, with the significant addition of an experienced records manager.

Prior experience with desktop-deployed records management applications, within the department and elsewhere in the federal government, had been less than satisfactory. Users were...
