Recordkeeping in the 21st Century

By Jason R. Baron

Growing professional interest and attention are being paid to using standardized computer-system "metadata" as an information management tool to preserve the context, content, and structure of electronic records. Among the many initiatives under way are those arising in response to recent federal case law and to regulatory and statutory developments, as well as the work of the World Wide Web Consortium, all of which represent steps toward a more comprehensive metadata approach to managing information in the future.

According to the Web site guide to Buddhist memorial services, on October 24 of every year, at the Daioh Temple of Rinzai Zen Buddhism in Kyoto, Japan, the head priest conducts a prayer for lost information. Recognizing that "many 'living' documents and software are thoughtlessly discarded or erased without even a second thought," the sect hopes that through the holding of its "information service" the "information void" will cease to exist.

Paradoxically, at the same time as institutions in the United States and elsewhere may be in danger of losing their collective memory due to routine deletion of information in electronic form, the typical end user is most likely experiencing the opposite sensation: drowning in information overload. A recent Washington Post cover story characterized the time we live in as the "Too-Much-Information Age," going so far as to declare in a bold headline: "Tidal Wave of Information Threatens to Swamp Civilization" (Achenbach 1999).

Both the perceived infoglut and the infovoid are increasing exponentially and in lock step with each other, especially over the past decade, due to the advent of networked computer systems, the Internet, and the World Wide Web. According to Stuart Madnick (1999), "Advances in computing and networking technologies now allow huge amounts of data to be gathered and shared on an unprecedented scale. Unfortunately, these new-found capabilities by themselves are only marginally useful if the information cannot be easily extracted and gathered from disparate sources, if the information is represented differently with different interpretations, and if it must satisfy differing user needs."

This raises a question: How can organizations get a better handle on managing information flow for both the short and the long term? Assuming that saving every bit and byte of data now being created is out of the question, how can public institutions go about managing what is deemed appropriate for preservation (i.e., creating official records out of the deluge of data created and received in electronic form)? Are there methods or tools available that can assist?

There has been increased interest and attention in scientific, academic, and governmental circles on the subject of using computer-generated metadata as an information tool to preserve the context, content, and structure of electronic records (Madnick 1999). In its most generic sense, metadata simply means "data about data." As one recent paper put out by the U.S. Patent and Trademark Office notes (Purcell et al. 1999), "Traditionally, the term 'metadata' has been widely used to characterize the descriptive information that will support search and retrieval of both paper and electronic material. Over the past three or four years the use of the term metadata has expanded to include additional information that must be acquired and retained in order to effectively manage electronic records over long period[s] of time, including permanently."
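
To make the abstraction concrete, the sketch below models a descriptive metadata record for a single electronic document, using the fifteen Dublin Core elements (one well-known standard associated with the World Wide Web community) as field names. It is purely illustrative, written here in Python, and every field value is hypothetical.

```python
# A minimal sketch of a "data about data" record: the fifteen Dublin Core
# elements describing one electronic document. All values are hypothetical.
record = {
    "title": "Quarterly Budget Memorandum",
    "creator": "Office of the Comptroller",
    "subject": "budget; appropriations",
    "description": "Memorandum transmitting quarterly budget figures.",
    "publisher": "Example Agency",
    "contributor": "Records Management Division",
    "date": "1999-10-24",
    "type": "Text",
    "format": "application/msword",
    "identifier": "EX-1999-0042",
    "source": "Central correspondence file",
    "language": "en",
    "relation": "EX-1999-0041",
    "coverage": "United States",
    "rights": "Public record",
}

# Because context, content, and structure are recorded alongside the
# document itself, the record remains interpretable long after creation.
for element, value in record.items():
    print(f"{element}: {value}")
```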

Even before the advent of computers, we lived for a long time in a world populated by metadata -- we just may not have viewed the world in such terms. The library community has employed metadata systems for more than 100 years, classifying information by means of Dewey Decimal numbers and the Library of Congress' alphanumeric system. Documents in libraries have also been made accessible by using preexisting lists of subject headings or descriptors. Rating systems such as that of the Motion Picture Association of America and the recent television ratings developed in connection with the V-chip also function as standardized metadata about the contents of those media. Even classification schemes that describe the contents of consumer goods -- such as the long-standing requirements for labeling on cigarettes, food, and tires -- are types of metadata specifications that add important, useful knowledge of the objects they describe. Arguably, all contextual information that classifies or interprets data may be validly thought of as a form of metadata.

However, the introduction of computers, and particularly the Internet, into everyday life has exponentially expanded the universe of metadata that each of us is responsible for creating but of which we remain largely unaware. For every document created in a popular software environment such as Windows or Lotus Notes, a wealth of metadata is retained that never appears on the screen -- ranging from descriptions of document properties (e.g., character, word, and line counts) and personal settings and preferences for fonts and styles, through document revision information, to embedded codes that are virtually inaccessible. Most people have no idea of the quantity or type of metadata generated in association with individual computer-generated documents.
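
As an illustration of how much of this lies below the surface, the following sketch lists the hidden metadata that travels inside a modern Word (.docx) file, using nothing but Python's standard library. The file name is hypothetical; any .docx file will do, and the same principle applies to other office formats.

```python
import zipfile
import xml.etree.ElementTree as ET

PATH = "report.docx"  # hypothetical file name; substitute any .docx file

# A .docx file is a ZIP archive; its metadata lives in XML "parts" that
# never appear on screen when the document is opened for editing.
with zipfile.ZipFile(PATH) as docx:
    # docProps/core.xml holds descriptive properties (creator, revision,
    # last-modified-by, timestamps); docProps/app.xml holds statistics
    # such as word, character, and line counts.
    core = ET.fromstring(docx.read("docProps/core.xml"))
    app = ET.fromstring(docx.read("docProps/app.xml"))

for element in list(core) + list(app):
    tag = element.tag.split("}")[-1]  # strip the XML namespace
    if element.text and element.text.strip():
        print(f"{tag}: {element.text.strip()}")
```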

The Microsoft Corporation was recently criticized in public for collecting information on its customers at registration time, including an embedded identification code in Windows 98 that allowed individually created documents to be matched with their human creators (Markoff 1999). At least partially in response to this criticism, Microsoft has published a series of papers on "How to Minimize Metadata" in some of its proprietary applications.

Indeed, the very nature of the Internet (i.e., its reliance on packet switching to propagate information) means that message traffic accumulates an audit trail of metadata in the form of routing information, which can be easily traced -- unless one takes concerted action to maintain anonymity. It has become common knowledge (although it is still easily overlooked) that virtually every move one makes surfing in cyberspace -- literally every keystroke entered on a home or desktop computer -- is potentially traceable by being recorded on some server or hard disk, either locally or on a remote Web site. Much of this is embedded in subterranean places in computers and is inaccessible to users without sophisticated knowledge...
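
By way of illustration, one such routing audit trail is visible in any ordinary e-mail message: each relaying mail server prepends a "Received" header, so reading the headers from top to bottom traces the message's path in reverse. The short sketch below (the saved-message file name is hypothetical) prints that trail with Python's standard library.

```python
import email
from email import policy

# Hypothetical saved message; any raw e-mail (.eml) file will do.
with open("message.eml", "rb") as f:
    msg = email.message_from_binary_file(f, policy=policy.default)

# Each server that handled the message added its own "Received" header,
# so the full list is a hop-by-hop record of the message's route.
for hop in msg.get_all("Received", []):
    print(hop)
```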
