Preservation of the times.

Author: Rothenberg, Jeff

At the Core

This article examines:

* Emulation in the context of other proposed solutions to digital preservation

* The advantages and challenges of using emulation to preserve digital artifacts

* Two alternative approaches to running emulators on computers in the far future

Digital informational artifacts, including records, documents, and data, share a number of core digital capabilities that give them irresistible advantages over traditional paper artifacts. First, they can be copied perfectly, which allows them to be distributed and disseminated widely and accessed remotely. In addition, records and information managers can search their contents, extract content from them, reformat, transform, and process them in ways that are unthinkable for non-digital artifacts. Beyond these core attributes, many digital artifacts possess inherently digital capabilities, such as dynamic, distributed, active, and interactive behavior -- facilities that traditional artifacts simply cannot provide.

Yet all these capabilities derive from the fact that digital artifacts are encoded, which -- although it makes them understandable to machines -- makes them unintelligible to humans without additional interpretation. It is as if they were written in invisible ink. Rendering them human-readable generally requires running programs on computers, especially for those inherently digital artifacts that use complex, executable formats. Furthermore, whereas simple page image artifacts can be printed on paper, inherently digital artifacts cannot be converted to non-digital form without losing essential aspects of their behavior, not to mention their "look and feel." This means that digital artifacts must generally be preserved in executable, digital form.

Preserving digital artifacts therefore requires not only saving the bitstreams that represent them but also retaining the ability to interpret those bitstreams properly in the future to recreate their intended behavior. Saving bits is problematic because most digital storage media become physically unreadable or obsolete rather quickly. But the greater challenge of long-term digital preservation is interpreting bitstreams correctly in the future. Each digital format -- corresponding to a given standard or an application program -- requires different interpretation, and although a few dozen formats may account for most digital artifacts, there are thousands of other formats whose importance may become apparent only in the future.
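The distinction between saving bits and interpreting them can be illustrated with a minimal Python sketch. The eight-byte bitstream and the three "formats" below are invented for illustration and do not correspond to any real artifact or standard; the point is only that a checksum can confirm the bits survived intact while saying nothing about which interpretation was intended.

```python
import hashlib
import struct

# Eight preserved bytes (a made-up example). Nothing in the bitstream
# itself says which format it was written in.
bitstream = bytes([0x48, 0x69, 0x21, 0x00, 0x00, 0x00, 0x80, 0x3F])


def fixity(data: bytes) -> str:
    """SHA-256 digest: shows the bits are unchanged, not what they mean."""
    return hashlib.sha256(data).hexdigest()


def as_text(data: bytes) -> str:
    """Hypothetical format 1: a NUL-terminated ASCII string."""
    return data.split(b"\x00", 1)[0].decode("ascii")


def as_uint32_pair(data: bytes) -> tuple:
    """Hypothetical format 2: two little-endian unsigned 32-bit integers."""
    return struct.unpack("<II", data)


def as_tail_float(data: bytes) -> float:
    """Hypothetical format 3: the last four bytes as a little-endian float."""
    return struct.unpack("<f", data[4:])[0]


if __name__ == "__main__":
    print("fixity :", fixity(bitstream))
    print("text   :", as_text(bitstream))         # Hi!
    print("uint32s:", as_uint32_pair(bitstream))  # (2189640, 1065353216)
    print("float  :", as_tail_float(bitstream))   # 1.0
```

All three readings are equally valid as far as the bytes are concerned. Only knowledge of the original format, or the original software, determines which one recreates the intended artifact.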

Consider the Options

A number of solutions to this problem have been proposed. Among them are:

* Do nothing. This argues that most information isn't worth saving anyway.

* Let the future worry about it ("digital archeology"). This requires future managers and researchers to bear the cost of understanding whatever they care about. Yet even with sophisticated cryptographic techniques, future users will be far less able to decipher digital artifacts than were those who tried to read hieroglyphics during the 13 centuries before the discovery of the Rosetta Stone.

* Use standard or "canonical" digital formats for everything. This can help for a decade or so, but in the long term it assumes that future software will still be able to render such standard formats. Yet it is reasonable to expect that all such formats will eventually become obsolete, and it is impractical to enforce their use in any case.

* Repeatedly convert artifacts into future formats ("migration"). This is the obvious choice for records or other artifacts that remain "active" and so must be converted into current forms to be usable. But for the long term -- and with the vast majority of artifacts -- it is impractical. Converting every artifact represented in every arbitrarily complex, executable format would be extremely labor-intensive and may not even be possible across paradigm shifts. Moreover, every conversion loses or corrupts meaning, so the cumulative result of this approach must ultimately be gibberish. Finally, this does not even attempt to preserve digital artifacts in their original forms.

* Replace artifacts by formal descriptions. In principle, this is a very attractive approach, but it requires formally encoding every attribute of a digital artifact that might ever be of interest, and it is impossible to predict all of these in advance. Nor do we yet know how to encode most of the behavioral aspects of inherently digital artifacts.

* Rely on "viewer" programs to continue to render old formats. Although this may sound feasible, writing and maintaining such viewers for every format that might be of interest requires considerable effort, and their correctness will always be questionable, especially as they are continually rewritten for future computers.

* Rely on a digital artifact's original software to render it. Since its original software defines an artifact's format, this is the only way to truly preserve the artifact's original behavior. (Preserving "reader" software is sufficient for this purpose unless there is a need to preserve a record of the original authoring capabilities that were available.) Although this requires saving application...
