Studying the reader/researcher without the artifact: digital problems in the future history of books.

Author: Warner, Dorothy


It is salient to begin this article with some examples of fertile and groundbreaking study emanating from the history of the book, reading, and publishing:

* Robert Darnton brilliantly re-constructed the world-view of 18th century French society from the ground up in his book The Great Cat Massacre. He did so by reinterpreting odd and rare documents such as a printing society's wage book, a semi-fictional autobiography of a printshop worker, and an odd, obsessively complete "inventory" of the city of Montpellier. [1]

* Justin Kaplan's notes in his Library of America edition of Whitman's Leaves of Grass list eight different editions Whitman produced and edited, the first consisting of twelve poems and a preface, others expanding to four times the length, and then contracting again. Like all of Whitman's later compilers and editors, Kaplan had to contend with the injunctions the author declared at various times about the various editions in order to come up with a complete or definitive edition. [2]

* Wayne Wiegand has studied odd documents of library history like Library Bureau accession/de-accessioning books used in most small American public libraries to record the acquisition of books. Wiegand productively studied the censorship of controversial materials in some of those libraries over a 66-year period using these records. [3]

* Jonathan Katz [4] and Martin Duberman [5] are scholars who have researched and documented the history of the gay experience in America. Over the course of 25 years, they have examined previously unpublished and overlooked documents discovered through various means: by communication with gay people; by following up on rumor and vaguely remembered diaries and papers; and by following obscure trails left in footnotes. Much of this material was located in privately owned and only recently gathered library archival collections.

What do these examples have in common? They represent important and interesting work that could be accomplished because the documents and the publications exist, and they exist primarily because they were printed and reprinted, simply kept somewhere, preserved and archived. The study of reading, books, book production, editing, and the research process posits a very simple assumption: that which has been read, edited, absorbed, used and studied will still exist as an artifact. As Ronald Schuchard wrote, "what interests the scholar ... in the archive [is] the preservation and accessibility of the materials of the creative imagination, the physical materials, including all the detritus, debris, and ephemera of art, biography, history. And the archival preservation of these materials is crucial for the minor as for the major figures of a literary generation" [6]--the very authors, as Michael Winship [7] points out, that most people read the most, after all.

However, the trend toward digitization, promoted by those who want information available instantly and in a "more accessible" format, poses a very fundamental challenge to the essential assumption that those items will exist in the future. The dramatic move to exclusive web-distribution of federal and state government information and data in the United States is a good case study of this problem. Essentially, this project has been undertaken without planning or budgeting for archived, permanent and secure (that is, unaltered) access. A front-page story in the New York Times detailed the US Patent Office's project to digitize 18th and 19th century patents--and the discarding of the original documents. One person did some dumpster diving outside the Office and came up with four original application copies of some of Thomas Edison's patents. [8] Much of the newly digitized data is the raw material for scholars in such far-flung subjects as law, the environment, education, demography, and of course economics and business. Data and documents are in danger not only in governmental sources but in private databases as well. Significant numbers of novels, scientific journals, and publishing records--economic and editorial--to give only a few examples, are now extant largely or exclusively in digital form.


Our profession's policies note specifically the "threat to information posed by technical obsolescence, the long-term retention of information resident in commercial databases, and the security of library and commercial databases." [9] However, in the haste to make information available electronically there are few agreed-upon plans for the preservation of digital information and much has already been lost. For example:

* Most of the data from the Viking mission to Mars no longer exists. [10]

* The Division of Elections in New Jersey eliminated the web page that gave the previous year's election lists and results. Concern from those using the information prompted the Division of Elections to begin to retain this information, but the earlier information is gone. Another New Jersey agency created a new web page and eliminated virtually all of the documents that had existed on the earlier page. [11]

* When the National Archives received data in the mid-seventies from the Census Bureau, it was in a UNIVAC format that had been state-of-the-art in the 1960s. At the time "there were only two UNIVAC computers left in the world: one in Japan and the other housed in the Smithsonian Institute as a museum piece. Heroic and costly rescue efforts recovered much, but not all, of the data." [12]

* The computerized data from a New York study mapping land use and environmental data throughout the state was lost. "The study had employed customized computer software that no longer existed when the computer tapes were turned over to the New York State Archives." [13]

* With the inauguration of George W. Bush, the White House website was completely changed and all of the Clinton administration's web collection disappeared overnight. Fortunately, the National Archives and Records Administration (NARA) had begun to preserve the content of the Clinton administration's contributions to the White House website, although some suspect that information has been lost anyway, since it has been reported that agencies in the Executive Branch were not all successful in complying with NARA preservation requests. [14]

* "Some historically valuable records may be deleted prematurely. The New Jersey state Department of Labor ... maintains a database of accounting information on each employer's payroll. Since the department needs the data primarily for enforcing employer contributions of ... taxes, it offloads records seven years after an employer has ceased operating. But historians might well want to use these datafiles for researching patterns of ethnic and gender employment... for example." [15]

* More recently, as a result of September 11, there have been requests by the Federal government to destroy specific information deemed "potentially sensitive," and in one instance librarians questioned the order to destroy a public water supply database. The CD-ROM was "compiled to help those researching improvements in water supply safety [and] while it contained no analysis of system vulnerabilities, it documented locations of such crucial infrastructure as intake pipes.... Of primary concern is that there may be no way to retrieve electronic documents that are destroyed." [16]

* In the 1980s, NARA transferred about 200,000 images and documents onto optical disks--again the state-of-the-art technology of that moment. "[T]he half-life of most computer technology is between three and five years" and it is no longer certain that the disks can still be played because they depend on computer software and hardware that are no longer on the market, according to...
