The conversion of MARC metadata for online visual resource collections: a case study of tactics, challenges and results.

Author: Gonzales, Brighid Mooney


With most library resources having been cataloged in MARC format for over half a century, a great deal of legacy metadata exists in this format, even as it is slowly being replaced by more current and technologically advanced metadata formats. Because MARC was originally created for cataloging print materials, many users now find that the format no longer quite suits their needs for describing digital, visual and other non-print materials. This study aims to determine the best methods for converting previously created MARC catalog records for visual resources into Dublin Core, a more current and interoperable format for digital resources.

For this study, usable Dublin Core metadata records were created for a subset of 50 photographs from the Library of Congress Lomax Collection. One of a number of collections in the Prints and Photographs section of the Library of Congress Digital Collection, the Lomax Collection consists of just over 400 photographs taken primarily by Alan Lomax, as well as by his father Avery Lomax and step-mother Ruby Terrill Lomax, from the mid-1930s to around 1950 for the Archive of American Folk-Song. The photographs depict mostly musicians, singers and dancers from this period in the American South and the islands of the Bahamas. The subset of 50 photographs was chosen using no specific methodology, but with an attempt to include photographs from a variety of locations and with a variety of subjects, and with a specific effort to avoid duplicate photographs where the only difference was in file size or in height and width specifications.

Dublin Core was chosen as the metadata schema for these records because it is one of the most commonly used metadata schemas, particularly for collections of digital photographs and other online content, and is ubiquitous in digital content management systems such as CONTENTdm. In addition, each photograph in the collection is linked to a MARC record created by staff at the Library of Congress, which was used to generate content for the Dublin Core elements of each record. This study examines the benefits of using the Dublin Core metadata schema for a photographic collection, the advantages and challenges of converting MARC records to Dublin Core, and the process involved in undertaking such a transition.

Literature Review

In reviewing the literature, it is clear that the need for a simple, common set of elements to provide usable metadata for digital and web-based resources was the driving factor in the creation of Dublin Core, though it is also clear that from the outset Dublin Core would not be usable in all situations without adding drastic complexity to the schema. Caplan and Guenther state, "the standard would ensure that a common core set of elements could be understood across communities, even if more specific information was required within a particular interest group" (2009, 46). Dublin Core has become widely used because of its simplicity as well as its extensibility, or the ability of particular communities to expand and adapt it for their own needs. However, it is exactly this extensibility that directly conflicts with the schema's goal of simplicity and its original intention of greater interoperability.

Dublin Core was created as a metadata standard to be used specifically for web resources and to help solve long-standing issues of how to adequately describe the dynamic, unbound resources of the digital world where commonly used standards such as the bibliographic MARC format were not quite as practical. But can Dublin Core be used to describe visual resources such as a digital image collection as well as it can be used to describe text resources? And for previously cataloged resources, can those MARC records be used as a basis to create Dublin Core records that will be as or more effective for users of online digital resources, or does the transfer of information from a complex format such as MARC to a simplified format such as Dublin Core result in the loss of too much information to truly be a workable solution?

Dublin Core Implementation

The most common use of Dublin Core is in providing metadata for web-based or digital resources. The use of Dublin Core is generally restricted to the description of what are referred to as "document-like objects," or resources that are "bounded, or fixed, in the sense that the resource looks the same to all users" (Weibel and Miller 2001, 211). This makes Dublin Core ideal for the description of "images, movies, musical performances, speeches and other objects that are characterized by being fixed" (211).

The implementation of Dublin Core as an extensible and widely usable metadata schema for a large number of digital resources has been highly successful. Park and Tosaka found that "despite perceived limitations, use of DC is the most widespread, with more than half of the digital collections using it alone or in combination with other schemata" (2010, 105). One of the main points of Dublin Core is that each of its elements is optional and repeatable, and thus the schema can be used in any way that suits the community using it. This makes it a viable option for almost every collection of digital resources that contains these document-like objects. Caplan and Guenther describe the elements in the Dublin Core schema as falling into three distinct categories, "access points (Title, Subject, Identifier, Author, Other Agent), information to facilitate identification (Publisher, Date, Object Type, Form, Language, Coverage), and information to relate the object being described to other objects (Relation, Source)" (2009, 46-47). However, not all of the elements of Dublin Core are applied equally. In fact, some are more commonly used while others are more likely to be left out of the resource description.
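To make the "optional and repeatable" property concrete, the following is a minimal sketch in Python, not drawn from the study itself. It uses the fifteen element names of the Dublin Core Metadata Element Set, version 1.1, which differ slightly from the older labels in the Caplan and Guenther quotation (for example, Creator rather than Author). The `DublinCoreRecord` class, its method names and the sample values are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# The fifteen element names of the Dublin Core Metadata Element Set, version 1.1.
DC_ELEMENTS = frozenset({
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
})


class DublinCoreRecord:
    """Hypothetical container in which every element is optional and repeatable."""

    def __init__(self):
        # Each element name maps to a list, so any element may repeat.
        self._values = defaultdict(list)

    def add(self, element, value):
        if element not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {element}")
        self._values[element].append(value)

    def get(self, element):
        # An element that was never added simply yields an empty list.
        return list(self._values.get(element, []))

    def used_elements(self):
        # Names of the elements that carry at least one value, sorted.
        return sorted(e for e, v in self._values.items() if v)


# Placeholder values, not taken from any Library of Congress record.
record = DublinCoreRecord()
record.add("title", "Untitled photograph")
record.add("subject", "Musicians")  # repeatable: two subject values
record.add("subject", "Dancers")
```

In this sketch a record with only a title and two subjects is still a valid record, which mirrors the selective use of elements that the literature reports.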

Park found that the most frequently used Dublin Core elements are, "in descending order: subject, description, title, format and coverage" while the least used elements include "language, relation, source, creator, and identifier" (2008, 92). Phelps found similar use of the elements in Dublin Core, ascribing the most commonly used elements to content information being more "readily available," while the least used were omitted because they were "often not relevant or fully understood" (2012, 333). Caplan and Guenther attribute this selectivity to the way that Dublin Core was designed, so that "information that is not applicable or not readily available can be omitted" (2009, 47), while Park suggests that rather than omitting, users often put Dublin Core's flexibility to work, attributing the high usage of the Title element to "locally assigned titles ... which indicate the creation of title from information professionals" (2008, 93). The literature suggests that for all of Dublin Core's proposed simplicity there are a large and varied number of ways that different communities choose to implement it.

Dublin Core Interoperability

It is this ability of individual users to pick and choose which elements to include in their Dublin Core records, which to leave out, and which to adapt to their individual needs that leads to the common issue of interoperability with the use of Dublin Core. For one thing, even the source of the content for each element is not rigidly defined, and Baker describes the method of "mixing and matching" from "one or more existing sources" (2012, 121) as that which leads to the creation of so many differing local applications of Dublin Core. Park also notes the difficulty in mapping between metadata schemas, comparing it to "translating two or more different languages" (2008, 89). Park and Tosaka found interoperability to be a major concern in the use of Dublin Core, writing that "the proliferation of open-source and commercial digital library platforms using a variety of metadata schemata has implications on the librarians' ability to create shareable and interoperable metadata beyond the local environment" (2010, 113). While this issue is not limited to Dublin Core, the schema's high extensibility makes it a difficult one to conquer.

Kurth, Ruddy and Rupp define metadata mapping "as the process of establishing semantic relationships between equivalent elements in different schemas and metadata transformation as the design and implementation of scripts and other tools that move mapped metadata between schemes" (2004, 157). But the mapping and transformation of metadata between schemas are closely related issues and are often examined together by metadata researchers, as the adequate mapping of metadata directly affects its ability to be transformed from one schema to another. For this reason, metadata practitioners have created a number of crosswalks that link the same schemas to one another in slightly differing ways. As Chan and Zeng write, "when mapping individual elements, often there are no exact equivalents. Meanwhile, many elements are found to overlap in meaning and scope. For this reason, data conversion based on crosswalks could create quality problems" (2006, section 4.3, para 6). Unfortunately, the use of crosswalks is often the only option for those looking to convert metadata from one schema to another, though this issue with equivalency may force those doing the conversion to rely on more than one crosswalk to complete their efforts, adding further complexity to the problem.
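The distinction between mapping and transformation can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the method used in this study and not a published crosswalk: the `CROSSWALK` table is a deliberately small, simplified pairing of MARC field tags with Dublin Core elements, the `transform` function is hypothetical, and the sample data are made up. The table plays the role of the mapping; the function plays the role of the transformation.

```python
# Illustrative, simplified crosswalk: MARC field tag -> Dublin Core element.
# These pairings are assumptions for the sketch, not a published crosswalk.
CROSSWALK = {
    "100": "creator",
    "245": "title",
    "520": "description",
    "600": "subject",
    "650": "subject",   # several MARC fields can collapse into one element
    "651": "coverage",
}


def transform(marc_fields, crosswalk=CROSSWALK):
    """Apply the mapping to (tag, value) pairs.

    Returns the Dublin Core record together with the list of fields that
    have no equivalent in the crosswalk, so lost information stays visible.
    """
    dc_record = {}
    unmapped = []
    for tag, value in marc_fields:
        element = crosswalk.get(tag)
        if element is None:
            unmapped.append((tag, value))
        else:
            dc_record.setdefault(element, []).append(value)
    return dc_record, unmapped


# Made-up sample data, not taken from any Library of Congress record.
sample = [
    ("245", "Example title"),
    ("650", "Musicians"),
    ("650", "Dancers"),
    ("300", "1 photograph"),  # tag deliberately absent from this toy table
]
dc_record, unmapped = transform(sample)
```

Returning the unmapped fields alongside the converted record is one way to surface the equivalency problem that Chan and Zeng describe: anything the crosswalk cannot place is reported rather than silently dropped.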

MARC to Dublin Core Mapping and Transformation

The issue of mapping other widely-used metadata schemas to Dublin Core elements is one that is still undergoing a great deal of study throughout the professional literature and has implications for a wide range of current and future metadata projects. Walsh advocates the practice of re-purposing metadata in order to...
