Feminist Markup and Meaningful Text Analysis in Digital Literary Archives.

Author: Schilperoort, Hannah


In this research paper, I look closely at three digital archives of women writers for evidence of feminist encoding practices and text analysis experimentation that supports feminist scholarship. I chose to examine the University of Nebraska-Lincoln's Willa Cather Archive, Northeastern University's Women Writers Online, and the University of Alberta's Orlando Project because all three archives use the Text Encoding Initiative (TEI) guidelines and are involved with computer-based text analysis projects. The notion and practice of feminist markup has evolved out of the digitization and encoding of women's writing as well as feminist literary criticism. First, as I examine digital archives of women's writing, I focus on understanding encoding practices and attempt to generate a working definition of feminist markup by looking at documentation of markup practices for women's writing. Second, I look for connections between feminist markup practices and text analysis, especially for evidence that supports or undermines a direct cause-and-effect relationship between interpretative and critical feminist markup and more meaningful text analysis outcomes.

Statement of Problem

Despite the indisputably interpretative nature of text encoding, traditional digital literary scholars have prioritized structural markup over overtly interpretive and critical markup in an attempt to produce the most objective and reliable scholarly editions possible. However, as the following literature review shows, critics of this perspective reveal not only the fallacy of objectivity but also the benefits of embracing interpretative and critical markup for scholarly research and text analysis. In particular, I explore feminist theories in relation to the digitization and markup of women's writing, along with the possibility of more robust and meaningful text analysis.

Literature Review

Some TEI Basics

Digital literary studies is a subset of digital humanities concerned with the digitization of literary texts, the preservation and representation of digital texts, computational text analysis, and new approaches to data visualization (Siemens and Schreibman xix). Text encoding is a central concern of digital literary studies, contributing to the preservation of digital texts, scholarly editing, and preparation for digital display and computational text analysis. First developed in 1987, the Text Encoding Initiative (TEI), which refers both to the set of guidelines used for textual markup and to the international consortium that maintains the guidelines, is currently the recommended standard of text encoding for digital scholarly texts in the humanities ("TEI: History").

TEI markup includes descriptive metadata about the text in the TEI Header, and structural metadata that identifies and separates the textual elements of the content (Van den Branden, Terras and Vanhoutte). Proposal 5 (P5), the current version of the TEI Guidelines, is expressed in XML (Extensible Markup Language), which dictates the syntax: the structural and hierarchical layout of textual markup. The TEI guidelines are designed to be open and customizable, and the semantics of the markup are determined by the Document Type Definition (DTD) of the particular encoding project, that is, its agreed-upon set of element tags and corresponding attributes ("A Gentle Introduction to XML"; Van den Branden, Terras and Vanhoutte).

One of the primary purposes of TEI is to produce machine-readable texts, enabling a computer to perform functions, such as search retrieval, display or analysis, based on the elements of the text that are marked. Buzzetti and McGann explain: "It is through markup that textual structures show up explicitly and become processable" (64). Structural elements, such as chapters or paragraphs, for example, are marked, enveloped within opening and closing tags denoting the structure, so that they are recognized as such by computer software (64).
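To make the idea of "processable" structure concrete, the following is a minimal sketch in Python rather than anything drawn from the archives discussed here. The sample fragment, the chapter and paragraph content, and the function name `paragraphs_per_chapter` are all invented for illustration, and the TEI namespace that real TEI files declare is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical TEI-like fragment (invented; not from any archive).
# Real TEI documents declare the TEI namespace, which is omitted here.
SAMPLE = """
<text>
  <body>
    <div type="chapter" n="1">
      <head>Chapter One</head>
      <p>First paragraph of the first chapter.</p>
      <p>Second paragraph of the first chapter.</p>
    </div>
    <div type="chapter" n="2">
      <head>Chapter Two</head>
      <p>Only paragraph of the second chapter.</p>
    </div>
  </body>
</text>
"""


def paragraphs_per_chapter(xml_text):
    """Return {chapter number: paragraph count} for each marked chapter."""
    root = ET.fromstring(xml_text.strip())
    counts = {}
    for div in root.iter("div"):
        if div.get("type") == "chapter":
            # Only direct <p> children of the chapter are counted.
            counts[div.get("n")] = len(div.findall("p"))
    return counts


chapter_counts = paragraphs_per_chapter(SAMPLE)
print(chapter_counts)
```

Because the chapters and paragraphs are enclosed in opening and closing tags, the software can count, retrieve, or display them without any knowledge of the text's meaning.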

There is no doubt that the act of encoding is an interpretative act (Cummings 458-59; Hockey 48). Paul Eggert asserts that "texts do not have an unproblematic objective existence; they are not self-identical" (429). Thus, texts, even before digital markup is added, are open to subjective interpretation and multiple readings, meaning that there is no single objective text that would lead to a single correct marked-up version of the text. From this perspective, adding markup adds another interpretative level to the text rather than simply describing objective, inherent or static content elements. Furthermore, although digitization projects use an agreed-upon DTD to identify and mark textual elements in a common way, inevitably, encoders are forced to make difficult decisions based on their own interpretations of text. For example, James Cummings explains: "It is up to the researcher applying the markup to decide what portion of the text to wrap a <title> element around" (458). Often this could be a fairly straightforward decision, but some situations might prove more problematic. For instance, where does the main title end and the subtitle begin (458)?

Despite the obvious interpretative nature of text and text encoding, traditionally, scholars working on encoding projects have attempted to maintain objectivity as much as possible by adhering to the "Ordered Hierarchy of Content Objects" (OHCO) textual model. Renear explains that the OHCO model "postulates that text consists of objects of a certain sort, structured in a certain way." Structural elements relating to the "intellectual content" of the text, such as chapters, paragraphs, titles, stanzas, lines and so on, are marked in a hierarchical and linear fashion (Renear). XML stems from this OHCO model of a text, assuming "that a document is a single hierarchical structure and that each element nests neatly within another element" (Hockey 45). For example, paragraphs occur within chapters and lines occur within stanzas.

Although the simple hierarchical nature of XML often helps to simplify encoding decisions, texts do not always adhere to a strict hierarchical structure, resulting in scholars encoding non-hierarchical elements in a hierarchical way or producing ill-formed XML texts that do not properly validate. Cummings points out that XML is limited "with regard to the encoding of multiple overlapping hierarchies," a problem that arises "when one structure runs concurrently with another and the encoder wishes to record both of these structures simultaneously" (460). Cummings offers a simple example of the problem of marking up "paragraphs [that] may split over pages" (460). In many cases, scholars choose to favor the element which is more important to the intellectual structure of the text rather than the physical structure. Thus, in the example given above, the paragraph would most likely be privileged over the page (461).
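The overlap problem can be illustrated with a short Python sketch. Both fragments and the helper name `is_well_formed` are invented for this illustration. The second fragment privileges the paragraph and records the page boundary with TEI's empty page-beginning element, `<pb/>`, which is one common way of handling this case and not necessarily the approach taken by any project discussed here.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Page and paragraph both encoded as containers: the paragraph opens inside
# page 1 and closes inside page 2, so the tags overlap instead of nesting.
OVERLAPPING = (
    "<text><page n='1'><p>A paragraph that starts on page one "
    "</page><page n='2'>and ends on page two.</p></page></text>"
)

# Paragraph privileged over the page: each page boundary becomes an empty
# <pb/> marker, so nothing overlaps.
MILESTONE = (
    "<text><pb n='1'/><p>A paragraph that starts on page one "
    "<pb n='2'/>and ends on page two.</p></text>"
)

overlap_ok = is_well_formed(OVERLAPPING)
milestone_ok = is_well_formed(MILESTONE)
print(overlap_ok, milestone_ok)
```

The first fragment is rejected by the parser, while the second is accepted, at the cost of treating the page as a point in the text instead of a structure that contains text.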

A Case for Interpretative and Critical Markup

As previously mentioned, XML dictates the hierarchical structure and layout of the text, but the particular DTD of the digitization project dictates the semantics, the tag elements and their attributes. In "A Case for Heavy Editing: The Example of Race and Children's Literature in the Gilded Age," Amanda Gailey explains that "XML provides the general rules for structuring tags," ensuring that opening tags are properly closed and tags are properly nested within other tags, but the tags used, their meaning, and their attributes are determined by the TEI DTD (130). Gailey goes on to explain that as long as the text follows the hierarchical tagging rules, we can put any type of vocabulary inside the tags "as long as the terms are in brackets and we close and nest them properly" (131).

Though the openness and flexibility of TEI allows for a wide variety of markup options, scholars tend to make markup choices that denote structural textual elements rather than thematic, symbolic, or critical possibilities. As Gailey clarifies, traditionally, TEI is "primarily focused on noting the structural or formal features of a text," which, although still interpretative, is thought to be less interpretative and controversial than "overtly interpretive or critical claims" (131). For example, labeling a textual element as a "poetic line" would be less controversial than labeling an element "homoerotic" (131).
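Gailey's distinction between XML's nesting rules and a project's chosen vocabulary can be sketched as follows. This is an illustration under stated assumptions: the `ALLOWED` set is a hypothetical stand-in for a project's DTD, the verse line is invented, and `unknown_elements` is not a function of any TEI tool. The element names `lg` (line group), `l` (verse line), and `seg` (segment) are TEI elements; `homoerotic` is used as a tag name only because it is Gailey's example of an overtly interpretive label.

```python
import xml.etree.ElementTree as ET

# Hypothetical project vocabulary standing in for a DTD's list of elements.
ALLOWED = {"lg", "l", "seg"}


def unknown_elements(xml_text, allowed):
    """Parse the fragment and list element names outside the vocabulary."""
    root = ET.fromstring(xml_text)  # raises ParseError if badly nested
    return sorted({el.tag for el in root.iter() if el.tag not in allowed})


# Well-formed XML that uses an interpretive tag the project has not defined.
CUSTOM = (
    "<lg><l>A line with an <homoerotic>interpretive claim</homoerotic> "
    "inside.</l></lg>"
)

outside_vocabulary = unknown_elements(CUSTOM, ALLOWED)
print(outside_vocabulary)
```

The parser accepts the interpretive tag because it is properly closed and nested. Only the project's own vocabulary decides whether such a tag is permitted.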

Gailey posits that "there is nothing about XML that precludes using it to make interpretative claims about text" (131). However, most scholarly corpora projects choose not to employ interpretative markup. One reason is that editors of scholarly editions are mainly concerned with creating a reliable, somewhat "objective" text rather than offering criticism (132). Another reason has to do with producing well-formed, conformant XML text. Interpretative elements, such as metaphors, will often "compete hierarchically" with physical elements of the text, resulting in "technical errors" and texts that will not validate according to XML standards (132). XML's hierarchical structure also makes it difficult, if not impossible, to "accommodate several different interpretations of the text coexisting in the same file" (132). In addition, "deep, critical markup is time-consuming" and requires literary scholars familiar with the text to do the encoding (132).

Despite the apparent difficulties, Gailey believes that employing interpretative and critical markup will greatly enhance scholarly interactions and research with digital texts. For instance, marking metaphorical interpretations would expand search results to include implicit textual possibilities in addition to referents found in explicit structural elements such as stanzas or chapters, giving more meaningful and complete search results and textual context (133-34).
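A small sketch may help show how interpretative markup could widen search results in the way Gailey describes. Everything here is an assumption made for illustration: the three verse lines are invented, the choice of `<seg type="metaphor" subtype="death">` is one possible encoding and not Gailey's or any archive's, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented verse; line 2 refers to death only through a tagged metaphor.
POEM = """
<lg>
  <l>Death came quietly to the house.</l>
  <l>She <seg type="metaphor" subtype="death">crossed the dark river</seg> at dawn.</l>
  <l>The garden kept on blooming.</l>
</lg>
"""


def keyword_hits(root, word):
    """Line numbers whose plain text contains the word (case-insensitive)."""
    return [
        i for i, line in enumerate(root.iter("l"), start=1)
        if word.lower() in "".join(line.itertext()).lower()
    ]


def markup_hits(root, theme):
    """Line numbers containing a segment tagged as a metaphor for the theme."""
    hits = []
    for i, line in enumerate(root.iter("l"), start=1):
        for seg in line.iter("seg"):
            if seg.get("type") == "metaphor" and seg.get("subtype") == theme:
                hits.append(i)
                break
    return hits


root = ET.fromstring(POEM.strip())
explicit = keyword_hits(root, "death")
implicit = markup_hits(root, "death")
combined = sorted(set(explicit) | set(implicit))
print(explicit, implicit, combined)
```

A keyword search finds only the line that names death outright, while the tagged metaphor brings in a second line that a plain-text search would miss.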

According to Gailey, "heavy editing--deep markup and conspicuous...
