Encoding (and) Interpretation
Is encoding text an act of literary interpretation, or of pattern recognition? Either way, is it quantifiable? And if so, can a computer do it as readily as a human reader?
Those are just a few of my questions after a week-long course in text encoding at the Digital Humanities Summer Institute 2011, with the wonderful Julia Flanders from the Brown University Women Writers Project, Doug Knox from the Newberry Library, and Melanie Chernyk from the Electronic Textual Cultures Lab at the University of Victoria. We learned how to encode texts in TEI. That means taking texts that look like this —
The first step is easy enough when you have the rigorous work of EEBO-TCP, which has transcribed many thousands of early modern books (from PDF files based on UMI microfilms) into XML files, ready for researchers like me to do additional tagging.
My project this past week was to learn the language of these tags, so I could overlay my readings and interpretations on TCP’s already-encoded files — so I could, more precisely, add my tags to theirs. I began with an old favorite: Thomas Heywood’s elegy for Henry, Prince of Wales (d. 1612). Since TCP had already marked the stanzas, lines, and emphasized words (among other elements), I tagged references to historical figures, places, and motifs.
It’s that last category that got me thinking about tagging text as formalizing vs. enabling interpretation. To begin, how much of encoding is formalizing things you recognize, making things explicit that would otherwise pass un-noted? How objective, or definitive, is it?
It feels pretty objective when we’re talking about formal structures, about stanzas or lines — though there are always exceptions. (My colleague Laura Estill was working on the exchange between Hubert and the king in King John that splinters a pentameter line into five pieces: “Death. | My lord? | A grave. | He shall not live. | Enough.”) Your stanza is usually my stanza; your rhyme scheme my rhyme scheme. We might disagree on whether “frowned” rhymes with “loud,” but we encode an imperfect rhyme and move on.
Whatever we can automate, we can agree on. Does that sound right? We agree that “A grave,” in this instance, is the part of speech called a noun. We can even agree on its meaning in this context, or at least that there’s a lexical meaning that approximates its meaning for most readers. And we can then give a computer the task of ‘reading’ the text for us, doing what’s called Natural Language Processing: automated encoding of lines, stanzas, cantos, parts of speech, even rhymes. Compare proper nouns to a database of people and places, and tag the matches. All analogous to the natural ‘noticing’ that practiced readers do already, and that few would disagree with.
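To make the "compare proper nouns to a database" step concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three-entry gazetteer, the function name `tag_names`, and the sample sentence are invented, not taken from TCP's files or from Heywood's elegy. The only borrowed detail is the pair of TEI element names, `persName` and `placeName`.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mini-gazetteer: each known name maps to the TEI element
# that should wrap it. A real project would load thousands of entries.
GAZETTEER = {
    "Henry": "persName",
    "London": "placeName",
    "Richmond": "placeName",
}


def tag_names(text, gazetteer=GAZETTEER):
    """Wrap every whole-word gazetteer match in its TEI element."""
    if not gazetteer:
        return escape(text)
    # Try longer names first so multi-word entries win over their parts.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b"
    )
    pieces = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text so the result stays well-formed XML.
        pieces.append(escape(text[last:match.start()]))
        element = gazetteer[match.group(1)]
        pieces.append(f"<{element}>{escape(match.group(1))}</{element}>")
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


# Invented sample sentence, not a quotation from any source text.
print(tag_names("Henry rode from Richmond to London."))
```

A lookup like this is the easy, agreeable part of the job. It cannot tell one Henry from another, or a place from a person who shares its name, which is where a human reader has to verify the matches.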
Then we can do interesting interpretive work with that automatically-generated data. Notice, for example, the comparative noun density in this canto versus that one. Notice the preponderance of feminine endings. Notice the recurring patterns of characters entering and exiting the stage, how few there are in the penultimate acts of comedies (I’m speculating) — or, say, the relative sparsity of adjectives in the pastoral cantos of Spenser’s Faerie Queene. We can even find a particular author’s stylometric signature, and attribute (or de-attribute) texts to her.
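The noun-density comparison reduces to a simple ratio once every token carries a part-of-speech tag. The Python sketch below assumes the tagging has already been done by some other tool. The tag labels, the function name `pos_density`, and the two toy passages are invented for illustration and are not drawn from Spenser.

```python
def pos_density(tagged_tokens, pos="NOUN"):
    """Return the share of tokens that carry the given part-of-speech tag."""
    if not tagged_tokens:
        return 0.0
    hits = sum(1 for _word, tag in tagged_tokens if tag == pos)
    return hits / len(tagged_tokens)


# Invented toy passages as (word, tag) pairs; real input would come
# from an automatic tagger and run to thousands of tokens.
passage_a = [("the", "DET"), ("gentle", "ADJ"), ("knight", "NOUN"), ("rode", "VERB")]
passage_b = [("knights", "NOUN"), ("and", "CONJ"), ("ladies", "NOUN"), ("sang", "VERB")]

for label, passage in (("A", passage_a), ("B", passage_b)):
    print(label, pos_density(passage), pos_density(passage, pos="ADJ"))
```

The arithmetic is trivial. The interpretive work starts when you ask why one passage is denser in nouns, or sparser in adjectives, than another.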
All of these interpretations are enabled by automated readings, or by systematic tagging and visualizations of the readings that computers can be trusted with. Trusted, but verified, to paraphrase Ronald Reagan on the Soviets.
But computer-aided interpretations get problematic when we’re talking about more qualitative judgements. Your reading of the tone of a scene in King John might be more ironic than mine, because I’ve been trained to read Shakespeare differently, or because I remember a really compelling performance, or because I’m just more maudlin than you. It’s my experience, mixed with training and personality and mood and all the rest, that inflects the way I turn inputs (words, valences) into outputs (interpretation of tone).
We know some people are better close readers than others, and that thousands of hours of reading experience can build the perceptive and synthetic abilities that make good readers. Can this experience be automated, sped up through processors? Not yet, anyway.