Doug Knox’s comment on my post about Encoding and Interpretation sent me to Stephen Ramsay’s paper “The Meandering through Textuality Challenge” (MLA, 2011).

Ramsay investigates the “digging into data” metaphor, widely used in the DH community because of its formalized support and recognition across multiple funding bodies. But this metaphor suffers (Ramsay writes) from what Neal Stephenson calls “metaphor shear”: essentially, we take it too literally.

Why dig into data? Because of making data there is no end, to paraphrase Ecclesiastes. It’s a problem recognized by authorities from The Economist to James Gleick.

The risk of Big Data isn’t just its volume, I think, but the alluring sense it gives us that our digitized record is definitive — that we’re scanning every existing text when we use the Google Ngram Viewer. Sure, Google is drawing us closer to the text-analysis singularity, but it’s still a ways off. And while computers can process many more texts than we can read in a thousand lifetimes, human readers can judge them far better than machines can.

The thrust of Ramsay’s paper isn’t just to assert that this digging metaphor is troubling, but to remind digital humanists what they share with the old-fashioned humanities, with “the grandest traditions of humanistic inquiry”:

Every project gleefully proclaims itself to be “digging into data,” but on closer inspection, it becomes clear that they aren’t digging even in the metaphorical sense. They are, instead, doing something more akin to the meandering parole of the English or history classroom: asking questions, suggesting answers, reading, pondering. The astonishing thing isn’t, in the end, the ways in which high-performance computing and mega-scale datasets transforms the humanities; rather, it’s how much of the hermeneutical basis of humanistic inquiry – the character of its discourse and the eternal tentativeness of its “results” – remains invariant. The revolution is not hermeneutical so much as methodological. [my emphasis]

Ramsay’s purpose has long been to remind digital humanists that our tools actually enable traditional “interpretations inspired by pattern,” to move “the hermeneutical justification of [text analysis] away from the denotative realm of science and toward the more broadly rhetorical and exegetical practices of the humanities.”

Rhetorical and exegetical — that is to say, digital text analyses enable us to make interpretive arguments. We don’t dig into data merely to recover and display it without sifting it through our interpretive sieve. We need “to recover the rhetorical posture of inventio,” as Ramsay wrote elsewhere, “and to place subjective engagement at the center of digital humanities.”

In the sciences that subjectivity or sifting is sometimes called ‘fudging’ — or certainly, there’s a standard perception that the scientist who filters her data is untrustworthy. But surely (I say this as a non-scientist observer) scientific arguments aren’t passive observations of natural phenomena.

Similarly, the humanist reading computer-generated statistics about patterns in the texts s/he studies ought to make judgements about them, to sift and test them further. I would not make an argument that counters the data, but I would ask questions about how it was gathered (tags, processing algorithms); which text(s) and edition(s) it was gathered from; and whether it shows us something that’s both novel and reliable.

