Novels for Algorithms
I’m designing the graduate seminar I’ll teach in the Department of English this fall (2015) on the subject of ‘Algorithmic Criticism,’ a title I took from the subtitle of Stephen Ramsay’s 2011 book, Reading Machines. It’s an introduction to computational text-analysis for students of literature, from word frequency to topic modelling.
By the end of the course, students will be comfortable moving between close reading and distant reading, or what Matthew Jockers calls micro-, meso-, and macro-analysis. (Along with Ramsay’s book, Jockers’ 2013 study Macroanalysis and his 2014 guide to Text Analysis with R for Students of Literature will be required readings.)
Students will learn and implement some programming basics using Python and R, so they can see what happens when natural-language processing and other tools parse and rearrange the words in both individual texts and larger corpora. I haven’t developed more detailed course outcomes than that. We’ll use Codecademy’s Python tutorials alongside Jockers’ book on R.
So which literary texts do you assign for old-fashioned linear close readings in a course like this? They should be long enough to have a lot of words to work with, and complex enough that they contain a lot of topics. They should provide good contrasts with each other – that is, contain a lot of different words and topics – yet be close enough in time that the comparison makes sense. And they should be in the public domain, so we have texts to manipulate in whatever repository we’re drawing them from.
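A rough way to test candidate novels against the first and third of those criteria, sketched in Python: count the word tokens, count the distinct word types, and measure how much two texts' vocabularies overlap. The function names and the two tiny sample strings here are my own placeholders, not anything from Jockers or Ramsay; a real run would read in the full plain-text files instead. The tokenizer is deliberately crude (it only recognises unaccented letters and straight apostrophes), so treat the numbers as a first impression rather than a measurement.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and straight apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    """Return the token count, the distinct-type count, and the three most common words."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "top": counts.most_common(3),
    }


def vocabulary_overlap(text_a, text_b):
    """Jaccard similarity of two vocabularies: shared types divided by all types."""
    vocab_a = set(tokenize(text_a))
    vocab_b = set(tokenize(text_b))
    if not vocab_a and not vocab_b:
        return 0.0
    return len(vocab_a & vocab_b) / len(vocab_a | vocab_b)


# Placeholder strings standing in for two full novels (hypothetical data).
sample_a = "the whale and the sea and the ship"
sample_b = "the ball and the letter and the estate"

print(profile(sample_a))
print(profile(sample_b))
print(vocabulary_overlap(sample_a, sample_b))
```

A low overlap score suggests the contrast I'm after, though on full novels the shared function words (the, and, of) will dominate the vocabulary either way, so a stop-word filter would be a sensible next step.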
Using those criteria, Jockers compares Herman Melville’s Moby Dick with Jane Austen’s novels, including Pride and Prejudice and Sense and Sensibility. Ramsay instead focuses on distinguishing different characters’ lexical ranges in Virginia Woolf’s The Waves.
But should I assign the same novels, or new ones? I’ll confess that I’m tempted to follow Jockers’ lead, for two reasons. First, the programming tasks will be difficult to teach, so it would help to have some of his results to try to reproduce, and he uses Melville in both of his books. And although I love Woolf’s novel, it’ll be hard to divide it up by character, as Ramsay did.
My second reason has less to do with the course outcomes than with my own reading-shelf to-do list: I’ve never read Moby Dick (yes yes, how humiliating), and there’s no better way than assigning a book to ensure you read it. (Talk about humiliation!) Plus there’s an extraordinary podcast series, the Moby Dick Big Read, that has individual chapters read by the likes of Simon Callow, Benedict Cumberbatch, Tilda Swinton and David Cameron (the UK’s newly-reinstalled Prime Minister).
So what would you do? I need 2–3 novels that are in the public domain and that will lend themselves to productive contrasts between close and distant readings.
Here’s a list of novels I weighed and will probably reject:
- David Copperfield, Hard Times, and others by Dickens: because – well, no good reason except they’re lower on my to-read list. [UPDATE on 2015–05–15: My friend Heather Froehlich alerted me to Michaela Mahlberg’s work on Corpus Stylistics and Dickens’s Fiction so I’m reconsidering this. But which novel to assign? A very quick scan on Google Books suggests either David Copperfield or Oliver Twist.
Meanwhile, Matt Jockers responded that Joyce’s A Portrait of the Artist as a Young Man would be worth investigating, and it’s shorter than most of these other suggestions.]
- The Prelude, by Wordsworth: because while I’d like to try this on poetry, and this is a long autobiographical narrative poem high on my to-read list, I wonder if exclusively poetic language will create too many challenges
- Mr. Penumbra’s 24-Hour Bookstore, by Robin Sloan: because it’s too meta, as a book ‘about’ algorithmic criticism and secret societies and the Googleplex