Shakespeare Lipsum

Lorem ipsum, the placeholder text used by printers and designers for centuries, has a thousand online variations. You can fill your documents with dummy text in artisanal Hipster Ipsum (“Ethical hoodie tofu letterpress”); you can channel President Obama or corporate drones. And yes, Padawan, there’s a Star Wars one for you. (And here are a lot of other variants.)

“But soft, mongrel vouchsafe alacrity i’faith,” quoth I: nary a Shakespeare Lipsum generator among them.

I set out to fix this problem, which requires only a plain-text file of Shakespeare’s words (minus speech prefixes, stage directions, and other editorial add-ons). But I’m stuck. Help me, Obi-Wan; you’re my only hope.

I started by downloading the Folger Digital Texts XML files, then turned to TAPoR’s Text Cleaning tools, specifically the one called Extract Text from XML Document. If you limit the element to w (the tag that wraps every word), you get all the words, but you also get the speech prefixes (tagged with speaker) and stage directions (tagged with stage).

In other words, you need only the <w>words</w> that aren’t nested inside <stage><w>tags</w></stage> or <speaker> tags, one level up in the hierarchy.

So how would you extract just the <w>words</w> without that higher-level tag? You’d have to process them in two stages: first to remove everything wrapped in <stage> or <speaker> tags, and second to isolate just the <w> words.
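For what it’s worth, here is a rough Python sketch that folds those two stages into one walk of the XML tree: skip anything inside <stage> or <speaker>, and keep the text of every other <w>. It uses only the standard library. The sample snippet, the TEI namespace on it, and the function names are my own assumptions for illustration, not something checked against the actual Folger files.

```python
import xml.etree.ElementTree as ET

# Elements whose contents should be dropped entirely (speech prefixes, stage directions).
SKIP = {"stage", "speaker"}

def local_name(tag):
    """Strip an XML namespace, e.g. '{http://...}w' -> 'w'."""
    return tag.rsplit("}", 1)[-1]

def spoken_words(element, out=None):
    """Collect the text of every <w> that is not inside a skipped element."""
    if out is None:
        out = []
    for child in element:
        if not isinstance(child.tag, str):
            continue  # ignore comments and processing instructions
        name = local_name(child.tag)
        if name in SKIP:
            continue  # stage 1: throw away the whole subtree
        if name == "w":
            if child.text:
                out.append(child.text)  # stage 2: keep the word itself
        else:
            spoken_words(child, out)  # keep looking deeper in the hierarchy
    return out

# Hypothetical sample, shaped like the tagging described above.
SAMPLE = """<sp xmlns="http://www.tei-c.org/ns/1.0">
  <speaker><w>HAMLET</w></speaker>
  <stage><w>Aside</w></stage>
  <l><w>But</w> <w>soft</w></l>
</sp>"""

words = spoken_words(ET.fromstring(SAMPLE))
print(" ".join(words))
```

For a real play you would swap ET.fromstring(SAMPLE) for ET.parse(path).getroot() and write the joined words to a text file. Other editorial add-ons, if they use different tags, would need to be added to SKIP.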

There must be an easier way. I’m open to ideas.

With this text file we can get really fancy. There are TextExpander snippets that can use various online lipsum generators to insert custom-made lipsums right into your text. Read here for more, and fall down the rabbit hole with custom lipsums from Alice in Wonderland or George Orwell’s 1984.

With your help, dear reader, we’ll do for Shakespeare what the Alice Lipsum generator does for Lewis Carroll:

Croquet and hurried upstairs in sight and birds with variations. Really now about again said on both its forehead the fall as Sure then when one Bill’s to whistle to ask them in same side of that green stuff the whole head unless there ought. Fifteenth said What made you again using it IS that first verdict the sentence three times over a dance said do that begins I mean by being seen the Multiplication Table doesn’t mind about by all except the bread-knife. Give your nose as the small ones choked his knuckles. then all crowded together first then I’m better and tumbled head. That is another hedgehog. Collar that into hers she got used up in here lad.
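Once the word list exists, the generator itself could be as simple as the sketch below: pick random words, group them into sentences, capitalize the first letter of each. This is only my guess at a minimal approach, not how the Alice generator actually works, and the function name and parameters are made up for illustration.

```python
import random

def lipsum(words, sentences=3, min_len=5, max_len=12, seed=None):
    """Build nonsense sentences by drawing random words from a word list."""
    rng = random.Random(seed)  # pass a seed to get repeatable output
    result = []
    for _ in range(sentences):
        length = rng.randint(min_len, max_len)
        picked = [rng.choice(words) for _ in range(length)]
        # Uppercase only the first letter, leaving the rest of the word alone.
        picked[0] = picked[0][:1].upper() + picked[0][1:]
        result.append(" ".join(picked) + ".")
    return " ".join(result)

# Hypothetical stand-in for the full Shakespeare word list.
sample_words = ["but", "soft", "mongrel", "vouchsafe", "alacrity", "quoth", "nary"]
print(lipsum(sample_words, sentences=2))
```

Because the words are drawn with repetition, common words in the source file show up more often in the output, which is roughly what gives a lipsum its flavor.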
