Qualifying Quantity: Text Analysis and Methodology

    The recent New York Times series on DH picked up a thread that has been fascinating me for a while:

    A history of the humanities in the 20th century could be chronicled in “isms” — formalism, Freudianism, structuralism, postcolonialism — grand intellectual cathedrals from which assorted interpretations of literature, politics and culture spread.

    The next big idea in language, history and the arts? Data.

    Members of a new generation of digitally savvy humanists argue it is time to stop looking for inspiration in the next political or philosophical “ism” and start exploring how technology is changing our understanding of the liberal arts. This latest frontier is about method, they say, using powerful technologies and vast stores of digitized materials that previous humanities scholars did not have.

    Many folks reading this will recognize here a restatement of Tom Scheinfeldt’s “Sunset for Ideology, Sunrise for Methodology” post (which I find myself constantly referencing, even if I can’t bring myself to agree with it). I’m interested in returning to this question, both in theory and in practice (as a Marxist might say), or, to adopt the argot of THATCamp, both in yacking and hacking.

    First, some “practice”: we can find a particularly remarkable instance of this sort of “methodological” work in a project also profiled by the Times: Dan Cohen and Fred Gibbs’s fascinating Victorian Books project (here is Dan Cohen’s own extended write-up). On a much smaller scale, with much less expertise and far less success, I have played with similar techniques myself. And just yesterday Aditi Muralidharan posted about her project WordSeer, which leverages natural language processing to open richer avenues of text analysis.

    Now, some yacking: Despite a lot of well-meaning “there is no practice without a theory, and no theory not put into practice” talk, this division seems pretty well entrenched (Matthew Jockers—whose work mining novels at the Stanford Literature Lab is another great example of this work—nicely tries to bring distant reading and close reading together in this recent comment). In part this is because of the very different skills (e.g. statistics!) required to make sense of (and make claims about) this new type of data. (Random Session Idea: “‘So, you never took a STATS class’, or ‘How Many is Enough?’: Statistics for Readers of Books”). It is also, I think, difficult to integrate this sort of data into the traditional concerns of humanities scholars. To use my perennial example: what can “distant reading” tell me about the history of sexuality (my metonymy for “things folks, say dissertating grad students, are interested in right now”)?

    So I’m interested in putting our yack where our hack is: in trying to imagine how text analysis can contribute to the things scholars, right now, actually care about; and let’s put our hack where our yack is and play with some text and the NLTK or Voyeur or whatever. Let’s try to do something interesting.
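
    To make the “hack” half a little more concrete, here is a minimal sketch of the simplest kind of distant reading: what share of titles in each decade contain a given word. Everything in it is an assumption for illustration. The four titles and years are invented, the function names are hypothetical, and it uses only the Python standard library rather than the NLTK or Voyeur. It is not the method behind any of the projects mentioned above.

```python
import re
from collections import Counter

# Hypothetical toy corpus: (year, title) pairs invented for illustration only.
TITLES = [
    (1840, "Sermons on Faith and Duty"),
    (1845, "A Treatise on Faith"),
    (1880, "The Science of Heat"),
    (1885, "Faith and Science Reconsidered"),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def share_by_decade(titles, word):
    """Return {decade: fraction of that decade's titles containing `word`}."""
    totals, hits = Counter(), Counter()
    for year, title in titles:
        decade = year // 10 * 10
        totals[decade] += 1
        if word in tokenize(title):
            hits[decade] += 1
    return {decade: hits[decade] / totals[decade] for decade in sorted(totals)}


if __name__ == "__main__":
    for decade, share in share_by_decade(TITLES, "faith").items():
        print(decade, f"{share:.2f}")
```

    Reporting a share rather than a raw count matters once the decades hold different numbers of books. The harder question raised above, how many titles are enough before a change in that share means anything, is exactly where the statistics come in.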


  1. mebrett says:

    I’m in. My interest lies more in historical correspondence, and I have no experience with textual analysis tech, but I’m very into the idea of a yack and hack.

  2. cedwards says:

    Chris, I’m absolutely in. If we could throw some pedagogy into the mix too, that would be fabulous. I’m starting work next semester on a project to research and recommend simple web-based text analysis tools suitable for use in CUNY’s undergrad literature classes. Right now I have some exposure to the yack in this space, but only a very little experience of the hack, and am really looking forward to learning from you, Jeff, and other THATCampers.
