What the Dickens?!

What the Dickens?! was my independent senior capstone project for the English major, which I completed during Winter 2018. It was a digital humanities project: I treated the words in all of Dickens’ novels as data so that I could identify patterns across his texts. For a comprehensive look at the project and my general experience of doing digital literary work, I recommend watching the video (below) of the presentation I gave. It runs a little long (just under 50 minutes: a 40-minute presentation plus 10 minutes of questions), but it includes helpful context on the challenges, limitations, and opportunities of digital literary work, all grounded in specific examples from What the Dickens?!. There are also plenty of jokes, so I promise it’s entertaining!

If you’re not interested in watching the video right now, no worries. Keep scrolling to see a SparkNotes version of the project.

Here’s the video of the presentation:

The Project

What the Dickens?! comes in 3 parts, each part dealing in some different way with words – word use, word frequency, word association – any information I can draw from mining the rich and expansive data in Dickens’ novels.

Part I: Serialization (Video: 11:45-22:50)

Part I of this project is my ode to failed experiments and negative results. Dickens wrote and published all of his novels serially, and he was known to adjust installments according to public opinion and current events. My theory: perhaps Dickens was so strongly influenced by what was going on around him that, for any given month, all or most of what he published in that month shared some unifying thread. No matter what the novel was about or where Dickens was in the novel, perhaps if he was publishing a section in, say, November, it would have some feature in common with almost everything else he published in November.

To test this theory, I divided all of Dickens’ novels into their individual serial releases, and then found every word that occurred at least 5 times in every serial released in a given month. I was hoping that the result would be lists of words that fit together into twelve unique lexical sets. This, of course, did not happen. Most of the words were simply common across all 12 months, and the words that were unique to any given month seemed, for the most part, to be a coincidence. But there was one month that stood out.

March was the only month that had consistently high occurrences of love, Mrs., gentleman, present, sir, and woman. The serials that were released in March would have been written during or shortly after February, which is, of course, when Valentine’s Day falls. That may at first sound like an absurd explanation for this trend, but the Victorian era was the golden age of Valentine’s Day. After the UK instituted universal low-cost postage, sending letters became far more popular, and Valentine’s Day became a commercial holiday, with people sending and receiving dozens, sometimes hundreds, of valentines during the season. With valentines on everyone’s mind, it’s very likely that the season affected Dickens’ writing.
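The monthly comparison described in this part can be sketched in a few lines of Python. This is a minimal illustration, not the code I actually ran: the function names, the simple regex tokenizer, and the (month, text) input format are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and pull out word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def frequent_words(text, min_count=5):
    """Return the set of words that occur at least min_count times in one serial."""
    counts = Counter(tokenize(text))
    return {word for word, n in counts.items() if n >= min_count}


def words_by_month(serials, min_count=5):
    """Map each month to the words that are frequent in EVERY serial of that month.

    serials is an iterable of (month, text) pairs (a hypothetical input format).
    """
    by_month = defaultdict(list)
    for month, text in serials:
        by_month[month].append(frequent_words(text, min_count))
    return {month: set.intersection(*sets) for month, sets in by_month.items()}


def unique_to_month(month_words):
    """Keep only the words that appear in one month's set and no other month's."""
    unique = {}
    for month, words in month_words.items():
        others = set().union(
            *(w for m, w in month_words.items() if m != month)
        )
        unique[month] = words - others
    return unique
```

Calling words_by_month on every serial release and then passing the result to unique_to_month gives one candidate word list per month, which is the kind of list that made March stand out.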

Part II: Word Frequency Throughout Novels (Video: 22:50-27:50)

This section is simple: I divide each novel into 12 parts, count the occurrences of a single word in each part, and graph it. My favorite discovery is the graph of the word love, below. (Click on the image to open an interactive and more accessible view.)

[Graph: love]

It’s clear that Dickens always begins novels with very few instances of the word love, and concludes novels with many instances. This pattern alone invites a number of literary interpretations. Dickens clearly values love and sees it as essential to the success of his protagonists.
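The counting step in this part is small enough to sketch directly. The version below is an illustration under assumptions: it splits a novel into 12 parts of roughly equal length by word count (the split could just as well follow chapters or pages), and the function names are made up for the example.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out word tokens (letters, optional apostrophe)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_counts_by_part(text, word, parts=12):
    """Count occurrences of word in each of `parts` roughly equal slices of text."""
    tokens = tokenize(text)
    target = word.lower()
    n = len(tokens)
    counts = []
    for i in range(parts):
        # Integer arithmetic keeps the slices contiguous and covers every token.
        start = i * n // parts
        end = (i + 1) * n // parts
        counts.append(tokens[start:end].count(target))
    return counts
```

The returned list has one count per part, so it can be handed straight to any plotting tool to draw a line like the one in the love graph.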

Part III: Part of Speech Frequency (Video: 27:50-33:58)

To get a sense of how Dickens speaks about different characters, I analyzed the part-of-speech patterns associated with individual characters. For each character, I found the frequency of every part of speech in the sentences that contained that character’s name. The graph below compares 3 characters from David Copperfield – David, the male protagonist, and 2 prominent female characters, Dora and Peggotty. You can see that the female characters tend to have similar part-of-speech patterns, but David often tracks separately. (Click on the image to open an interactive and more accessible view.)

[Graph: POS-DC]

This final graph (below) compares male protagonists in 3 of Dickens’ most famous novels – David from David Copperfield, Pip from Great Expectations, and Oliver from Oliver Twist. David Copperfield and Great Expectations are both bildungsromans, and David’s and Pip’s experiences are more similar to each other’s than to Oliver’s. The part-of-speech data reflects David and Pip’s similarities. The language used to describe David and Pip has similar part-of-speech patterns, but Oliver’s patterns are often quite different. (Click on the image to open an interactive and more accessible view.)
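The per-character counts behind graphs like these can be sketched as follows. This is a rough illustration rather than the project’s actual code: the sentence splitter is deliberately naive (it will break on abbreviations such as “Mrs.”), the function names are hypothetical, and the tagger is passed in as a parameter. A real run would plug in a proper part-of-speech tagger, for example NLTK’s pos_tag; the toy lexicon here exists only so the example runs on its own.

```python
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def pos_profile(text, name, tag_words):
    """Relative frequency of each part-of-speech tag in sentences mentioning name.

    tag_words is any callable that takes a list of words and returns a list of
    (word, tag) pairs; which tagger to use is left open (an assumption).
    """
    counts = Counter()
    for sentence in split_sentences(text):
        words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", sentence)
        if name in words:
            counts.update(tag for _, tag in tag_words(words))
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}


# Toy stand-in for a real tagger, for demonstration only.
TOY_LEXICON = {
    "david": "NOUN", "dora": "NOUN", "dog": "NOUN",
    "ran": "VERB", "smiled": "VERB", "barked": "VERB",
    "quickly": "ADV", "the": "DET", "at": "ADP",
}


def toy_tagger(words):
    """Look each word up in TOY_LEXICON; unknown words get the tag 'OTHER'."""
    return [(w, TOY_LEXICON.get(w.lower(), "OTHER")) for w in words]
```

Running pos_profile once per character over the same novel gives one frequency table per name, and plotting those tables side by side produces a comparison like the ones above.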