Making Books Unfamiliar: The Art of Novel Analytics

On March 2nd, Matthew Jockers gave a talk at Wesleyan about his research on using quantitative methods for analysis in literature. His talk was titled “Novel Analytics: From James Joyce to the Bestseller Code.” The following article is an exploration of his talk and the ideas he brought forward.

What makes something a piece of art? This might sound like a pretty theoretical question, but English professor Matthew L. Jockers believes that it is possible to take a technical approach.

“Art shows how things are perceived, not known,” Jockers explained. This is a definition that could cause tension in the literary world. After all, writing is messy, personal, and painfully subjective. And yet: “We tend to emphasize the idea that the text is withholding an ‘essential truth,’” Jockers said. In this way, a literary critic wants to be able to anticipate a certain meaning, causing an endless tug-of-war between objectivity and subjectivity. Jockers does not wrestle with this tension, as evidenced by his book The Bestseller Code, in which he uses analysis to tackle that all-elusive question: What makes a bestselling novel?

Jockers and his co-author Jodie Archer nailed down the qualities that make a book a bestseller by analyzing 30 years’ worth of New York Times bestselling novels. The idea of taking an analytical look at novels was appealing to Jockers because, while a literature enthusiast, he is largely interested in the way parts fit together. Rather than focusing on the novel itself, Jockers believes we should focus on the relationship readers have with the literal words on the page. “Books are a map of grammar, syntax, and word order used to direct our attention to a certain meaning,” he explained. This transforms the previously abstract question into a new, more concrete one: How do bestselling writers write?


To find an answer, Jockers and Archer created a corpus of 4,000 books. This database included 513 books that stayed on the bestseller list for 5-10 weeks, 250 self-published books, and books that were traditionally published but not bestsellers. They then found 2,800 features that differentiated bestsellers from non-bestsellers, and broke them into four categories: theme, character, style, and plot. Besides studying the distribution of these features in each book, they also measured the emotional valence of each sentence of a book in order to chart plot movement. They found that the bestsellers had even waves of changing valence, meaning a constant building and relieving of tension throughout the story.
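To make the idea of charting valence sentence by sentence concrete, here is a minimal Python sketch. It is not the method Jockers and Archer used, which the talk did not spell out. The tiny word list `TOY_LEXICON`, the naive sentence splitter, and the moving-average smoothing are all assumptions made purely for illustration.

```python
import re

# Toy, hypothetical lexicon (word -> valence score). Not the authors' data.
TOY_LEXICON = {
    "love": 1.0, "happy": 1.0, "joy": 1.0, "safe": 0.5,
    "fear": -1.0, "dead": -1.0, "afraid": -1.0, "lost": -0.5,
}

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def sentence_valence(sentence, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the words in one sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(lexicon.get(w, 0.0) for w in words)

def moving_average(values, window=3):
    """Smooth a series with a centered window that shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

text = "She was happy and safe. Then she was lost. Fear took over! But love won."
scores = [sentence_valence(s) for s in split_sentences(text)]
print(scores)                  # one raw valence score per sentence
print(moving_average(scores))  # smoothed curve, i.e. the "plot shape"
```

Plotted over a whole novel, a smoothed curve like this is the kind of rising and falling line the talk described, though a real analysis would need a far larger lexicon and a more careful sentence splitter.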

These plots then allowed Jockers to make some interesting comparisons. In one chapter, he elaborates on the similarities between The Da Vinci Code and Fifty Shades of Grey, as the emotional valence plots for these two books were almost identical. This is fascinating because a typical New York Times review would paint these novels as entirely different, praising one for its exploration of religious themes and criticizing the other for its superficial indulgence in sex. But with the use of technology, Jockers has unlocked essential similarities that few literary critics would have picked up on from simply reading.

Distribution of Words according to Zipf’s Law

So, back to art. To all these literary critics, Jockers would say that he is “interested in making objects unfamiliar.” And that’s exactly what he’s done. In linguistics, there is a well-known pattern called Zipf’s law, which predicts how often words appear in a text: a word’s frequency is roughly inversely proportional to its rank, so the second most common word shows up about half as often as the first. Jockers charted the distribution of common word usage for each bestselling book against Zipf’s distribution, with the hypothesis that the bestselling books would diverge from the pattern, and most of them did. “Art must break this distribution,” he explained. Art must be familiar enough that we can recognize its themes and metaphors, yet different enough that we feel as though we are reading something new.
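For readers who want to see what comparing a text against Zipf’s distribution could look like, here is a small Python sketch. The talk did not say how divergence was measured, so the deviation score below (the average relative gap between each word’s observed count and an idealized count of the top count divided by rank) is an assumption for illustration only, as are the function names.

```python
import re
from collections import Counter

def rank_frequency(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()

def zipf_expected(top_count, rank, exponent=1.0):
    """Idealized Zipf count at a given rank: top_count / rank**exponent."""
    return top_count / rank ** exponent

def zipf_deviation(text):
    """Mean relative gap between observed and idealized counts.

    Hypothetical measure chosen for this sketch; 0.0 means a perfect fit.
    """
    ranked = rank_frequency(text)
    if not ranked:
        return 0.0
    top_count = ranked[0][1]
    gaps = []
    for rank, (_, count) in enumerate(ranked, start=1):
        expected = zipf_expected(top_count, rank)
        gaps.append(abs(count - expected) / expected)
    return sum(gaps) / len(gaps)

sample = "the cat and the dog and the bird saw the cat"
print(rank_frequency(sample)[:3])  # the most frequent words with counts
print(zipf_deviation(sample))      # larger values mean a worse Zipf fit
```

A sentence this short says nothing about art, of course. The comparison only becomes meaningful across full-length books, where a higher score would indicate a text that strays further from the expected pattern.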

“So what?” you may ask. Jockers’s co-author Jodie Archer did, commenting that we already know that these books are bestsellers because people like them, so what is the point of analysis showing us what we already know? Jockers was not fazed by this. “What mattered was not what the machine told us but that it could do it at all,” he said. “That meant that there was a pattern to be revealed.” Jockers is still holding on to the idea of an “essential truth,” but is looking for ways of discovery outside what is already known. In this way, his analysis itself is an art, a mixture of the familiar and the unknown. And next time you want to discount a bestselling book for having no structural substance, you’ll have a reason to think twice.