From the whereabouts of his skull to questions of authorship, the legendary playwright William Shakespeare attracts controversy as well as acclaim. Now, researchers have used a machine learning algorithm to analyze the Bard's writing style and that of some of his contemporaries, and determined which of his plays were most likely collaborations and with whom. Based on this and other recent research, an upcoming scholarly collection will for the first time credit Christopher Marlowe as co-author on all three Henry VI plays.

Questions of Shakespeare's attribution have been debated for centuries. Some plays, like The Two Noble Kinsmen, are accepted to have been collaborations, while others, like Titus Andronicus, are only suspected. These became the focus of the study, conducted by information scientists at the University of Pennsylvania, Alejandro Ribeiro, Santiago Segarra and Mark Eisen, with the help of a Shakespearean scholar, Gabriel Egan.

The team used an algorithm to study the writing style in the selected texts. The basic idea is nothing new: previous computational approaches have analyzed word choice and frequency to create an impression of an author's style, but that method can be thrown off by the subject matter. Instead, the researchers ignored subject-dependent keywords in favor of essential, "functional" words, and developed an authorial fingerprint based on how close those words tend to appear to each other.

"A more reliable approach is to use functional, rather than meaningful, words: 'the,' 'and,' 'or,' 'to,' and so on," explains Segarra. "Everyone has to use these words, so analyzing how they differ between authors gets closer to an objective measure of 'style'."

Word adjacency networks determine how far apart pairs of words typically are from each other in given texts, and build a score based on that (Credit: University of Pennsylvania)

To graph an author's style, the team picked out between 50 and 100 functional words from the texts and trained an algorithm on the complete works of Shakespeare to build a "word adjacency network" of the Bard. These networks count the number of words that lie between each pair of target words, and assign the pair a score. When all of the combinations are scored and plotted, the result is a "fingerprint" of a writer's style that, when compared to other texts, is a surprisingly reliable way to identify authorship, according to the researchers.
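For readers who want a feel for how such a network might be assembled, here is a minimal Python sketch. It is not the researchers' code: the short word list, the five-word look-ahead window, the decay factor and the function name `build_adjacency_network` are all assumptions made for illustration. Each time one functional word is followed by another within the window, the pair's score increases, and the increase is smaller the more words lie between them.

```python
from collections import defaultdict

# Illustrative list only; the study used between 50 and 100 functional words.
FUNCTION_WORDS = ["the", "and", "or", "to", "of", "a", "in"]


def build_adjacency_network(text, function_words=FUNCTION_WORDS,
                            window=5, decay=0.75):
    """Return {word: {later_word: score}} with each row normalized to sum to 1.

    `window` and `decay` are assumed parameters, not values from the study.
    """
    targets = set(function_words)
    # Crude tokenizer: split on whitespace, strip punctuation, lowercase.
    tokens = [t.strip(".,;:!?'\"()").lower() for t in text.split()]
    tokens = [t for t in tokens if t]

    scores = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        if word not in targets:
            continue
        # `gap` is the number of words lying between the two target words.
        for gap, later in enumerate(tokens[i + 1:i + 1 + window]):
            if later in targets:
                scores[word][later] += decay ** gap

    # Normalize each row so texts of different lengths can be compared.
    network = {}
    for word, row in scores.items():
        total = sum(row.values())
        network[word] = {later: s / total for later, s in row.items()}
    return network


if __name__ == "__main__":
    print(build_adjacency_network("the cat and the dog"))
```

Normalizing each row is one simple way to keep a long play from outweighing a short one; the published method may handle this differently.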

"For example, if we trained this system on a play by me and a play by Santiago, and then gave it another play written by one of us, it could tell which one wrote it 98 percent of the time," says Ribeiro.

On the left is the word adjacency fingerprint of Shakespeare, with Christopher Marlowe's on the right (Credit: University of Pennsylvania)

Applying the system to Shakespeare's body of work, the team found that the three Henry VI plays are statistical outliers, and were most likely not written entirely by the man himself. So to identify who else's fingerprints may be on those works, the team developed the same kind of networks for other writers of the time such as John Fletcher, Christopher Marlowe, Thomas Middleton, Ben Jonson, and George Peele, some of whom are known or suspected Shakespeare collaborators.
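The comparison step can be sketched in the same spirit. The Python below builds a simplified pair-frequency profile for each candidate author and for a disputed text, then picks the candidate whose profile is closest. The sum of absolute differences used here is a stand-in chosen for simplicity, not the measure used in the study, and the names `profile`, `distance` and `closest_author`, the word list, and the toy sentences are all hypothetical.

```python
from collections import Counter

# Illustrative list only; not the word list used by the researchers.
FUNCTION_WORDS = {"the", "and", "or", "to", "of", "a", "in"}


def profile(text, window=5):
    """Relative frequency of each (word, later_word) pair of functional
    words that occur within `window` tokens of each other."""
    tokens = [t.strip(".,;:!?'\"()").lower() for t in text.split()]
    counts = Counter()
    for i, word in enumerate(tokens):
        if word in FUNCTION_WORDS:
            for later in tokens[i + 1:i + 1 + window]:
                if later in FUNCTION_WORDS:
                    counts[(word, later)] += 1
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def distance(p, q):
    """Sum of absolute differences between two profiles (assumed metric)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def closest_author(disputed_text, samples):
    """Return the name in `samples` whose profile is nearest the disputed text."""
    target = profile(disputed_text)
    return min(samples, key=lambda name: distance(target, profile(samples[name])))


if __name__ == "__main__":
    # Made-up sentences standing in for each candidate's known works.
    samples = {
        "author_a": "the king and the queen went to the hall of the realm",
        "author_b": "to be or to go or to stay in a town",
    }
    disputed = "the lord and the lady rode to the gate of the city"
    print(closest_author(disputed, samples))
```

With real plays, each profile would be built from tens of thousands of words, and a result like this would be one piece of evidence to weigh alongside the historical record rather than a verdict on its own.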

The results suggest that Christopher Marlowe and George Peele are the two most likely candidates, although the case for Marlowe appears stronger because there are too few Peele works to form a complete picture. Combined with the historical evidence, as well as complementary conclusions drawn by other recent research, the team feels confident enough to credit Marlowe as a co-author of those plays in the upcoming collection, the New Oxford Shakespeare Complete Works, of which Egan is an editor.

"We're seeing independent studies with different methodologies converging on the same conclusion," says Egan." "The more those independent approaches converge, the more confident we can be."

"There's a very famous riot scene in Henry VI, Part 2 where one of the followers of Jack Cade, a revolutionary, says, 'First thing we do, let's kill all the lawyers'," Egan continues. "I think that Marlowe was responsible for the Jack Cade scenes. Of course, we don't know if they sat together and worked as co-authors. Shakespeare may have adapted those passages afterwards, for example."

The research will appear in an upcoming issue of the journal Shakespeare Quarterly.
