Spending 12-14 hours a day managing investors’ money doesn’t leave me a whole lot of time to sit around watching TV. And since I have probably less than 10% of the ad-tolerance of a typical American audience member, I inevitably turn to TiVo, Netflix, or similar, to watch a commercial-free show. Which means that I am inevitably several years behind the cognoscenti of the au-courant. This has its pluses: I avoid a lot of drivel that way.
So it was that I recently tuned in to watch Deadwood, a masterpiece of modern drama written by the talented David Milch, of NYPD Blue fame. The setting of the show is unpromising: a mud-caked camp in South Dakota around the turn of the 19th century that appears to portend yet another formulaic Western featuring liquor, guns, gals and gold and not much else. The first episode appeared at first to confirm my lowest expectations. I struggled through the second. But by the third I was hooked.
What makes Deadwood such a triumph are its finely crafted plots and intricate sub-plots; the many varied and often complex characters, superbly played by Ian McShane (with outstanding performances by Brad Dourif, Powers Boothe, amongst an abundance of others, no less gifted); and, of course, the dialogue.
Yes, the dialogue: hardly the crowning glory of the typical Hollywood Western. And here, to make matters worse, almost every sentence uttered by many of the characters is replete with such shocking profanity that one is eventually numbed into accepting it as normal. But once you get past that, something strange and rather wonderful overtakes you: a sense of being carried along on a river of creative wordsmith-ing that at times eddies, bubbles, plunges and roars its way through scenes that are as comedic, dramatic and action-packed as any I have seen on film. For those who have yet to enjoy the experience, I offer one small morsel:
Around the start of Series 2 a rather strange idea occurred to me that, try as I might, I was increasingly unable to suppress as the show progressed: that the writing – some of it at least – was almost Shakespearian in its ingenuity and, at times, lyrical complexity.
Convinced that I had taken leave of my senses I turned to Google and discovered, to my surprise, that there is a whole cottage industry of Deadwood fans who had made the same connection. There is even – if you can imagine it – an online quiz that tests if you are able to identify the source of a number of quotes that might come from the show, or one of the Bard’s many plays. I kid you not:
Intrigued, I took the test and scored around 85%. Not too bad for a science graduate, although I expect most English majors would top 90%-95%. That gave me an idea: could one develop a machine learning algorithm to do the job?
Here’s how it went.
We start by downloading the text of a representative selection of Shakespeare’s plays, avoiding several of the better-known works from which many of the quotations derive:
Let’s build an initial test classifier, as follows:
It seems to work ok:
So far so good. Let’s import the script for Deadwood series 1-3:
DeadWood = Import[“……../Dropbox/Documents/Deadwood-Seasons-1-3-script.txt”];
Next, let’s import the quotations used in the online test:
We need to remove the relevant quotes from the Deadwood script file used to train the classifier, of course (otherwise it’s cheating!). We will strip an additional 200 characters before the start of each quotation, and 500 characters after each quotation, just for good measure:
and so on….
Now we are ready to build our classifier:
Let’s see how it performs:
Or, if you prefer tabular form:
The machine learning model scored a total of 19 correct answers out of 23, or 82%.
Let’s take a look as some of the questions the algorithm got wrong.
Quotation no. 13 is challenging, as it comes from Pericles, one of Shakespeare’s lesser-know plays and the idiom appears entirely modern. The classifier assigns an 87% probability of Milch being the author (I got it wrong too).
On Quotation no. 15 the algorithm was just as mistaken, but in the other direction (I got this one right, but only because I recalled the monologue from the episode):
Quotation no.16 strikes me as entirely Shakespearian in form and expression and the classifier thought so too, favoring the Bard by 86% to only 14% for Milch:
Quotation no. 19 had the algorithm fooled completely. It’s a perfect illustration of a typical kind of circumlocution favored by Shakespeare that is imitated so well by Milch’s Deadwood characters:
The model clearly picked up distinguishing characteristics of the two authors’ writings that enabled it to correctly classify 82% of the quotations, quite a high percentage and much better than we would expect to do by tossing a coin, for example. It’s a respectable performance, but I might have hoped for greater accuracy from the model, which scored about the same as I did.
I guess those who see parallels in the writing of William Shakespeare and David Milch may be onto something.
The Hollywood Reporter recently published a story entitled
It’s a fascinating account of the trials and tribulations of this great author, one worthy of Deadwood itself.
A silver lining to this tragic tale, perhaps, is that Milch’s difficulties may prompt him into writing the much-desired Series 4.
One can hope.