That’s what she said: Software that tells dirty jokes.
That was the headline of a recent article at NewScientist.com. The story was about a new software program that searches text and finds sentences “appropriate” for adding the rejoinder, “That’s what she said.” The phrase was made famous by the TV series The Office, although the joke itself is much older. Here are excerpts from the article:
Double entendres have been making us laugh since the days of Chaucer and Shakespeare, but up until now computers weren’t in on the joke. Chloé Kiddon and Yuriy Brun, two computer scientists at the University of Washington, have developed a system for recognising a particular type of double entendre – the “that’s what she said” joke, in which seemingly innocent sentences can be transformed into lewd utterances by appending just four short words.
The pair describe the “TWSS problem” as recognising when it is funny to follow a sentence with “that’s what she said” – they give “Don’t you think these buns are a little too big for this meat?” as one example. The equivalent in the UK is appending sentences with “as the actress said to the bishop” and is used in the same way.
Automating this process means identifying sentences that contain potential euphemisms and follow a particular structure – a “hard natural language understanding problem”, say the researchers. Kiddon and Brun began by analysing two different bodies of text – one containing 1.5 million erotic sentences, and another with 57,000 from standard literature.
They then evaluated nouns, adjectives and verbs with a “sexiness” function to determine whether a sentence is a potential TWSS. Examples of nouns with a high sexiness function are “rod” and “meat”, while raunchy adjectives are “hot” and “wet”.
Their automated system, known as Double Entendre via Noun Transfer or DEviaNT, rates sentences for their TWSS potential by looking for particular elements such as nouns that can be interpreted in multiple ways. The researchers trained DEviaNT by gathering jokes from twssstories.com and non-TWSS text from sites such as wikiquote.org.
The system turned out to be around 70% accurate, but the pair say this is deceptively low because much of the training data did not consist of TWSS jokes, and with a more even data set it could achieve 99.5% precision.
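The article does not say how the “sexiness” function is computed, so here is only a rough sketch of one plausible reading: score a word by how much more often it turns up in the erotic corpus than in the standard one. The tiny corpora, the add-one smoothing, and the function names are all assumptions made for illustration, not the researchers’ actual method.

```python
from collections import Counter


def count_words(sentences):
    """Return (word counts, total word count) for a list of sentences."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    return counts, sum(counts.values())


def sexiness(word, erotic, standard, vocab_size):
    """Ratio of smoothed relative frequencies: erotic corpus vs. standard corpus.

    Add-one smoothing (an assumption of this sketch) keeps words that are
    missing from one corpus from causing a division by zero.
    """
    e_counts, e_total = erotic
    s_counts, s_total = standard
    p_erotic = (e_counts[word] + 1) / (e_total + vocab_size)
    p_standard = (s_counts[word] + 1) / (s_total + vocab_size)
    return p_erotic / p_standard


# Toy stand-ins for the two bodies of text (hypothetical data).
erotic_sentences = ["the meat was hot", "hot and wet"]
standard_sentences = [
    "the weather was hot",
    "the table was set",
    "the report was late",
]

erotic = count_words(erotic_sentences)
standard = count_words(standard_sentences)
vocab_size = len(set(erotic[0]) | set(standard[0]))

scores = {
    w: sexiness(w, erotic, standard, vocab_size)
    for w in ("meat", "hot", "table")
}
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```

A score above 1 means the word is relatively more common in the erotic text; a score below 1 means it leans toward the standard text. A real system would need far larger corpora and would treat nouns, adjectives and verbs separately, as the article describes.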
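The claim that the score is “deceptively low” is easier to see with a little arithmetic. The sketch below holds a classifier’s hit rate and false-alarm rate fixed and changes only the mix of jokes and non-jokes in the data. Every number in it is hypothetical, chosen to show the effect; none comes from the researchers’ results.

```python
def precision(true_positive_rate, false_positive_rate, n_jokes, n_non_jokes):
    """Fraction of flagged sentences that really are jokes."""
    true_positives = true_positive_rate * n_jokes
    false_positives = false_positive_rate * n_non_jokes
    return true_positives / (true_positives + false_positives)


# Hypothetical classifier: catches 20% of real jokes and wrongly flags
# 0.1% of ordinary sentences.
TPR = 0.2
FPR = 0.001

# Same classifier, two different data mixes (both made up).
balanced = precision(TPR, FPR, n_jokes=1000, n_non_jokes=1000)
skewed = precision(TPR, FPR, n_jokes=1000, n_non_jokes=80000)

print(f"balanced data: {balanced:.3f}")
print(f"skewed data:   {skewed:.3f}")
```

With equal numbers of jokes and non-jokes, the single false alarm barely dents the score. With eighty times as many non-jokes, the same false-alarm rate produces eighty false alarms and the precision drops sharply, even though the classifier itself has not changed.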