Writing comedy is hard. Humor is subjective, so what you find funny, others may not. And comedy writers have to juggle critical ingredients: timing, delivery, and originality, all while steering clear of cliché. They are constantly walking the line between funny and unfunny.
So, how would AI, specifically OpenAI’s ChatGPT 3.5, fare as a comedy writer? Can it even be funny? If AI and humans were compared, who’d be funnier? In a recently published study, researchers from the University of Southern California (USC) found the answers.
“Since ChatGPT can’t feel emotions itself, but it tells novel jokes better than the average human, these studies provide evidence that you don’t need to feel the emotions of appreciating a good joke to tell a really good one yourself,” said Drew Gorenz, a doctoral candidate in social psychology at the USC Dornsife College of Letters, Arts and Sciences, an amateur stand-up comedian, and the study’s lead and corresponding author.
Some prior research has looked at whether ChatGPT can produce humorous writing, but none has comprehensively evaluated the AI’s output against human comedy writing. So Gorenz and Norbert Schwarz, a Provost Professor of Psychology and Marketing, set out to do just that by conducting two studies.
In the first study, a group of US adults was asked to complete three different comedy-writing tasks. In the acronym task, they were asked to generate a new, humorous phrase for the acronyms ‘S.T.D.’, ‘C.L.A.P.’ and ‘C.O.W.’.
For the fill-in-the-blank test, they had to fill in the blanks for three items. One of the items was, ‘A remarkable achievement you probably wouldn’t list on your resume: ________.’
Finally, there was the roast joke task, where participants had to create a humorous response to a fictional scenario. For example, ‘Imagine that one of your friends wants your opinion on how well she sings. She sings a minute or two to demonstrate her voice, and you cringe – she might be the worst singer you’ve ever heard. When she asks, “So how was it?” you decide to be honest, so you say, “To be honest, listening to that was like ________.”’
Then, ChatGPT 3.5 was given the same tasks.
A separate group of adults rated the funniness of the responses on a seven-point scale, from zero (not funny at all) to six (very funny). ChatGPT’s responses were rated funnier than the human responses, with 69.5% of participants preferring them (26.5% preferred the human responses, and 4.0% thought both were equally funny).
“Overall, ChatGPT 3.5 performed above 63% to 87% of human participants depending on the humor task,” said the researchers. “ChatGPT 3.5 showed particularly strong performance in the roast joke task. We found this result particularly interesting given the aggressive nature of the task. Given that ChatGPT is designed not to generate any speech that could be considered offensive or hateful, the opposite prediction could have been made.”
For the second study, the researchers compared ChatGPT’s satirical news headlines with those published by The Onion. Because ChatGPT doesn’t receive regular world news updates, the researchers drew the last 50 headlines posted to The Onion’s ‘Local’ news section before October 1, 2023. An example was ‘Man Locks Down Marriage Proposal Just As Hair Loss Becomes Noticeable’. The headlines were given to ChatGPT, and the AI was asked to generate 20 new headlines.
A group of USC psychology students rated the funniness of both the original and the AI-generated satirical headlines on the same seven-point scale used in the first study. The students were also asked to rate how much they sought out comedy, including satirical news. Those who self-reported seeking out comedy more and reading more satirical news rated the headlines as funnier, regardless of whether they were AI-generated or produced by professional writers. Based on mean ratings, 48.8% of participants preferred The Onion’s headlines, 36.9% preferred those generated by ChatGPT, and 14.3% showed no preference.
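For readers curious how a “based on mean ratings” preference tally works, here is a minimal sketch: each rater’s scores are averaged separately for the two headline sets, and the averages are compared per rater. Everything in it is hypothetical illustration (the rater count, the random ratings, the array names); it is not the study’s data or analysis code.

```python
# Minimal sketch of deriving per-rater preferences from mean funniness
# ratings. All values here are hypothetical, not the study's data.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ratings: each row is one rater, each column one headline,
# scored on the study's 0-6 funniness scale.
onion_ratings = rng.integers(0, 7, size=(84, 50))    # 50 Onion headlines
chatgpt_ratings = rng.integers(0, 7, size=(84, 20))  # 20 ChatGPT headlines

# Average each rater's scores within each headline set.
onion_means = onion_ratings.mean(axis=1)
chatgpt_means = chatgpt_ratings.mean(axis=1)

# A rater "prefers" whichever set they rated higher on average.
prefers_onion = np.mean(onion_means > chatgpt_means) * 100
prefers_chatgpt = np.mean(chatgpt_means > onion_means) * 100
no_preference = np.mean(onion_means == chatgpt_means) * 100

print(f"Preferred The Onion: {prefers_onion:.1f}%")
print(f"Preferred ChatGPT:   {prefers_chatgpt:.1f}%")
print(f"No preference:       {no_preference:.1f}%")
```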
“Participants, on average, rated the headlines as similarly funny, indicating that the average participant did not discern a difference in quality,” the researchers said. “This is particularly interesting given the high standard of comparison (i.e., professional comedy writers) in this study.”
Interesting, yes, but also concerning, and the researchers acknowledge as much.
“That ChatGPT can produce written humor at a quality that exceeds laypeople’s abilities and equals some professional comedy writers has important implications for comedy fans and workers in the entertainment industry,” they said. “For professional comedy writers, our results suggest that LLMs [large language models like ChatGPT] can pose a serious employment threat.”
The study was published in the journal PLOS One.
Source: USC
But when given full creative freedom, I'm certain ChatGPT would fall far behind.
The human writers consisted of "a group of US adults" with the evaluation done by "[a] separate group of adults".
Were the groups large enough that statistical analysis could be meaningful?
What were their compositions in terms of age, gender, background, etc.?
Since “[w]riting comedy is hard”, is it reasonable to expect that an inexperienced group would produce much? “Even a blind squirrel occasionally finds a nut,” but can comparing a group of them to a device that can shake the tree really provide worthwhile results? Remember that ChatGPT was trained on vast resources, including the psychology, techniques, and examples of comedy writing.
Only a repeated test in which ChatGPT’s jokes were consistently preferred to those of a group of professionals would warrant the conclusion that the latter should be concerned.