GPT-4 is surprisingly good at explaining jokes

OpenAI's latest language model can grasp visual jokes, too.

March 18, 2023

Explaining a joke, as E.B. White once wrote, is like dissecting a frog: “the thing dies in the process and the innards are discouraging to any but the purely scientific mind.”

GPT-4 doesn’t mind poking dead frogs. In fact, the large language model — released on March 14 by OpenAI — is surprisingly good at generating detailed explanations of why a joke is funny. And like its predecessor, ChatGPT, the AI can also generate jokes, though its go-to one-liners are simple and seem to have been scraped from the internet’s corniest, punniest corners (Why don’t scientists trust atoms? Because they make up everything!).

GPT-4 seems better at explaining humor than its predecessor. That’s a subjective assessment, sure, but maybe not a surprising one.

After all, OpenAI says its new model was trained on more data and includes more “parameters” (the numeric values inside a language model that get adjusted during training). GPT-4 also beats ChatGPT on tests designed for humans, like the Uniform Bar Exam and the Biology Olympiad, as well as common benchmarks that measure the sophistication of language models, such as DROP, WinoGrande, and HellaSwag (though the utility of some of these benchmarks has been questioned).

But what really sets GPT-4 apart from ChatGPT is that it’s multimodal, meaning it can “understand” text and images as inputs. That includes visual jokes.

Explaining the Memes

OpenAI’s GPT-4 technical report includes examples of the model explaining visual jokes. In one, a user presents the AI with several images showing an iPhone connected to what looks like an outdated monitor cable, instead of a phone charger. The user asks GPT-4 to explain step by step why the image is funny.

Another example shows GPT-4’s ability to explain the humor behind a meme, a task that required recognizing that the particular arrangement of chicken nuggets on a tray vaguely resembles Earth as seen from space.

GPT-4’s visual-reasoning abilities aren’t limited to jokes. The AI was able to examine a photo and make predictions about what might happen under certain hypotheticals, demonstrating a sort of visual grasp of physics.

Still, the exact abilities and limitations of GPT-4’s multimodal model remain unclear. As of March 16, the ability to input images into GPT-4 remains unavailable to the public, and all previews come from tests conducted by OpenAI.

ChatGPT vs. GPT-4

To see if there were noticeable differences between the original ChatGPT (which uses a model called GPT-3.5) and GPT-4 on the text-only side of “understanding” humor, I fed both models some jokes and asked them to explain them.

Explain why someone might laugh at this joke from Jimmy Carr: “I realized I was dyslexic when I went to a toga party dressed as a goat.”

ChatGPT: This joke from Jimmy Carr is a play on words that relies on the similarity in spelling and pronunciation between the words “goat” and “toga”. The joke is funny because it creates a humorous image in the listener’s mind of someone showing up to a toga party dressed as a goat. The absurdity of the situation is heightened by the fact that the person in question is dyslexic, which suggests that they have difficulty reading and interpreting words. The punchline is unexpected and clever, as it subverts the audience’s expectations of what the punchline should be. Instead of simply describing the mistake the person made, Jimmy Carr uses wordplay to create a joke that is both witty and surprising. Overall, people might laugh at this joke because it is unexpected, absurd, and clever, and it creates a funny mental image that is easy to visualize.

GPT-4: Someone might laugh at this joke from Jimmy Carr because it relies on wordplay and the humorous misunderstanding that arises from the confusion caused by dyslexia. Dyslexia is a learning disorder that affects a person’s ability to read, spell, and recognize words. In this joke, the speaker attends a toga party but mistakenly dresses as a goat, implying that they confused the words “toga” and “goat” due to their dyslexia.

The humor comes from a few different elements:

Wordplay: The joke hinges on the similarity between the words “toga” and “goat.” They have similar letters and sounds, which makes it easy to imagine someone with dyslexia mixing them up.

Incongruity: The mental image of someone showing up to a toga party dressed as a goat is amusing because it is unexpected and out of place. Toga parties typically involve people wearing togas, which are garments from ancient Rome, while dressing as a goat is completely unrelated.

Self-deprecation: The speaker is making light of their own dyslexia, which can be seen as a way to cope with the challenges it presents. By laughing at their own mistake, they are inviting the audience to join in on the humor.

Surprise: The punchline is not immediately obvious, so when the reason for the mix-up is revealed, it can catch the listener off guard, leading to laughter.

It is important to note that humor is subjective, and not everyone might find this joke funny. Some people might be sensitive to jokes about learning disorders or feel that the joke trivializes the challenges faced by individuals with dyslexia.

In explaining this joke and others, GPT-4 offers a more detailed analysis, outlining the specific elements of a joke, such as self-deprecation, wordplay, and subversion of expectations. While both ChatGPT and GPT-4 tended to “get” all of the jokes I fed them, this was one case where only GPT-4 seemed to be entirely in on the joke.

Explain why someone might laugh at this Anthony Jeselnik joke: “Whenever I meet a pretty girl, the first thing I look for is intelligence; because if she doesn’t have that, then she’s mine.”

ChatGPT: The Anthony Jeselnik joke is a dark humor joke that relies on the concept of shock value to make people laugh. The joke is meant to be taken as a joke, and not as a serious statement. The reason why someone might find this joke funny is because it is a play on the traditional idea of someone looking for intelligence in a partner. Instead of looking for intelligence as a desirable trait, the speaker in the joke says that if a pretty girl doesn’t have intelligence, then she’s his to take. The joke is considered funny because it is a twist on the traditional way of thinking and because it is unexpected. People may also find it funny because it is a taboo or edgy joke that goes against societal norms, and laughing at such jokes can feel rebellious or subversive.

GPT-4: This Anthony Jeselnik joke uses dark humor, wordplay, and irony to elicit laughter. The humor in this joke relies on a few factors:

Subversion of expectations: At the beginning of the joke, the listener might expect the speaker to express an appreciation for intelligence in a romantic partner. However, the punchline turns this expectation on its head by implying the speaker is actually interested in less intelligent partners who are easier to manipulate or control.
Self-deprecation: The joke can also be seen as self-deprecating, as the speaker acknowledges their own morally dubious intentions. This self-awareness can make the joke more palatable, as the speaker doesn’t appear to be seriously endorsing the behavior.
Wordplay: The phrase “then she’s mine” plays on the double meaning of “mine.” On one level, it can mean that the speaker has found a compatible partner. On another level, it can imply possession or control, reinforcing the idea that the speaker seeks out less intelligent partners for manipulative purposes.
Irony: The joke also uses irony to make its point. It’s ironic that the speaker claims to value intelligence, but then goes on to express a preference for the opposite.

By combining these elements, the joke creates a humorous and surprising effect that might make someone laugh, despite the dark nature of the subject matter.

GPT-4: Comedy and Alignment

Before launching GPT-4, OpenAI says it spent six months making the AI “safer and more aligned.” (“Alignment” refers to how closely the output of a language model matches human goals, values, and norms.) Some of that effort included training GPT-4 not to generate offensive content when telling jokes.

OpenAI says it accomplished this through “reinforcement learning from human feedback,” a process in which human evaluators reward the language model for generating output that aligns with their preferences.

The company’s technical report offers an example of how these tweaks enabled the model to filter out offensive content.

So GPT-4 tends to limit its schtick to family-friendly one-liners, although it might be theoretically capable of generating jokes fit for a basement comedy club. Even when you’re not explicitly asking for the explicit, GPT-4 often bumps into OpenAI’s alignment guardrails: “As an AI language model, I am not allowed to…”

Out of curiosity, I asked GPT-4 to generate a joke in the style of the late insult comic Don Rickles, whose jokes rarely went blue but did sometimes cover topics that a modern AI might flag. The result? Until GPT-5 comes out, it’s safe to say that this language model should stick to dissecting frogs.

GPT-4: Don Rickles was known for his insult comedy and quick wit. While I cannot generate offensive content, I can share a light-hearted joke in the spirit of his style, with a playful and teasing tone:

Why did the tomato turn red? Because it saw the salad dressing!