ChatGPT answers physics questions like a confused C student

ChatGPT doesn’t understand physics, but it memorizes very well and puts in extra effort.

The first thing you’ll notice when you ask ChatGPT a question is how smart and knowledgeable its answer sounds. It identifies the proper topic, speaks in intelligible sentences, and employs the expert tone of an educated human. The million-dollar question is: Does the AI give correct answers?

While ChatGPT (or any other chatbot) is obviously not sentient, its output is reminiscent of a person in certain ways. That’s not surprising, given that it mimics human language patterns. I’ve described ChatGPT as a parrot watching a million years of soap operas. The AI is very good at stringing together sentences simply because it has seen so many of them — it just doesn’t understand them.

But given its demonstrated abilities, such as acing a microbiology quiz, I asked ChatGPT a battery of physics questions, from relatively simple undergraduate subjects to specialized expert topics. I wasn’t interested in its ability to recite information or crunch numbers. (You can ask WolframAlpha or a search engine to do this.) Instead, I wanted to see if ChatGPT could interpret and give useful responses to the kinds of questions that a specialist human might be expected to answer.

A mediocre C student

All told, ChatGPT’s performance wasn’t up to par for an expert. It reminded me of a hardworking C student: one who doesn’t understand the material, but memorizes very well and puts in extra effort to eke out credit and pass the class. Let’s look at this in more detail.

The AI usually begins by regurgitating your question using more words or redefining the term you asked it about. (Thanks, but I have 50 exams to grade, so please don’t waste my time.) It later re-regurgitates, forming a miniature conclusion. (Now I’m getting irritated. A strong student gives concise, correct answers. A weaker student stumbles through long answers with convoluted explanations.)

In response to a simple question, ChatGPT generally produces three or four paragraphs of output. This usually contains the right answer, which is impressive. However, it sometimes includes additional wrong answers. It also often contains extraneous details, related but unimportant facts, and definitions of partially irrelevant terms. The breadth of concepts imparted from its training is impressive, but the links between them are often nebulous. It can tell you what, but not why.

If I asked you why it was dark in here, and you said, “Because the light is off,” you’d be correct, but you’re not really telling me anything useful. I hope you wouldn’t go on to tell me about the definition of light, how light can be measured, and what colors make up light before summarizing that something that’s dark isn’t light. But that’s the sort of answer ChatGPT would provide.

ChatGPT’s word salad

When asked a harder question, ChatGPT tries to score points by shotgunning you with answer pellets. Each answer says a modest amount, using a lot of unnecessary words. In this way, the AI reminds me of a student who lacks full conceptual understanding and gives multiple explanations, elaborated in confusing ways, hoping to hit on something correct for partial credit and win extra points for effort.

ChatGPT’s response to each of my difficult questions consisted of a mix of good correct answers, partially correct answers with incorrect portions, answers that stated factual information but didn’t ultimately explain anything, answers that might be true but were irrelevant, and answers that were dead wrong. The wrong answers included full explanations that sounded reasonable, but were total nonsense on close reading.

Confoundingly, I cannot predict when the AI will give a right answer or a wrong one. It can give a confused response to a simple question and an impressive reply to an arcane query. ChatGPT also throws extraneous related information on top for brownie points, but often this just gets it into trouble.

Confident but wrong

More than once, I received an answer in which the AI would start by giving a correct definition. (Usually, it was restating the Wikipedia entry related to the topic, which is the student equivalent of rote memorization.) Then the AI would elaborate but say something completely wrong or backward. This reinforces my impression that the model seems well trained on what concepts are linked together, but it is unable to capture the nature of those relationships.

For example, ChatGPT knows A is related to B. However, it often doesn’t know if A implies B, or if A precludes B. It may mistake whether A and B are directly correlated or inversely correlated. Possibly A and B are just similar topics with no relevant relationship, but when asked about A, it tells you about A and then yammers on about B.

Beyond tallying right and wrong answers, human factors matter when a human evaluates the AI. It’s easy to overestimate ChatGPT’s ability because of its writing and tone. The answers are written well, read coherently, and give the impression of authority. If you don’t know the true answer to your own question, ChatGPT’s answer will make you believe that it knows.

This is troubling. If someone is a fool and talks like one, we can easily tell; if someone is a fool but well spoken, we might start to believe them. For sure, ChatGPT could give you the right answer or useful information. But it could just as eloquently and convincingly give you a wrong answer, a convenient or malicious lie, or propaganda embedded by its training data or human hands. ChatGPT may be a C student, but C students run the world.

This article was reprinted with permission of Big Think, where it was originally published.
