When machines meet nonsense: how ChatGPT reacts to gibberish
Being asked to define the word upknocking would stump most people today. Unless you happen to be steeped in Victorian England, you probably have no idea it once meant the job of waking workers at dawn, before alarm clocks existed. For most of us, upknocking is just gibberish, nonsense, a nonword.
But for ChatGPT, it's a different story. A study published in PLOS ONE by Michael S. Vitevitch, a professor in the Department of Speech-Language-Hearing at the University of Kansas, explored exactly this kind of challenge. The research centers on one question: how does ChatGPT handle nonsense?
To find out, Vitevitch turned to a time-honored tool of psychology and linguistics: nonwords. These are letter strings or sounds that look and sound like words but don't actually mean anything. Think blork, smeef, or the famous "wug" from the classic wug test, in which children show they know how to form plurals by saying "wugs." Nonwords have been central to understanding how humans process language for more than a century. Now they're being used to probe machines.
Nonwords are like linguistic crash tests. Just as engineers smash cars into walls to see how they behave under stress, psychologists throw fake words at people to see how our brains handle them. Can we remember them? Can we make them plural? Do they "sound right"?
When the researcher applied these same tricks to ChatGPT, he wasn't trying to figure out if the AI "thinks" like us. He knows it doesn't. Instead, he wanted to map out where humans and machines align, and where they part ways. That knowledge, he argues, can guide us in building AI that doesn't just mimic us, but complements us.
The study unfolded as a series of experiments, each playfully designed yet scientifically sharp. Study 1 focused on extinct words: Vitevitch fed ChatGPT a list of 52 English words that had fallen out of use, like flothery ("slovenly but trying to be fancy") or wangary ("flabby meat"). To a modern English speaker, these are basically gibberish.
The AI, however, nailed 69% of them, providing accurate definitions that most humans would never know. About 21% of the time it admitted ignorance ("not a recognized word"), and in 10% of cases it "hallucinated", making up definitions that sounded plausible but were wrong.
For humans, this is astonishing. Our collective memory for language is limited; we forget old words just as we forget the names of presidents more than a few generations back. ChatGPT, trained on vast amounts of text, effectively extends that memory, retrieving meanings long gone from everyday use.
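If you want to poke at this yourself, the probing step is easy to sketch. What follows is a minimal illustration, not the study's own setup: Vitevitch queried ChatGPT through its chat interface, whereas this sketch assumes the OpenAI Python client and a placeholder model name.

```python
# Minimal replication sketch (not the study's code). Assumes the
# OpenAI Python client ("pip install openai") with OPENAI_API_KEY set;
# the model name below is a placeholder, not the model from the paper.
from openai import OpenAI

client = OpenAI()

# Three of the extinct words mentioned in the article.
EXTINCT_WORDS = ["upknocking", "flothery", "wangary"]

for word in EXTINCT_WORDS:
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; any chat-capable model
        messages=[{"role": "user",
                   "content": f"Define the English word '{word}'."}],
    )
    print(word, "->", response.choices[0].message.content)
```

Running something like this and scoring the answers by hand (correct, "not a recognized word," or confidently wrong) is essentially the study's design in miniature.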
Study 2 treated Spanish words as nonsense: real Spanish words, like boda ("wedding"), presented as "nonwords" in English. The task: find an English word that sounds similar.
When prompted with just "Give me a word that sounds like boda," ChatGPT sometimes answered with Spanish words instead, revealing its multilingual training. But when explicitly told "Give me an English word," it performed much like humans in similar tasks, suggesting words that differed by only one sound, such as quota for boda.
This highlights both a strength and a weakness. The model can detect phonological similarity, but it doesn't always follow the unspoken social rules humans take for granted, like sticking to the same language in a conversation.
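That "one sound" criterion is the standard phonological-neighbor test from psycholinguistics: two forms count as neighbors if a single substitution, insertion, or deletion turns one into the other. Here is a minimal sketch of that check, using letters as a rough stand-in for the phoneme transcriptions real studies work with.

```python
def differ_by_one(a: str, b: str) -> bool:
    """Neighbor test: True if one substitution, insertion, or deletion
    turns a into b. Letters stand in for phonemes in this sketch."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ (substitution).
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) == 1:
        # Lengths differ by one: deleting a single letter from the
        # longer form must yield the shorter one.
        shorter, longer = sorted((a, b), key=len)
        return any(longer[:i] + longer[i + 1:] == shorter
                   for i in range(len(longer)))
    return False

print(differ_by_one("cat", "bat"))   # True: one substitution
print(differ_by_one("cat", "cast"))  # True: one insertion
print(differ_by_one("cat", "dog"))   # False: three substitutions
```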
Study 3 investigated how wordlike a fake word is. To this end, Vitevitch asked ChatGPT to rate nonsense words on a scale from "bad English word" to "good English word," and also to rate how likely someone would be to buy a product with that name.
Surprisingly, its ratings closely mirrored human judgments. Both humans and ChatGPT were swayed by "phonotactic probability", the statistical likelihood that certain sound patterns occur in English. For instance, plim feels more wordlike than zqor. The more wordlike a name sounded, the more buyable it seemed, echoing studies of human branding psychology. ChatGPT, in effect, showed it had internalized some of our biases about language and persuasion.
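Phonotactic probability is itself a computable quantity: from a lexicon, count how often short sound sequences occur, then average those counts across a candidate word. The sketch below is a deliberately crude, letter-based stand-in with a toy lexicon (my illustration; the measures used in the literature work over phoneme transcriptions with frequency weighting), but it shows why plim outscores zqor.

```python
from collections import Counter

# Toy lexicon; real phonotactic measures use a large, phonemically
# transcribed dictionary, not eight orthographic words.
LEXICON = ["plan", "plum", "slim", "grim", "prim", "clip", "plot", "trim"]

# Count adjacent letter pairs as a crude stand-in for biphones.
pair_counts = Counter(p for w in LEXICON for p in zip(w, w[1:]))
total = sum(pair_counts.values())

def wordlikeness(candidate: str) -> float:
    """Average pair probability: higher means more wordlike."""
    pairs = list(zip(candidate, candidate[1:]))
    if not pairs:
        return 0.0
    return sum(pair_counts[p] / total for p in pairs) / len(pairs)

print(wordlikeness("plim"))  # > 0: pl, li, im all occur in the lexicon
print(wordlikeness("zqor"))  # 0.0: zq, qo, or never occur there
```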
And finally, study 4: making up new words. Let's admit it, this is the fun part: inventing new words for new concepts. One prompt asked for a word meaning "the anger you feel when someone wakes you up." The AI coined rousrage (rouse + rage). For "fear of being watched by a platypus," it offered platypobia.
Some creations were clever blends (like stumblop for tripping over your own feet), while others accidentally recycled old, extinct words with new meanings. This mix of creativity and error resembled the human tendency to make malapropisms, using the wrong but similar-sounding word.
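Blends like rousrage follow a recognizable recipe: keep the front of one word and splice on the second, overlapping shared letters when you can. Here is a toy version of that heuristic (my illustration, not anything from the paper); the places where it fails are exactly where real blending needs phonology rather than string surgery.

```python
def blend(a: str, b: str) -> str:
    """Naive portmanteau: overlap a's tail with b's head if possible;
    otherwise drop a's trailing vowels and append b."""
    for size in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:size]):
            return a + b[size:]
    return a.rstrip("aeiou") + b

print(blend("rouse", "rage"))   # 'rousrage', the coinage from the study
print(blend("bro", "romance"))  # 'bromance', via the overlap rule
print(blend("smoke", "fog"))    # 'smokfog': phonology would give 'smog'
```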
Taken together, these studies reveal that ChatGPT doesn't process language the way humans do, but it often ends up in similar places. Where humans rely on memory, intuition, and social rules, ChatGPT relies on statistical patterns in text. Yet both can converge on wordlike judgments, plausible definitions, and even playful inventions.
But the differences are just as telling. Humans know when it's inappropriate to switch languages mid-sentence; ChatGPT sometimes doesn't. Humans forget extinct words; ChatGPT remembers them. Humans are naturally creative in ways that stretch meaning; ChatGPT imitates this but sometimes stumbles.
Vitevitch argues this complementarity is the real opportunity. Instead of aiming to build AI that perfectly mimics human cognition, we should focus on systems that extend us, helping where our memory, attention, or creativity falls short. Just as airplanes don't flap their wings like birds but still let us fly farther than any bird could, AI doesn't need to "think" like us to be useful.
In the end, what starts as nonsense can end up making a lot of sense.
If you want to learn more, read the original article, "Examining Chat GPT with nonwords and machine psycholinguistic techniques," in PLOS ONE: https://doi.org/10.1371/journal.pone.0325612.