AI could rescue scientific papers from the curse of jargon

Models like ChatGPT and Claude can rewrite papers to make them easier to read — but can they do it without compromising scientific accuracy?

As an experiment, I once tried to rewrite an entire scientific paper from a biology journal. (The topic didn’t and doesn’t really matter, but it was about platypus genetics.) I wanted to see if I could rescue the writing from itself: banish some of the jargon, avoid abbreviations, cut up the long paragraphs, and provide more guidance to readers — all without sacrificing the scientific content in its full complexity. 

If that turned out to be doable at a reasonable cost, then perhaps I could spin up a project or company to provide such a service and help improve the quality of scientific writing overall.

It didn’t work, at least not at a reasonable cost. My work did, I believe, greatly improve the readability of the article. But it took me weeks. Considering that perhaps millions of academic papers are published every year (nobody knows the true number), my project was never going to make even a dent. Not unless I could somehow hire an army of extremely competent writers who could both understand complicated scientific text and infuse it with style, clarity, and fun.

Well, one could say we do have such armies now. They’re called “large language models” (LLMs) — like ChatGPT, “the New Bing,” or Claude. Much has been written on the potential for these AIs, which are trained to autocomplete text, to transform various aspects of life and work. Can they do that for scientific writing?

Reasonable caution and excessive suspicion

Academic papers tend to be inherently complex due to their subjects, but many are also needlessly hard to read because of bad writing. A growing body of evidence shows that scientific papers have been getting harder to read over the last century. Acronyms, jargon, obscure words, and long sentences are on the rise, and there is no sign of these trends slowing down.

Powerful new text-generating AIs offer a potentially low-cost way to make academic research much easier to read and understand.

However, the initial reaction from major publishers suggests that it’s not happening anytime soon. The prestigious journal Science has banned ChatGPT not only from being cited as an author, but also from being used at all in submitted papers. Most people in scientific publishing agree that the industry will have to figure out how to deal with AI-generated writing sooner or later, but the current atmosphere is one of caution and suspicion.

It’s not hard to understand why. Large language models aren’t yet reliable. They recombine information in ways that can be misleading, sometimes subtly. They can also make up completely false yet plausible-looking statements, something that the AI community calls “hallucinating.” One particularly worrying kind of hallucination for academic writing is when LLMs support their statements with references that don’t exist. Considering that science relies on accurate, factual information, these are major concerns.

But while it is reasonable to avoid granting author status to LLMs, since they cannot take responsibility for what they write, banning them outright seems excessive. There are many ways that they can help as a tool. 

For example, they can compensate for a skill that scientists usually lack, not because they’re incompetent but simply because there’s no strong incentive to acquire it: the ability to make their papers as pleasant to read as possible.

Why reading pleasure matters

Reading pleasure in an academic setting encompasses several aspects. One is whether the basic ideas under discussion are interesting at all. Another is clarity, or readability. It’s hard to have a good time reading lengthy paragraphs that are awkwardly structured, or riddled with abbreviations and jargon. 

Another aspect is more ineffable: it is something we could describe as style, elegance, or perhaps “the spark of life.” It is the pleasure that comes from words that feel alive. It is what sometimes makes us read long essays about complex topics in which we’re not particularly interested, just because it is clear that the author cared about the craft.

Style and readability are, of course, less important to scientific writing than the novelty and accuracy of results, but they do matter. They matter because they make papers more widely read, which makes it easier to use their discoveries — and catch their mistakes or biases. They matter, too, because they lighten the ever-growing burden of knowledge. They matter because science should be fun, if we want it to attract the brightest minds and have the most positive impact possible. 

Could LLMs help with either readability or style, or both? It’s still early, but signs suggest that they can. 

Some experimenting

Consider the first few sentences in the abstract of that platypus genetics paper I mentioned:

“The specific recognition of antigen by T cells is critical to the generation of adaptive immune responses in vertebrates. T cells recognize antigen using a somatically diversified T-cell receptor (TCR). All jawed vertebrates use four TCR chains called α, β, γ, and δ, which are expressed as either a αβ or γδ heterodimer.” 

I asked an LLM, Claude, created by the company Anthropic, to rewrite the abstract in a way that beginners would understand, but without taking out any relevant information. Here are the first sentences again:

“The immune system’s T cells recognize foreign substances in the body using receptors on their surface. These T-cell receptors (TCRs) are made up of proteins called chains. Most vertebrate animals, including mammals, use four types of TCR chains called α, β, γ, and δ. These chains pair up to form two types of TCRs: αβ and γδ.”

I’m not sure that all of it is 100% accurate — it should be “all” (jawed) vertebrates, not “most”, and the phrase “proteins called chains” sounds wrong. It’s possible that important pieces have gone missing, like the fact that TCRs are diverse. 

Otherwise, to the extent that we can conclude anything from a passage this short, it seems to be more readable. There’s less jargon: “pair up” replaces “heterodimer.” The structure of sentences is better: the beginning of each sentence refers to the end of the previous one (a good practice that George Gopen and Judith Swan describe as “putting old information in the topic position” in their essay “The Science of Scientific Writing”). 

The failure of the original abstract to contextualize each new sentence was one of its flaws, and Claude was able to fix it pretty much on the first try.

More involved revisions might be beyond the reach of current LLMs. One limitation is that writing prompts for an AI is not necessarily easier than writing abstracts by hand. One of my improvements, when I rewrote the article myself, was to clearly spell out the goal of the study in the first sentence. I could probably get an LLM to do this if I insisted enough, but I might as well do it myself. On the other hand, progress in AI continues apace, and maybe more advanced models will soon be able to figure out that they should rearrange the structure of a paragraph even if they’re given a generic prompt.

Getting LLMs to generate fun or elegant styles is also tricky at the moment. No matter the prompt, their output seems to always lack the spark of life. Still, it’s conceivable that they’ll get there. I already get interesting results when I ask for a rewrite in the style of Charles Darwin, a way to summon the pleasant (if overwrought) writing style of 19th-century scientific communication:

“The acute detection of unfamiliar substances by T cells proves indispensable for crafting adaptable protections throughout vertebrates. T cells discern intruders through deploying a T-cell receptor (TCR) of modified and diversified kinds. All vertebrates endowed with jaws possess four TCR chains called α, β, γ, and δ, materializing either as an αβ or γδ alliance.”

“All vertebrates endowed with jaws” isn’t the standard way to refer to this group of animals, but one can’t deny that it’s an elegant alternative. 

Is scientific writing an ideal case for generative AI?

People complain about scientific publishing all the time. Unlike most other forms of writing, neither the authors nor the readers enjoy it much — to say nothing of other actors involved in the process, like reviewers. 

This suggests that scientific writing could be an ideal task to outsource to generative AI, especially as LLMs improve further. AI could be used to ease the work of scientists, who would be free to devote more time to research, or to write fun blog posts about their work, which they can infuse with their own spark of life.

At the other end of the publishing pipeline, it would ease the work of readers who currently have to contend with unreadable jargon. Unlike controversial applications of AI like image synthesis, it’s difficult to imagine that many people would lament changes to a process that is so often a source of frustration. 

Of course, that doesn’t mean that LLM-made papers would instantly solve all of science’s problems. We’re far from automating the task of generating new knowledge. Even strictly in the realm of scientific literature, there are many limitations. A baseline level of complexity will always prevent papers from becoming as fun and readable as, say, a viral post on social media.

But provided that prestigious publishers don’t resist the tide too strongly, the advent of LLMs spells good news for those who write science, and for those who read it.
