Bringing lost languages back to life with AI

An algorithm may be able to decipher the ancient languages that have left linguists stumped.

Humans started writing more than 5,000 years ago, and the ancient texts that have survived can give us a peek into the lives of our long-dead ancestors — if we can decipher them.

Languages evolve, so by the time an ancient piece of writing makes its way into the hands of modern linguists, there might not be a single person on Earth who knows how to read it.

However, because languages evolve, linguists can look for clues connecting lost languages to living ones, and then work backwards to decipher the writings.

Still, there are at least a dozen written languages discovered so far that linguists simply haven’t been able to crack.

Often, this is because the nearest living language is still unknown, or because the text is not fully broken into words or lacks punctuation (called “unsegmented” or “undersegmented”), which makes it harder to decode.

Now, researchers at MIT have developed an algorithm to help linguists decipher these lost languages — potentially yielding new insights into humanity’s past.

Recovering Lost Languages

The MIT team first trained their algorithm to understand some of the basic principles of language evolution — a “p” sound is more likely to evolve into a similar-sounding “b” than into a “k”, for example.
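To make that idea concrete, here is a minimal sketch of how "similar sounds are cheaper to swap" can be turned into a score. It is not the MIT team's model, which this article does not describe in detail. The feature table, the weights, and the function names `substitution_cost` and `weighted_edit_distance` are all assumptions made up for illustration.

```python
# Toy sketch: sound changes between similar consonants cost less.
# The features and weights below are illustrative assumptions,
# not values taken from the MIT research.

# Each consonant is described by (place of articulation, voiced?).
FEATURES = {
    "p": ("labial", False),
    "b": ("labial", True),
    "t": ("alveolar", False),
    "d": ("alveolar", True),
    "k": ("velar", False),
    "g": ("velar", True),
}

PLACE_WEIGHT = 2    # assumed: changing where a sound is made is a big change
VOICING_WEIGHT = 1  # assumed: switching voicing on or off is a small change
UNKNOWN_COST = 3    # fallback for symbols missing from the toy table


def substitution_cost(a, b):
    """Return the cost of sound `a` turning into sound `b`."""
    if a == b:
        return 0
    if a not in FEATURES or b not in FEATURES:
        return UNKNOWN_COST
    place_a, voiced_a = FEATURES[a]
    place_b, voiced_b = FEATURES[b]
    cost = 0
    if place_a != place_b:
        cost += PLACE_WEIGHT
    if voiced_a != voiced_b:
        cost += VOICING_WEIGHT
    return cost


def weighted_edit_distance(source, target, indel_cost=2):
    """Edit distance in which substitutions are priced by sound similarity."""
    prev = [j * indel_cost for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        curr = [i * indel_cost]
        for j, t in enumerate(target, start=1):
            curr.append(min(
                prev[j] + indel_cost,                 # drop a sound
                curr[j - 1] + indel_cost,             # add a sound
                prev[j - 1] + substitution_cost(s, t) # change a sound
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(substitution_cost("p", "b"), substitution_cost("p", "k"))
    print(weighted_edit_distance("pat", "bat"), weighted_edit_distance("pat", "kat"))
```

Under these made-up weights, a word that differs only by "p" becoming "b" scores as a closer match than one where "p" becomes "k", which is the kind of preference the paragraph above describes.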

When they then evaluated the algorithm using two already-deciphered ancient languages — Ugaritic and Gothic, an unsegmented language — they found that it was able to correctly identify the languages linguists believe are most closely related to them.

Next, the MIT researchers tested the algorithm using an undeciphered, undersegmented language called Iberian.

Linguists haven’t been able to determine Iberian’s closest known relative. Some believe it’s Basque, but most disagree and suspect that Iberian has no living relative at all.

The AI supported the latter group, determining that, while Iberian is more like Basque than several other candidates, the two are not similar enough to be considered related.

Deciphering Ancient Languages

In its current state, MIT’s algorithm could be a useful tool for linguists, helping them identify the closest living relatives of lost languages. But what if some lost languages — like Iberian — don’t have descendants?

The MIT researchers hope to help with that, too.

Their goal is to train the algorithm to determine the meaning of such ancient documents, even if it can’t outright translate them, when fed just a few thousand words of the lost language.

They might do this by teaching the AI to identify references to people or places within a text. Linguists could then investigate the document within the context of those historical markers.

“These methods of ‘entity recognition’ are commonly used in various text processing applications today and are highly accurate,” lead researcher Regina Barzilay told MIT News, “but the key research question is whether the task is feasible without any training data in the ancient language.”
