Excel may make chatbots much more useful

For all their abilities, LLMs like GPT-4 struggle with math and logic. Everyone’s favorite spreadsheet may help change that.

Despite the many things that advanced chatbots like OpenAI’s GPT-4, Google’s Bard, and Anthropic’s Claude can do, these AIs do have a substantial Achilles heel: they’re pretty bad at math.

This makes sense when you consider that they are large language models (LLMs). Trained on the vast corpus of the internet, they essentially operate entirely off of text they have “read.” If you ask one to add two different numbers, it does not truly add them, like a calculator: it predicts the answer based on text it has been trained on, and replies with that.

Claude itself explained it best when Semafor’s Gina Chua prompted it to do some math.

“I am not actually able to do mathematical calculations,” the chatbot responded to Chua. “While I can have conversations about math and numbers, I do not have a built in calculator … I simply treated the question as another language input, and responded with the sum I was trained to give for that specific set of numbers.”

But one new application may provide LLMs with true mathematical powers: their promised integration into Excel, as part of Microsoft’s plans to launch an AI Copilot that works with its 365 apps, including Excel, PowerPoint, and Word.

Access to Excel’s formulas and functions could let an LLM hand off numerical and logical work to the spreadsheet instead of predicting the answer from text.

“I have been working on this problem, and I’d say math/logic is one of the biggest weaknesses/limitations of LLMs,” Nazneen Rajani, robustness research lead at AI company Hugging Face, tells Freethink via email.

LLMs are currently not reliable at counting, Rajani says. Even a simple prompt like “write a sentence about x that is y words long” almost always produces an incorrect result; the LLM just doesn’t respond with the correct number of words.

ChatGPT’s answers don’t quite add up. (Credit: Nazneen Rajani)

Telling the LLM to “think” about it “step-by-step” can help it to avoid or correct mistakes, but “I’d not trust the calculations without validating them myself,” Rajani says.
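The kind of validation Rajani describes can be very simple for the word-count case. Here is a minimal sketch in Python; the function names and the sample reply are made up for illustration, and “word” is assumed to mean any run of characters separated by whitespace.

```python
def count_words(sentence: str) -> int:
    """Count whitespace-separated words in a sentence."""
    return len(sentence.split())


def meets_length(sentence: str, target: int) -> bool:
    """Return True if the sentence has exactly `target` words."""
    return count_words(sentence) == target


# Hypothetical chatbot reply to a prompt asking for a seven-word sentence.
reply = "Cats sleep most of the day."
print(count_words(reply), meets_length(reply, 7))  # prints: 6 False
```

A check like this does not make the chatbot any better at counting; it only flags a miss so that a person, or the program calling the chatbot, can ask again.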

But having access to Excel could help LLMs better understand data beyond words and images.

“Excel perhaps adds a lot more structure to the data, and having a model fine-tuned on this structural data would definitely boost the performance of an LLM on Excel-specific tasks,” Rajani says.

As Chua points out, that would at the very least mean an LLM that could perform basic arithmetic correctly. But Excel is more than a calculator: it is essentially a database program that can handle not only numbers but also text, dates, and much else.

If LLMs can successfully incorporate Excel or other means to access math and logic capabilities, they could be prompted to do things like create an accurate budget and modify it for various scenarios, all with conversational prompts, or search for patterns in data merely by asking natural human questions.
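To make the idea of “access to math capabilities” concrete, here is a rough sketch in Python of the general pattern: the language model writes out an arithmetic expression, and ordinary code, not the model, computes the result. This is not how Copilot or any particular product works; the `evaluate` function and the budget figures are invented for illustration.

```python
import ast
import operator

# The only arithmetic operators this sketch is willing to apply.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expression: str) -> float:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def walk(node: ast.AST) -> float:
        # Plain numbers (but not True/False, which Python treats as ints).
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        # Binary operations: +, -, *, /
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        # Unary minus, e.g. -4
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical budget question: twelve months of $1,200 rent plus a $350 fee.
print(evaluate("1200 * 12 + 350"))  # prints: 14750
```

The point of the design is the division of labor: the chatbot only has to translate a question into an expression, which is a language task, while the arithmetic itself is done by code that actually adds and multiplies.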

However, Microsoft has yet to make its AI Copilot available to the public, so just how close we are to an LLM that can crunch the numbers is still a bit of an unknown variable.

