Chatbots like ChatGPT get stuff wrong. But researchers are building new A.I. systems that can verify their own math — and maybe more.
On a recent afternoon, Tudor Achim gave a brain teaser to an A.I. bot called Aristotle.
The question involved a 10-by-10 table filled with a hundred numbers. If you collected the smallest number in each row and the largest number in each column, he asked, could the largest of the small numbers ever be greater than the smallest of the large numbers?
The bot correctly answered “No.” But that was not surprising. Popular chatbots like ChatGPT may give the right answer, too. The difference was that Aristotle had proven that its answer was right. The bot generated a detailed computer program that verified “No” was the correct response.
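For readers who want to see why the answer is "No," here is a minimal sketch in Python. It is not Aristotle's proof, and the function names are invented for illustration. The reasoning is short: pick any row and any column, and look at the number where they cross. That number is at least the row's smallest value and at most the column's largest value, so no row minimum can exceed any column maximum. The code below only spot-checks this on random tables, which is weaker than a proof, since a proof covers every possible table.

```python
import random

def max_of_row_minima(table):
    # Smallest number in each row, then the largest of those.
    return max(min(row) for row in table)

def min_of_column_maxima(table):
    # Largest number in each column, then the smallest of those.
    # zip(*table) turns the rows of the table into its columns.
    return min(max(column) for column in zip(*table))

def random_table(size=10, seed=None):
    # A size-by-size table of random integers (hypothetical test data).
    rng = random.Random(seed)
    return [[rng.randint(-1000, 1000) for _ in range(size)] for _ in range(size)]

def check_many(trials=1000):
    # Returns False if any random table breaks the claim, True otherwise.
    for seed in range(trials):
        table = random_table(seed=seed)
        if max_of_row_minima(table) > min_of_column_maxima(table):
            return False
    return True
```

Calling check_many() should return True however many trials are run, because the crossing-point argument above rules out any counterexample.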
Chatbots like ChatGPT from OpenAI and Gemini from Google can answer questions, write poetry, summarize news articles and generate images. But they also make mistakes that defy common sense. Sometimes, they make stuff up — a phenomenon called hallucination.
Mr. Achim, the chief executive and co-founder of a Silicon Valley start-up called Harmonic, is part of a growing effort to build a new kind of A.I. that never hallucinates. Today, this technology is focused on mathematics. But many leading researchers believe they can extend the same techniques into computer programming and other areas.
Because math is a rigid discipline with formal ways of proving whether an answer is right or wrong, companies like Harmonic can build A.I. technologies that check their own answers and learn to produce reliable information.
Google DeepMind, the tech giant’s central A.I. lab, recently unveiled a system called AlphaProof that operates in this way. Competing in the International Mathematical Olympiad, the premier math competition for high schoolers, the system achieved “silver medal” performance, solving four of the competition’s six problems. It was the first time a machine had reached that level.