OpenAI’s o4-mini stunned top mathematicians at a secret Berkeley meeting by solving some of the world’s toughest math problems. Over two days, 30 experts crafted questions meant to trip the bot up. Instead, it solved many of them, including Ph.D.-level open problems.
The meeting in mid-May pitted researchers against o4-mini, a reasoning large language model trained by OpenAI. Unlike older LLMs that mostly predict the next word, o4-mini uses specialized training to reason through complex math.
Epoch AI, a nonprofit that benchmarks LLMs, challenged o4-mini with 300 unpublished math questions. Traditional models solved under 2% of them. By early 2025, o4-mini could solve 20%. It then took on a fourth, expert-level tier of questions reserved for top academics. The bot’s ability shocked the attending mathematicians.
Ken Ono, a University of Virginia math professor and event judge, shared his experience:
“I came up with a problem which experts in my field would recognize as an open question in number theory — a good Ph.D.-level problem.”
“Over the next 10 minutes, I watched in stunned silence as the bot unfurled a solution in real time, showing its reasoning process along the way.”
“It was starting to get really cheeky.”
“At the end, it says, ‘No citation necessary because the mystery number was computed by me!’”
“I was not prepared to be contending with an LLM like this. I’ve never seen that kind of reasoning before in models. That’s what a scientist does. That’s frightening.”
Mathematicians struggled to create problems the bot couldn’t solve. Only ten questions managed to stump it. The bot worked faster than humans, solving in minutes what would take experts months.
Yang-Hui He, a mathematician at the London Institute for Mathematical Sciences, said:
“This is what a very, very good graduate student would be doing — in fact, more.”
“If you say something with enough authority, people just get scared. I think o4-mini has mastered proof by intimidation; it says everything with so much confidence.”
The meeting ended with talks about the future. If AI reaches even higher levels, mathematicians may shift to posing questions while bots handle the solutions, much as professors guide graduate students. Ono warned against dismissing AI’s impact:
“I’ve been telling my colleagues that it’s a grave mistake to say that generalized artificial intelligence will never come, [that] it’s just a computer.”
“In many ways these large language models are already outperforming most of our best graduate students in the world.”