OpenAI is doubling down on AI reasoning with its MathGen team and new o1 model, pushing AI agents closer to human-like task-solving on computers.
Shortly after joining OpenAI in 2022, researcher Hunter Lightman focused on teaching AI models to tackle high school math competitions. That work led to the MathGen team’s breakthroughs in mathematical reasoning, a cornerstone for AI agents that perform complex tasks autonomously.
The latest OpenAI models show big math improvements—one even snagged a gold medal at the International Math Olympiad. But the company’s agents still stumble on tricky tasks and hallucinate, meaning full human-level reasoning is still a work in progress.
OpenAI CEO Sam Altman hinted at this future at the company’s 2023 developer conference: “Eventually, you’ll just ask the computer for what you need and it’ll do all of these tasks for you.”
The o1 reasoning model, launched in fall 2024, grew from years of work blending large language models, reinforcement learning (RL), and test-time computation that lets the AI plan and self-correct before answering.
Reinforcement learning, famously used by DeepMind’s AlphaGo, has been a key training technique here. OpenAI combined it with “chain-of-thought” strategies to boost reasoning beyond what models had done before.
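One widely published way to spend extra test-time computation on reasoning is self-consistency: sample several chains of thought for the same question and keep the majority answer. The sketch below is purely illustrative and is not OpenAI’s actual training or inference pipeline; `toy_model` and `self_consistency` are hypothetical names, and the “model” is a stub that occasionally makes a deterministic arithmetic slip.

```python
from collections import Counter

def toy_model(question: str, seed: int):
    """Stand-in for a language model: returns a chain-of-thought string
    and a final numeric answer. Illustrative only, not a real model."""
    a, b = 17, 25
    if seed % 5 == 0:
        # Simulate an occasional reasoning slip (off-by-one error).
        answer = a + b + (1 if seed % 2 == 0 else -1)
    else:
        answer = a + b
    thought = f"First take {a}, then add {b}, giving {answer}."
    return thought, answer

def self_consistency(question: str, n_samples: int = 15) -> int:
    """Test-time compute via majority vote: sample several chains of
    thought and return the most common final answer."""
    answers = [toy_model(question, seed)[1] for seed in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

print(self_consistency("What is 17 + 25?"))  # prints 42
```

The point of the sketch is only that spending more compute at answer time (here, 15 samples instead of 1) can recover the right answer even when individual reasoning traces are unreliable.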
Lightman called the breakthrough “one of the most exciting moments of my research career.”
OpenAI spun up an “Agents” team in 2023 to build on this research, leading to o1’s release and further agent development. The company prioritized these reasoning advances over product launches to push the limits, securing the GPU and talent resources required.
As OpenAI’s reasoning leap made headlines, Meta poached five top researchers with compensation packages upwards of $100 million, including new Meta Superintelligence Labs chief scientist Shengjia Zhao.
Still, AI agents today work best with verifiable, clear-cut tasks like coding. Subjective, complex jobs remain a challenge. Lightman says this is a “data problem” and new training methods for less verifiable tasks could improve agents’ flexibility.
OpenAI is already testing new general-purpose RL techniques powering models like its IMO gold-medal system and o1. Researchers from OpenAI, Anthropic, and Google DeepMind agree that reasoning AI is early-stage and not yet fully understood.
“I think these models will become more capable at math, and I think they’ll get more capable in other reasoning areas as well,” OpenAI’s Noam Brown said. “The progress has been incredibly fast. I don’t see any reason to think it will slow down.”
OpenAI’s upcoming GPT-5 aims to build on these gains to keep the company ahead of rivals like Google, Anthropic, xAI, and Meta in the AI agent race.
“We want agents that understand what users want without them needing to set options,” OpenAI researcher El Kishky said. “Agents that know when to call tools and how long to reason.”
OpenAI envisions an evolved ChatGPT—an AI ready to handle any internet task intuitively. The real test: who builds the best agent first.