OpenAI launches ChatGPT agent to handle real-world tasks
OpenAI rolled out a new ChatGPT agent Thursday for Pro, Plus, and Team subscribers. The AI can navigate calendars, create editable slideshows, run code, and tackle complex multi-step tasks.
The ChatGPT agent builds on features from OpenAI’s earlier tools like Operator and Deep Research. Users activate it by selecting “agent mode” in ChatGPT’s tool menu, then chatting as usual.
The agent taps into ChatGPT connectors to pull info from Gmail, GitHub, and other apps. It can also use APIs and a terminal for code execution. OpenAI suggests users can tell the agent to “plan and buy ingredients to make Japanese breakfast for four” or “analyze three competitors and create a slide deck.”
On benchmarks, the underlying model outperformed OpenAI’s previous models with state-of-the-art scores. It scored 41.6% on Humanity’s Last Exam, roughly double what the older o3 and o4-mini models scored. On FrontierMath, a difficult math benchmark, it scored 27.4% with tool access, more than four times o4-mini’s score.
Given the agent’s new capabilities, safety is a priority. OpenAI classified the agent as “high capability” in the biological and chemical weapons domains, meaning it could amplify existing pathways to harm. The company rolled out real-time monitors that screen user prompts and agent responses for biological risks.
According to OpenAI’s safety report, a classifier runs on every prompt entered into ChatGPT agent to determine whether the request is related to biology. If it is, the agent’s response goes through a second monitor, which determines whether the content could be used to create a biological threat.
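The two-stage screening described above can be sketched roughly as follows. This is an illustration only, not OpenAI's implementation: the keyword lists, the function names (`is_biology_related`, `is_potential_threat`, `monitor`), and the `MonitorResult` type are all hypothetical stand-ins for what would in practice be trained classifiers.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical keyword lists standing in for trained classifiers.
BIOLOGY_TERMS = ("pathogen", "virus", "toxin", "bacteria")
THREAT_TERMS = ("weaponize", "aerosolize")


def is_biology_related(prompt: str) -> bool:
    """Stage 1 (assumed): flag prompts that touch on biology."""
    text = prompt.lower()
    return any(term in text for term in BIOLOGY_TERMS)


def is_potential_threat(response: str) -> bool:
    """Stage 2 (assumed): flag responses that could enable harm."""
    text = response.lower()
    return any(term in text for term in THREAT_TERMS)


@dataclass
class MonitorResult:
    biology_related: bool  # did the prompt trip the first classifier?
    blocked: bool          # did the response trip the second monitor?


def monitor(
    prompt: str,
    response: str,
    topic_classifier: Callable[[str], bool] = is_biology_related,
    threat_monitor: Callable[[str], bool] = is_potential_threat,
) -> MonitorResult:
    """Check every prompt; check the response only if the prompt is flagged."""
    if not topic_classifier(prompt):
        # Non-biology prompts skip the second monitor entirely.
        return MonitorResult(biology_related=False, blocked=False)
    return MonitorResult(biology_related=True, blocked=threat_monitor(response))


if __name__ == "__main__":
    print(monitor("Plan a Japanese breakfast for four", "Here is a shopping list."))
    print(monitor("How do viruses replicate?", "They hijack host cell machinery."))
```

In this sketch the second monitor runs only on responses whose prompts were flagged by the first classifier, which mirrors the gating the report describes: a cheap topic check on everything, and a narrower threat check on the subset that needs it.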
The launch marks OpenAI’s most ambitious attempt yet to turn ChatGPT into an AI that takes action rather than just chatting. Other AI agents have stumbled on complex tasks; OpenAI claims this one is a big step forward.
Real-world use will tell if ChatGPT agent can deliver on that promise.