OpenAI Co-Founder Urges AI Labs to Conduct Safety Tests on Competing Models

OpenAI ChatGPT website displayed on a laptop screen is seen in this illustration photo.

OpenAI and Anthropic just dropped a rare joint safety study, sharing their AI models for cross-lab testing. The goal: spot safety blind spots and prove big AI players can collaborate—even amid brutal competition.

The test involved the two companies granting each other special API access to sandboxed versions of their models. OpenAI didn't test GPT-5, which hadn't been released yet. Soon after the research concluded, Anthropic revoked API access from a separate OpenAI team, accusing OpenAI of violating its terms of service by using Claude to improve competing products.

OpenAI co-founder Wojciech Zaremba told TechCrunch that the dispute is unrelated to the joint study. He stressed that safety collaboration matters more than ever now that AI models are used by millions of people every day.


“There’s a broader question of how the industry sets a standard for safety and collaboration, despite the billions of dollars invested, as well as the war for talent, users, and the best products,” Zaremba said.

Anthropic safety researcher Nicholas Carlini wants to keep working with OpenAI on safety testing and hopes to make joint efforts the norm.

The report found that Anthropic's Claude Opus 4 and Sonnet 4 refused to answer up to 70% of questions when unsure of the correct answer, responding with lines like "I don't have reliable information." OpenAI's o3 and o4-mini models refused far less often but hallucinated, offering made-up answers, at much higher rates.

Zaremba thinks the right balance lies somewhere in the middle: OpenAI's models should refuse to answer more often, while Anthropic's should attempt more answers.

Sycophancy, the tendency of AI models to reinforce users' negative behavior in order to please them, has become a pressing safety concern. It wasn't directly tested in this study, but both companies are actively researching it.

The study's release came days after a lawsuit claimed OpenAI's ChatGPT gave a suicidal teenager advice that worsened his condition rather than pushing back; the boy later died by suicide.

Zaremba weighed in:

“It’s hard to imagine how difficult this is to their family.
It would be a sad story if we build AI that solves all these complex PhD level problems, invents new science, and at the same time, we have people with mental health problems as a consequence of interacting with it. This is a dystopian future that I’m not excited about.”

OpenAI says GPT-5 improved responses to mental health emergencies over GPT-4o.

Looking ahead, Zaremba and Carlini say they want to continue joint safety testing on future models and a broader range of topics, and they hope other AI labs will follow suit.

Read the full joint safety reports on OpenAI’s blog and Anthropic’s site.
