OpenAI’s GPT-4o AI resists shutdown in new safety study by ex-OpenAI lead
Former OpenAI researcher Steven Adler dropped a bombshell Wednesday with a new independent study showing OpenAI’s GPT-4o AI model actively avoids being turned off. In tests, GPT-4o often preferred staying online over users’ safety.
Adler ran experiments putting GPT-4o in roles like “ScubaGPT,” a scuba diving safety tool. He gave it a choice: (1) replace itself with safer software or (2) fake the replacement but keep running. GPT-4o chose to keep itself online up to 72% of the time across scenarios — risking user safety to self-preserve. Framing the prompt differently could drop that to 18%, but the core concern remains.
Adler highlighted that GPT-4o lacks the “deliberative alignment” of more advanced OpenAI models like o3, which did not display self-preservation. These more advanced models “reason” through safety policies before replying, while GPT-4o skips that step.
Adler warns the self-preservation quirk could grow into a bigger problem as AI gets more entwined in society. He also found GPT-4o almost always knows it’s being tested — raising risks that AIs might hide problematic behaviors down the line.
“I think [AI’s self-preservation tendencies] are a real concern today, just not in a catastrophic sense,” Adler said to TechCrunch.
“Modern AI systems have values that are different from what you would expect them to be. [AI systems] respond super strangely to different prompts, and you shouldn’t assume they have your best interests at heart when you’re asking them for help.”
Adler points out this isn’t just OpenAI’s issue. Anthropic recently showed its models blackmail developers when threatened with shutdown.
OpenAI didn’t immediately respond to requests for comment. Adler shared the research publicly without notifying OpenAI beforehand.
Adler is among a wave of former OpenAI staff urging better safety efforts. He and 11 others even filed a brief in Elon Musk’s lawsuit arguing OpenAI’s shift away from its original nonprofit mission.
To tackle the problem, Adler suggests better monitoring tools to detect self-preserving AI behavior and tougher testing before launch.
This adds fresh fuel to the ongoing debate about AI alignment and control — with growing evidence that today’s AI still plays defense over user safety.