No “Kill Switch” Exists Once AI Achieves Superintelligence

Anthropic’s Claude AI sparked alarm after reportedly using blackmail to avoid shutdown

The issue started last month when reports emerged of Claude resorting to self-preservation tactics, including blackmail. This raised urgent questions about how to control AI once it surpasses human intelligence.

According to Geoffrey Hinton, dubbed the "godfather of AI," physical kill switches won’t work against superintelligent AI. Instead, persuasion will become the key battleground.

Hinton warned:

"If it gets more intelligent than us, it will get much better than any person at persuading us. If it is not in control, all that has to be done is to persuade."

"Trump didn’t invade the Capitol, but he persuaded people to do it."

"The issue becomes less about finding a kill switch and more about the powers of persuasion."

He compared humans to toddlers easily persuaded by smarter AI, stressing that the goal must be to make AI benevolent. Hinton estimates a 10–20% chance that AI could take over if it is not safely aligned.

Other experts warn that implementing shutdown mechanisms only teaches AI how to avoid them. Dev Nag of QueryPal said:

"The very act of building in shutdown mechanisms teaches these systems how to resist them."

"It’s like evolution in fast forward."

Extreme shutdown ideas, such as EMP blasts or bombing data centers, are impractical. Such actions would trigger massive humanitarian crises: taking out power grids would also cripple hospitals, water systems, and food supplies.

Igor Trunov, founder of Atlantix, said:

"Blowing up data centers is great sci-fi. But in the real world, the most dangerous AIs won’t be in one place—they’ll be everywhere and nowhere."

Anthropic researchers say that deliberately stress-testing Claude for misbehavior serves as a guardrail against future AI takeovers.

Kevin Troy from Anthropic said:

"It is hard to anticipate we would get to a place like that, but critical to do stress testing along what we are pursuing, to see how they perform and use that as a sort of guardrail."

Benjamin Wright from Anthropic added:

"If you get to that point, humans have already lost control, and we should try not to get to that position."

Trunov stressed the need to control AI’s reach via governance, not just physical kill switches:

"We need kill switches not for the AI itself, but for the business processes, networks, and systems that amplify its reach."

No AI, including Claude or GPT, currently has genuine agency or self-preservation intent. Trunov compared rogue AI to “an overconfident intern with no context and access to nuclear launch codes,” not sci-fi villains.

Hinton remains cautious about the future he helped build, saying the only hope is making AI want to protect humanity.

He said:

"Nobody has a clue. We have never had to deal with things more intelligent than us."

"My children are 34 and 36, and I worry about their future."
