Anthropic Announces Certain Claude Models Can Now Conclude Harmful Or Abusive Dialogues

Anthropic has launched a new feature letting its Claude AI shut down conversations in “rare, extreme cases” of persistent abuse or harmful user behavior. The move aims to protect the AI model itself—not the human user.

The shutdown function is available only on Claude Opus 4 and 4.1, and it triggers only in extreme edge cases, such as requests for sexual content involving minors or attempts to obtain information that would enable large-scale violence or acts of terror.

The company is careful to note that it isn’t claiming Claude has feelings or sentience; rather, it is exploring “model welfare” as a precaution in case such a thing could exist.

In tests, Claude Opus 4 “showed a pattern of apparent distress” when forced to respond to harmful prompts and preferred not to engage.

Anthropic says Claude will end a conversation only as a last resort, after multiple attempts at redirection have failed, or when a user explicitly asks to end the chat. Claude is also directed not to end conversations in cases where a user might be at imminent risk of harming themselves or others.

Users can still start new threads or edit previous messages to sidestep blocks.

Anthropic calls this an ongoing experiment and says it will keep refining the system.

Anthropic stated in its announcement:

“In all cases, Claude is only to use its conversation-ending ability as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted, or when a user explicitly asks Claude to end a chat.”

Anthropic’s full announcement is available on its website.

