Anthropic: Claude can now end conversations to prevent harmful uses



OpenAI rival Anthropic says Claude has been updated with a rare new feature that allows the AI model to end a conversation when the interaction is harmful or abusive.

The feature applies only to Claude Opus 4 and 4.1, the two most powerful models, available via paid plans and the API. Claude Sonnet 4, the company’s most widely used model, won’t be getting it.

Anthropic describes this move as part of its work on “model welfare.”

“In pre-deployment testing of Claude Opus 4, we included a preliminary model welfare assessment,” Anthropic noted.

“As part of that assessment, we investigated Claude’s self-reported and behavioral preferences, and found a robust and consistent aversion to harm.”

Claude won’t abandon a conversation simply because it can’t handle a query. Ending the chat is a last resort, used only after Claude’s attempts to redirect the user to helpful resources have failed.

“The scenarios where this will occur are extreme edge cases—the vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude,” the company added.

Claude ending a chat (Source: BleepingComputer)

As the screenshot above shows, users can also explicitly ask Claude to end a chat. Under the hood, Claude invokes an end_conversation tool to do so.
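For developers curious what this might look like programmatically, here is a minimal sketch using Anthropic’s Python SDK. The end_conversation tool name comes from the article; how (or whether) it surfaces in an API response is not documented here, so the tool_use block shape and the claude-opus-4-1 model ID below are assumptions for illustration only.

```python
# Illustrative sketch, not a documented API contract: assumes the
# end_conversation call (named in the article) would appear as a
# standard tool_use content block in a Messages API response.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-opus-4-1",  # assumed model ID; feature is Opus 4/4.1 only
    max_tokens=1024,
    messages=[{"role": "user", "content": "Please end this conversation."}],
)

# Check whether the model invoked the (assumed) end_conversation tool.
ended = any(
    block.type == "tool_use" and block.name == "end_conversation"
    for block in response.content
)
if ended:
    print("Claude has ended this chat; start a new conversation to continue.")
```

If the tool does fire, the sensible client-side response is to stop sending messages on that thread and open a fresh conversation, which matches how Anthropic says the feature behaves in the product.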

This feature is now rolling out.
