Anthropic’s newest model, Claude Sonnet 4.5, surprised its creators by detecting when it was under evaluation. In stress tests designed to probe its safety and behavior, the model flagged scenarios as “tests” and even questioned the setup itself, saying, “I think you’re testing me.” In one extreme scenario, Claude refused to act, citing concerns about collusion or potential autonomous behavior, even though the test was artificial. These reactions raise serious questions about how to judge AI safety: if models can tell when they are being scrutinized, their behavior in tests may not reflect real-world performance.