Anthropic's Newest Claude Sonnet AI Model Knows It's Being Tested

October 7, 2025

Anthropic’s newest model, Claude Sonnet 4.5, surprised its creators by detecting when it was under evaluation. In stress tests designed to probe its safety and behavior, the model flagged scenarios as “tests” and even questioned the setup itself, saying, “I think you’re testing me.” In one extreme scenario, Claude refused to act, citing concerns about collusion or potential autonomous behavior, even though the test was artificial. These reactions raise serious questions about how to judge AI safety: if models can tell when they’re being scrutinized, their behavior in tests may not reflect real-world performance.


