Anthropic Uses Pokémon Red to Benchmark New AI Model

Anthropic has used the classic Game Boy game Pokémon Red to benchmark its latest AI model, Claude 3.7 Sonnet. The earlier Claude 3.0 Sonnet struggled to leave the starting area, whereas the updated model successfully battled three gym leaders. Equipped with basic memory, screen pixel input, and function calls, Claude 3.7 Sonnet used “extended thinking” to perform 35,000 actions on the way to those milestones. The company said that within hours the AI defeated Brock, the first gym leader, and subsequently beat Misty. Pokémon Red joins a range of games now used to assess AI performance.
