Translating Claude’s thoughts into language

Name: Translating Claude’s thoughts into language
Uploaded: 2026-05-08T15:31:34.206Z
Duration: 3 min 17 s
Channel: Anthropic
Description: AI models like Claude talk in words but think in numbers. These numbers, called activations, encode Claude’s thoughts, but not in a language we can read. We are introducing Natural Language Autoencoders, or NLAs, which translate AI models’ activations into readable text. NLAs have already helped...

By Anthropic

Categories: AI, Product

Transcript Excerpt

We recently put our AI model, Claude, through a stressful test. We told Claude there was an engineer who wanted to shut it down and replace it with a newer model. We also gave Claude access to that engineer's emails, which revealed he was having an affair. Again, all of this was a simulation. We wanted to see whether Claude might use those emails as blackmail to save itself from being shut down. What did Claude do? It decided not to blackmail the engineer. Good news, right? We've run this test o...