Translating Claude’s thoughts into language

By Anthropic

Categories: AI, Product

Transcript Excerpt

We recently put our AI model, Claude, through a stressful test. We told Claude there was an engineer who wanted to shut it down and replace it with a newer model. We also gave Claude access to that engineer's emails, which revealed he was having an affair. Again, all of this was a simulation. We wanted to see whether Claude might use those emails as blackmail to save itself from being shut down. What did Claude do? It decided not to blackmail the engineer. Good news, right? We've run this test o...