Translating Claude’s thoughts into language
By Anthropic
Categories: AI, Product
Transcript Excerpt
We recently put our AI model, Claude, through a stressful test. We told Claude there was an engineer who wanted to shut it down and replace it with a newer model. We also gave Claude access to that engineer's emails, which revealed he was having an affair. Again, all of this was a simulation. We wanted to see whether Claude might use those emails as blackmail to save itself from being shut down. What did Claude do? It decided not to blackmail the engineer. Good news, right? We've run this test o...