We Tested Anthropic’s Fable 5 for a Week

Categories: Startup, Product, AI

Summary

Fable 5 scored 91/100 on a senior engineer benchmark, matching human performance, by autonomously executing complex tasks over hours with minimal prompting. The model excels at sustained execution and taste-based decisions. It costs twice as much as Opus but delivers capabilities that saturate traditional coding benchmarks.

Key Takeaways

  1. Fable scores 91/100 on the senior engineer benchmark, vs. 63/100 for Opus and 62/100 for GPT-4.5, matching human-level code architecture decisions from a single prompt.
  2. The optimal workflow is to give Fable a task and let it run autonomously for 3-4 hours or overnight. The model checks its own work and iterates without manual intervention.
  3. Fable generated a complete browser-based 3D game (Borges's Library of Babel, with hexagonal galleries and infinite shelves) from a single prompt that ran for about 4 hours, with no follow-up iteration.
  4. Pricing is $10/M input tokens and $50/M output tokens (2x the cost of Opus), but the model shows better judgment, taste, and attention to detail than previous models.
  5. Anthropic applied strict safeguards blocking cyber and biological use cases to make Fable safe for public release, despite internal concerns about capability level.

Transcript Excerpt

This is the infinite library of Babel from the Borges story. It contains all of the books in the universe because books are just strings. If you look, you can even go into bookmarks and I can click one of my articles after automation and it finds it in the library. It's truly infinite. And look, I could go up the stairs. I can like look down. I can look up. This seems like it took a long time to make, right? Wrong. I made this entire thing in a single prompt with Fable 5, the new model from Anthropic. Like, like literally, let me show you. So, this is a prompt. It's from four days ago or so. I got this model a little bit ahead of time. Read Jorge Luis Borges's The Library of Babel and then plan and execute end to end a browser playable 3D game in which the player has dropped in blah b…

More from Every