I let Codex run for 6 hours. Here’s what happened.
Summary
Codex's new /goal feature enables long-running autonomous AI tasks by creating self-verifying loops that run for hours without human intervention. The host ran a 5+ hour coding task fully autonomously for the first time, which removed the need for the constant micromanagement that previous AI coding agents required.
Key Takeaways
- Goals differ fundamentally from prompts: instead of turn-based request-response cycles, goals create loops where AI checks its work, decides next steps, and continues until it gathers evidence the goal is met.
- The /goal feature addresses the 'keep going, try the next thing' problem: if you find yourself repeatedly telling AI agents to continue, you need goals instead of manual prompting.
- Lifecycle management matters: use /goal start task, /goal to check status, /goal pause/resume to manage runaway tasks, and /goal remove to cancel. Together these give you control over multi-hour autonomous runs.
- Goal prompting requires product thinking, not engineering thinking: define measurable success criteria and outcomes rather than just a list of tasks. How well a run goes depends on a specific, well-defined goal statement.
- Before /goal in Codex, the host found long autonomous AI coding runs impractical; the first run with /goal lasted 5 hours 45 minutes, which the host presents as the capability that makes overnight AI task execution possible.
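The loop described in the takeaways above (do some work, check it against a measurable success criterion, continue until the evidence says the goal is met) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Codex implements /goal: `GoalRun`, `run_goal`, `fix_one_failure`, and `no_failures` are hypothetical names, and the "work" is a toy counter of failing tests.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class GoalRun:
    """Hypothetical record of one goal-driven run (not a Codex data structure)."""
    goal: str                      # measurable outcome, stated up front
    max_steps: int = 10            # budget so the loop cannot run away
    log: List[str] = field(default_factory=list)


def run_goal(
    run: GoalRun,
    step: Callable[[int], int],
    is_met: Callable[[int], bool],
    state: int,
) -> Tuple[bool, int]:
    """Alternate between checking the success criterion and doing one unit of work."""
    for i in range(1, run.max_steps + 1):
        if is_met(state):
            # Evidence gathered: stop instead of waiting for a human to say so.
            run.log.append(f"goal met after {i - 1} steps")
            return True, state
        state = step(state)
        run.log.append(f"step {i}: state={state}")
    # Budget exhausted: report whether the last step happened to finish the job.
    return is_met(state), state


# Toy stand-ins: the state is the number of failing tests.
def fix_one_failure(failing: int) -> int:
    return max(0, failing - 1)


def no_failures(failing: int) -> bool:
    return failing == 0


run = GoalRun(goal="test suite has zero failures", max_steps=5)
met, remaining = run_goal(run, fix_one_failure, no_failures, state=3)
```

The point of the sketch is the shape of the goal statement: "test suite has zero failures" can be checked by the loop itself, whereas a task such as "fix the tests" gives it nothing to verify against. The step budget plays the role that pausing or removing a goal plays for a runaway task.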
Transcript Excerpt
Welcome back to How I AI. I'm Claire Vo, product leader and AI obsessive, here on a mission to help you build better with these new tools. Today, I'm going to walk through my favorite feature in my most recent favorite AI product: goals in Codex. If you've been wondering how all these people on the timeline are getting their AI to run quote unquote overnight or handle very complex long-running tasks, I'm going to show you that goals is the answer. We're going to walk through what it is, how I might use it, and a technical use case, along with some non-technical examples of how goals can help you even if you're not coding. Let's get to it. This episode is brought to you by Mercury. As an AI founder, I'm constantly tracking run rate, watching revenue growth, paying vendors, and making sure I'm getting…