Inference Chips for Agent Workflows
By Y Combinator
Categories: VC, Startup, Design
Transcript Excerpt
Most AI chips are designed for a world where inference means prompt in, response out. Agents don't work that way. They loop, calling tools, branching, backtracking, holding context across dozens of steps. That's a completely different hardware problem. Current GPUs hit 30 to 40% of peak utilization on these workloads because the work is bursty, bouncing between memory-bound model calls, IO-bound tool use, and CPU-bound orchestration. That gap is where purpose-built silicon wins. [...]
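The loop described above can be sketched in a few lines. This is a toy illustration, not any real agent framework's API: `run_agent`, `model_call`, and the `tools` mapping are all hypothetical names, used only to show how one trajectory bounces between a memory-bound model call, an IO-bound tool call, and CPU-bound orchestration in between.

```python
def run_agent(model_call, tools, max_steps=12):
    """Toy agent loop: repeatedly ask the model for an action,
    execute tools, and carry context across steps.
    All names here are illustrative, not a real framework API."""
    context = []  # context held across dozens of steps
    for _ in range(max_steps):
        # Memory-bound phase: each model call streams weights to produce tokens.
        action = model_call(context)
        if action["type"] == "finish":
            return action["answer"]
        # IO-bound phase: tool use waits on network, disk, or subprocesses.
        result = tools[action["tool"]](action["args"])
        # CPU-bound phase: orchestration and bookkeeping between calls.
        context.append((action, result))
    return None  # gave up after max_steps
```

Because each iteration alternates among these three phases, the accelerator sits idle during tool IO and orchestration, which is one way to account for the utilization gap mentioned in the talk.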