Your Coding Agent Should Do AI System Engineering — Ben Burtenshaw, Hugging Face

Categories: AI, Tools

Summary

Coding agents can now tackle AI systems engineering tasks like writing optimized CUDA kernels—previously thought impossible—by leveraging standard repos on Hugging Face Hub. The real bottleneck in GPU inference is memory bandwidth (not compute), making custom kernels that increase arithmetic intensity the key to faster model inference.

Key Takeaways

  1. Memory, not compute, is the GPU bottleneck. An H100 can deliver roughly 1 petaflop/s of compute but only about 3 TB/s of memory bandwidth, so the GPU sits idle waiting for data. Custom kernels address this by increasing arithmetic intensity, that is, the amount of math done per byte moved.
  2. Agents can now write valid, optimized CUDA kernels (shown in GPU MODE hackathons and kernel benchmarks), dispelling the myth that kernel engineering can only be done by human experts.
  3. The Hugging Face Kernels library lets kernel publishers distribute hardware-specific kernels, much as model publishers distribute models, using TOML configs to declare compatibility across different GPUs and CUDA versions.
  4. Three progressive complexity levels for agentic AI systems engineering: (1) hybrid interactive agent for CUDA kernels, (2) zero-shot agent training LLMs, (3) multi-agent automated research lab.
  5. Flash Attention exemplifies the kernel optimization approach: move tensors once, maximize GPU math operations per read/write, then write back—keeping GPUs 'warm' instead of idle.
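
The memory-versus-compute argument in takeaways 1 and 5 can be checked with back-of-the-envelope arithmetic. The sketch below is illustrative only: it uses the round figures quoted above (1 petaflop/s, 3 TB/s), and the function names and the "chain of elementwise operations" workload are assumptions made for this example, not something taken from the talk. It compares an unfused chain, where every operation reads and writes the tensor, with a fused kernel that moves the tensor once.

```python
# Rough roofline arithmetic. The two peak figures are the round numbers
# quoted in the takeaways; everything else is a hypothetical illustration.
PEAK_FLOPS = 1e15       # ~1 petaflop/s of compute
PEAK_BANDWIDTH = 3e12   # ~3 TB/s of memory bandwidth


def ridge_point(peak_flops: float, peak_bandwidth: float) -> float:
    """FLOPs per byte a kernel needs before compute becomes the limit."""
    return peak_flops / peak_bandwidth


def attainable_flops(intensity: float,
                     peak_flops: float = PEAK_FLOPS,
                     peak_bandwidth: float = PEAK_BANDWIDTH) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_flops, intensity * peak_bandwidth)


def elementwise_chain_intensity(num_ops: int,
                                bytes_per_element: int,
                                fused: bool) -> float:
    """FLOPs per byte for a chain of elementwise ops (1 FLOP per op per element).

    Unfused: each op reads and writes the element, so traffic scales with num_ops.
    Fused: the element is read once and written once for the whole chain.
    """
    passes = 1 if fused else num_ops
    bytes_moved = 2 * bytes_per_element * passes  # one read plus one write per pass
    return num_ops / bytes_moved


if __name__ == "__main__":
    print("ridge point (FLOPs/byte):", ridge_point(PEAK_FLOPS, PEAK_BANDWIDTH))
    for fused in (False, True):
        ai = elementwise_chain_intensity(num_ops=4, bytes_per_element=2, fused=fused)
        print("fused" if fused else "unfused", ai, attainable_flops(ai))
```

Under these assumptions both variants sit far below the ridge point, so both are memory-bound, but the fused version does four times as much math per byte moved. That is the same lever the Flash Attention takeaway describes: move the data once, then do as much work on it as possible before writing back.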

Transcript Excerpt

[music] >> Hi everyone. As you heard, I'm Ben from Hugging Face and the talk that I'm going to present to you today is called "Your coding agent should do AI systems engineering". So, there are two main takeaways that I want you to get from this talk. One, and probably the fun part, is that we can use coding agents to tackle the hardest engineering problems in AI, so systems engineering and machine learning engineering. And maybe the boring part is that in order to do this, we're going to need standard repos and we're going to need those on the hub. And in many cases, we already have them. So, I think in this case, I'm preaching to the choir here, but in case you haven't noticed, coding agents have been accepted. Many of us have been using them for a few years, but in the last few …