Why AI needs a new kind of supercomputer network — the OpenAI Podcast Ep. 18

Categories: AI, Product

Summary

Training frontier models isn’t as simple as adding more GPUs: one small problem and the whole coordinated dance falls apart. OpenAI’s Mark Handley and Greg Steinbrecher discuss how a new supercomputer…

Transcript Excerpt

Hello, I'm Andrew Mayne, and this is the OpenAI Podcast. On today's episode, we're discussing how to make supercomputers better at training models. Joining me are Mark Handley from the core networking team and Greg Steinbrecher from workload systems. They'll discuss how a breakthrough has made training more efficient so everyone gets smarter models faster. This has really allowed us to remove one of the key barriers to continuing to scale. We're talking about a lot of the world's fastest GPUs and making them all work together on a single task. We know we've won when researchers stop needing to know what network protocol this particular cluster is using.

So tell me a bit about your background.

I started out doing physics and math in undergrad, wanting to basically understand how complex sys…