Build Hour: GPT-Realtime-2
Summary
OpenAI released three new audio models, including GPT-Realtime-2 with a 128K-token context window (a 4x increase) and sub-200ms latency. The models enable three distinct product patterns for production applications: voice-to-action, systems-to-voice, and voice-to-voice, with use cases across customer service, translation, and ambient agents.
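As a rough sanity check on the context figures, the sketch below works out what the stated numbers imply. It assumes "128K" means 128,000 tokens and takes the one-hour equivalence quoted in the takeaways at face value. The variable names are illustrative, and the derived values are back-of-envelope estimates, not published figures.

```python
# Back-of-envelope arithmetic from the figures quoted in this summary.
# Assumption: "128K" is treated as 128,000 tokens (it could also mean 131,072).
CONTEXT_TOKENS = 128_000
EXPANSION_FACTOR = 4             # the "4x increase"
CONVERSATION_SECONDS = 60 * 60   # "roughly one hour of conversation"

# Window size implied before the 4x expansion.
previous_window = CONTEXT_TOKENS // EXPANSION_FACTOR

# Average token consumption implied by filling the window in one hour.
tokens_per_second = CONTEXT_TOKENS / CONVERSATION_SECONDS

# How long the implied previous window would last at that same rate.
previous_minutes = previous_window / tokens_per_second / 60

print(f"implied previous window: {previous_window} tokens")
print(f"implied usage rate: {tokens_per_second:.1f} tokens/second")
print(f"implied previous capacity: {previous_minutes:.0f} minutes")
```

Under these assumptions the previous window works out to 32,000 tokens, or about 15 minutes of conversation at roughly 36 tokens per second. Actual usage depends on how much audio, text, and tool output a session carries.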
Key Takeaways
- GPT-Realtime-2's context window expanded 4x to 128K tokens, roughly equivalent to one hour of conversation. Conversations of that length no longer need to be truncated, so the model keeps the full history, which supports better instruction following.
- Real-time Whisper model delivers streaming transcription with tunable latency as low as 200 milliseconds across 80 input languages, enabling live captions, meeting notes, and responsive voice products without noticeable delay.
- Three distinct voice architectures available: voice-to-action (hands-free apps), systems-to-voice (voice chief of staff), and voice-to-voice (customer service), allowing builders to choose the right pattern for their use case.
- Parallel tool calling replaces waterfall execution: the model can issue multiple function calls at once rather than one after another, which reduces latency in multi-step voice agent workflows.
- Real-time translation model supports 70+ input and 13 output languages with low-latency streaming, powering use cases like video calls, live streams, and customer service without perceived language barriers.
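The difference between waterfall and parallel tool execution noted in the takeaways can be illustrated with a minimal sketch. This does not use the Realtime API itself: the two tool functions, their arguments, and the simulated delays are hypothetical stand-ins for real backend calls, and `asyncio.gather` is just one way an application might run independent tool calls concurrently.

```python
import asyncio
import time

# Hypothetical tools. Names, arguments, and delays are illustrative only.
async def lookup_order(order_id: str) -> dict:
    await asyncio.sleep(0.05)  # stand-in for a network round trip
    return {"order_id": order_id, "status": "shipped"}

async def check_inventory(sku: str) -> dict:
    await asyncio.sleep(0.05)  # stand-in for a network round trip
    return {"sku": sku, "in_stock": True}

async def run_waterfall() -> list:
    # Sequential: the second call starts only after the first finishes.
    first = await lookup_order("A123")
    second = await check_inventory("SKU-9")
    return [first, second]

async def run_parallel() -> list:
    # Concurrent: both calls are in flight at once.
    # gather returns results in the order the awaitables were passed.
    results = await asyncio.gather(
        lookup_order("A123"),
        check_inventory("SKU-9"),
    )
    return list(results)

def timed(coro_fn):
    """Run an async function to completion and return (result, seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_fn())
    return result, time.perf_counter() - start

if __name__ == "__main__":
    _, waterfall_s = timed(run_waterfall)
    _, parallel_s = timed(run_parallel)
    print(f"waterfall: {waterfall_s:.3f}s, parallel: {parallel_s:.3f}s")
```

With two independent calls of similar duration, the waterfall version takes about the sum of the delays while the parallel version takes about the longest single delay. The saving only applies when the calls do not depend on each other's results.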
Topics
- GPT-Realtime-2 Voice Models
- Real-time Streaming Transcription
- Voice Agent Architecture Patterns
- Sub-200ms Latency Audio Processing
- Parallel Tool Calling in Voice APIs
Transcript Excerpt
Hey everyone, welcome back for another build hour. We're so excited to have you here today. My name is Sarah Urbonus. I lead startup marketing here at OpenAI. I am joined by two absolute legends who internally need no introduction, but Terry, Erica, can you introduce yourselves and share what you've been up to this past week?
>> Absolutely. So, I'm Terry. I am a multimodal API PM here.
>> And I'm a solutions engineer on our technical success org, helping our largest digital-native customers scale on our API platform. And so today everything is about GPT-Realtime-2, which we just released last week. So the goal of build hours is really to empower you with the best practices, tools, and AI expertise to scale your company using OpenAI APIs and models. Whether you're a founder or an engineer at a …