What happens now that AI is good at math? — the OpenAI Podcast Ep. 17

By OpenAI

Categories: AI, Product

Summary

AI has gone from mathematically incompetent to solving 42-year-old open problems in 12 hours—a shift that surprised even top researchers. By combining reasoning models with human verification, mathematicians can now tackle research-level problems that require 50+ pages of thinking, fundamentally changing how mathematical discovery works.

Key Takeaways

  1. The breakthrough wasn't instant—it required iterative collaboration. Ernest Ryu spent 12 hours over 3 days prompting ChatGPT on a classical optimization problem, acting as verifier and guide rather than just copy-pasting queries. This human-in-the-loop approach proved essential for solving genuinely open research problems.
  2. Mathematical problem-solving became a viable benchmark for AI progress. In two years, models went from no meaningful reasoning capability to solving International Math Olympiad problems at gold-medal level, then to research-level proofs, making mathematics a reliable way to measure AI advancement toward AGI.
  3. Expert skepticism was overturned within months. A year and a half ago, 80% of top mathematicians said LLMs couldn't solve major open problems; roughly eight months later, models were doing research-level mathematics, a stark lesson in how hard AI capabilities are to forecast.
  4. The difference between competition and research math matters for capability assessment. Olympiad problems have short solutions and known answers; real research requires novel approaches and 50+ pages of reasoning. Models first exceeded the competition benchmarks and then proved capable at actual research as well.
  5. Building toward AGI requires solving math first. The conversation frames mathematics as essential infrastructure—models need to handle complex, multi-step reasoning problems to reach artificial general intelligence, making it more than a benchmark.

Transcript Excerpt

Hello, I'm Andrew Mayne, and this is the OpenAI podcast. Today, our guests are researchers Sébastien Bubeck and Ernest Ryu, and we're going to talk about math, how it went from almost laughable to Olympiad level, and why you need math to reach AGI. The progress of the last few years has been nothing short of miraculous. We will be able to have LLMs solve problems that require more than 50 pages of thinking. Mathematics was just the perfect benchmark to see the model making progress du...