Robotics' End Game: Nvidia's Jim Fan

By Sequoia Capital

Categories: VC, Startup

Summary

Nvidia's Jim Fan lays out robotics' path to AGI by copying the LLM playbook: use video prediction models to learn physics at scale, then fine-tune actions on top of the learned world states. This "great parallel" lets robots generalize to unseen tasks without explicitly coded physics.

Key Takeaways

  1. Vision-Language-Action (VLA) models are fundamentally flawed for robotics: they are language-heavy, with physics treated as an afterthought. The next paradigm shifts to world models that predict the next physical state, making vision and action co-equal citizens alongside language.
  2. Physics properties (gravity, buoyancy, lighting, refraction) emerge automatically from video prediction models without explicit coding. This "physics emergence" at scale creates a foundation for robot policies that generalize to unseen tasks.
  3. The Dream Zero policy model jointly decodes next world states and actions, enabling zero-shot task solving. The tight correlation between video-prediction accuracy and action success makes the model's reasoning interpretable and debuggable.
  4. Robotics' end-game strategy mirrors the LLM progression: pre-training (world model), instruction tuning (action fine-tuning), and reinforcement learning (the last mile). This three-step framework compresses six years of LLM progress into a robotics-specific playbook.
  5. Action fine-tuning collapses the infinite space of possible future states into the "thin slice that matters for real robots." This selective alignment keeps the model from hallucinating impossible actions and grounds its predictions in physical feasibility.
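The recipe in takeaways 2, 4, and 5 can be made concrete with a deliberately tiny sketch. This is a hypothetical illustration, not NVIDIA's actual method: a "world model" is fit from passive observations of a falling point mass (physics is learned, never hand-coded), and an "action fine-tuning" stage then reuses that frozen model to pick, from a small action set, the action whose predicted next state is closest to a goal.

```python
# Toy sketch (hypothetical; all names here are illustrative, not NVIDIA's):
#   1. "Pre-training": fit a world model that predicts the next physical
#      state from the current one, using only passive observations.
#   2. "Action fine-tuning": reuse the learned model to rank a small set
#      of candidate actions -- collapsing the many possible futures to the
#      thin slice that matters for control.
import random

DT, GRAVITY = 0.1, -9.8

def step(pos, vel, thrust=0.0):
    """Ground-truth physics: 1-D point mass under gravity plus thrust."""
    vel += (GRAVITY + thrust) * DT
    pos += vel * DT
    return pos, vel

# --- 1. Pre-train a world model from passive, video-like observations ---
random.seed(0)
data = []
for _ in range(200):
    p, v = random.uniform(0, 10), random.uniform(-5, 5)
    data.append(((p, v), step(p, v)))

# Gravity is never coded into the model; it is estimated from the data
# as the mean observed acceleration.
g_hat = sum((nv - v) / DT for (p, v), (np_, nv) in data) / len(data)

def world_model(pos, vel, thrust=0.0):
    """Learned next-state predictor built on the estimated gravity g_hat."""
    vel = vel + (g_hat + thrust) * DT
    return pos + vel * DT, vel

# --- 2. "Action fine-tuning": choose actions on top of learned states ---
def policy(pos, vel, goal, actions=(0.0, 15.0, 30.0)):
    """Pick the thrust whose *predicted* next position is nearest the goal."""
    return min(actions, key=lambda a: abs(world_model(pos, vel, a)[0] - goal))

# Hover a falling mass near height 5 using only the learned model.
p, v = 5.0, 0.0
for _ in range(100):
    p, v = step(p, v, thrust=policy(p, v, goal=5.0))
print(f"learned gravity ~ {g_hat:.2f}, final height ~ {p:.2f}")
```

The point of the toy is the division of labor: the world model carries all the (learned) physics, and the action stage is a thin selection layer on top of its predictions, mirroring the pre-train-then-fine-tune split described above.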

Transcript Excerpt

And up first, I'm delighted to introduce my friend Jim Fan. Jim leads the embodied AI research group at NVIDIA, otherwise known as NVIDIA Robotics. I think that robots are just one of the most thrilling things that's going to happen. A car basically is a big robot, but I'm excited for robots that can go beep boop and lift things for us. Jim was a standout at last year's AI Ascent, and we're delighted to have you back. >> Thanks [applause] everyone. Thanks. So, it wa...