I cloned myself with Gemini Omni in 15 minutes (and it's terrifyingly good)

Categories: AI, Product

Summary

Google's Gemini Omni can generate a functional AI video avatar of you in about 15 minutes by scanning your face via QR code, unlocking solo video production that previously required a professional crew. Its character referencing still has reliability issues, though.

Key Takeaways

  1. AI avatar creation in Google Flow takes ~2 minutes of phone-based facial scanning (multiple angles) plus processing time, making professional video production accessible to solo creators without filming experience.
  2. Google Flow functions as an AI creative suite, not just a video generator: it helps brainstorm storyboards and suggests cinematography (close-ups, wide shots, reveals), framing, and blocking that solo creators would struggle to conceptualize alone.
  3. Multimodal AI models (image + video) let creators produce content they previously couldn't. A solo podcast producer, for example, can now generate high-production hype videos without knowing framing, blocking, or cinematography fundamentals.
  4. Current limitations: the host reports that Gemini Omni's character referencing is unreliable ('it has a hard time referencing the me character in some early tests'), which suggests avatar consistency across multiple video scenes needs improvement.
  5. The infrastructure layer for production AI (the role described by Merge, the episode's sponsor) becomes critical when scaling: integrations, permissions, model routing, and cost optimization are the hard part after building the AI product itself.

Transcript Excerpt

Today, I am doing a very strange episode where I'm going to create a video avatar of myself and in about 15 minutes get to a full minute-long video starring none other than your favorite podcast host, Claire Vo. Let's get to it. This episode is brought to you by Merge. Building an AI product is one thing. The hard part is everything around it. Connecting to the tools your team and customers rely on, letting agents take action with the right permissions, and keeping everything reliable and cost-efficient once you're in production. Most teams end up piecing that together themselves. So, instead of building the product you actually care about, you get pulled into integrations, permissions, routing, and all the infrastructure underneath. Merge is the infrastructure laye…

More from How I AI Podcast