Shipping AI That Works: An Evaluation Framework for PMs – Aman Khan, Arize
Summary
AI product managers are in high demand: in a show of hands, more audience members identified as AI PMs than as regular PMs. The speaker shares a framework for evaluating AI systems and tips for shipping AI that works, including techniques such as observability and evals.
Key Takeaways
- The expectations for product managers have increased significantly with the rise of AI, requiring more technical skills and specifications.
- Observability and evaluation are critical for ensuring AI applications work as expected, as the AI/ML landscape is rapidly evolving.
- Building a multi-agent AI trip planner prototype can be an effective way to learn about evaluation frameworks and shipping working AI.
- Selling AI-powered products to previous managers can be a valuable strategy for AI PMs.
- More audience members identified as AI PMs than as regular PMs, suggesting strong demand for this role.
- Many in the audience have experience writing evaluations, but the speaker aims to take it a step further with more technical, interactive evaluations.
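To make the idea of an evaluation concrete, here is a minimal sketch of a code-based eval for a trip planner. It is not the speaker's framework or any Arize API. The names `EvalCase`, `stub_trip_planner`, `passes`, and `run_evals` are hypothetical, and the planner is a hard-coded stand-in for a real AI application.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    """One test input plus the terms a good answer is assumed to mention."""
    prompt: str
    must_mention: list[str]


def stub_trip_planner(prompt: str) -> str:
    # Hypothetical stand-in for the real application (for example, an LLM call).
    if "Tokyo" in prompt:
        return "Day 1: Visit Senso-ji temple. Budget: $150 per day."
    return "Sorry, I cannot plan that trip."


def passes(case: EvalCase, output: str) -> bool:
    # Code-based check: every required term appears, ignoring case.
    text = output.lower()
    return all(term.lower() in text for term in case.must_mention)


def run_evals(app: Callable[[str], str], cases: list[EvalCase]) -> float:
    # Run the app on each case and return the fraction that pass.
    if not cases:
        return 0.0
    results = [passes(case, app(case.prompt)) for case in cases]
    return sum(results) / len(results)


# Assumed example cases; a real eval set would come from actual user traffic.
CASES = [
    EvalCase("Plan a 1-day trip to Tokyo on a budget", ["Day 1", "budget"]),
    EvalCase("Plan a weekend in Lisbon", ["Day 1", "Day 2"]),
]

if __name__ == "__main__":
    print(f"pass rate: {run_evals(stub_trip_planner, CASES):.0%}")
```

Keyword checks like this are only a starting point. The same loop works if `passes` is swapped for a stricter check or a model-graded judgment, and tracking the pass rate across versions shows whether a change helped.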
Transcript Excerpt
All right. Uh, nice to see everyone here. Um, my name is Aman. I'm an AI product manager at a company called Arize. Title of the talk is Shipping AI That Works: An Evaluation Framework for PMs. Uh, it's really going to be a continuation of some of the content we've been doing with, you know, some of the PM folks like Lenny's podcast. I guess just quick show of hands. How many people listen to Lenny's podcast or have read the newsletter? Awesome. Okay, we're going to do a couple more like audience interaction things just to like wake up the room a bit. So, how many people in the room are PMs or aspiring PMs? Okay, good. Good handful of people. How many of you consider yourself AI product managers today? Okay, awesome. Wow, that there's more AI PMs than there were regular PMs. That'…