The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked
Summary
Most embedding infrastructure assumes you know exactly which model you want ahead of time. This talk starts where that assumption breaks. Filip Makraduli walks through the real profiling mistakes, inf
Transcript Excerpt
Hello everyone, and welcome to this talk. I'll be speaking about small model inference, a gap that we've recognized in the market, what we did about it, and why we took this approach. And as you can see, the background slide here is no accident. If you can guess what it is (I'll prompt you at the end of the slides), you win a little reward, so you can catch me at the break afterwards. So think about this, but also listen to me, so don't think too hard. The story starts with me posting an article on Substack a few months ago that got a little bit of traction and got a few people interested. In it I explained flash attention, how models work, and how processes can be memory-bound or compute-bound, and I felt really good because I kind of went deep into this …
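The excerpt mentions processes being memory-bound or compute-bound. One common way to reason about that distinction is the roofline model. An operation's arithmetic intensity (FLOPs per byte moved) is compared with the hardware's ridge point (peak FLOP/s divided by memory bandwidth). Below the ridge point, bandwidth is the limit. Above it, compute is. The sketch below is a minimal illustration under stated assumptions, not the speaker's method. It assumes fp16 (2 bytes per element), a dense matmul that reads both inputs and writes the output exactly once, and made-up hardware numbers that do not describe any specific GPU. `Hardware`, `matmul_intensity`, `classify`, and `attainable_flops` are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hardware:
    """Hypothetical accelerator description (illustrative numbers only)."""
    peak_flops: float  # peak compute throughput, FLOP/s
    bandwidth: float   # memory bandwidth, bytes/s

    @property
    def ridge_point(self) -> float:
        # Arithmetic intensity (FLOP/byte) at which compute becomes the limit.
        return self.peak_flops / self.bandwidth


def matmul_intensity(m: int, n: int, k: int, bytes_per_elem: int = 2) -> float:
    """FLOPs per byte for an (m x k) @ (k x n) matmul.

    Assumes each input is read once and the output is written once,
    i.e. no cache reuse beyond that (a simplification).
    """
    flops = 2 * m * n * k  # one multiply and one add per inner-product term
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved


def attainable_flops(intensity: float, hw: Hardware) -> float:
    # Roofline: throughput is capped by either compute or bandwidth * intensity.
    return min(hw.peak_flops, intensity * hw.bandwidth)


def classify(intensity: float, hw: Hardware) -> str:
    return "memory-bound" if intensity < hw.ridge_point else "compute-bound"


if __name__ == "__main__":
    # Assumed hardware: 100 TFLOP/s peak, 1 TB/s bandwidth -> ridge point 100 FLOP/byte.
    hw = Hardware(peak_flops=100e12, bandwidth=1e12)
    for shape in [(1, 4096, 4096), (4096, 4096, 4096)]:
        ai = matmul_intensity(*shape)
        print(shape, f"{ai:.1f} FLOP/byte", classify(ai, hw))
```

With a single input row (`m=1`), the intensity comes out near 1 FLOP/byte, far below the assumed ridge point of 100, so the matmul is memory-bound. A large square matmul lands above 1,000 FLOP/byte and is compute-bound. This is one reason small-batch inference on small models tends to be limited by memory traffic rather than raw compute.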