The Small Model Infrastructure Nobody Built (So We Did) — Filip Makraduli, Superlinked

Categories: AI, Tools

Summary

Most embedding infrastructure assumes you know exactly which model you want ahead of time. This talk starts where that assumption breaks. Filip Makraduli walks through the real profiling mistakes, inf

Transcript Excerpt

Hello everyone, and welcome to this talk. I'll be speaking about small model inference, a gap we've recognized in the market, what we did about it, and why we took this approach. As you can see, the background of this slide is no accident. If you can guess what it is (I'll prompt you at the end of the slides), you win a little reward, so you can catch me at the break afterwards. Think about it, but also listen to me, so don't think too hard.

The story starts with me posting an article on Substack a few months ago that got a little bit of traction and got a few people interested. I explained flash attention, how models work, and how processes can be memory-bound or compute-bound, and I felt really good because I went deep into this …