Everything I Learned Training Frontier Small Models — Maxime Labonne, Liquid AI

By ai.engineer

Categories: AI, Tools

Summary

Small models aren't just scaled-down versions of large models—they require fundamentally different architectures optimized for latency and memory constraints. Liquid AI's LFM2 uses gated short convolutions instead of attention mechanisms, achieving 2-3x faster inference on mobile devices while using 90% fewer parameters in embeddings than competing 350M models.
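The "gated short convolution" idea can be sketched in a few lines: instead of attention, each token mixes with only a handful of recent tokens via a short causal depthwise filter, modulated by a learned gate. The block below is an illustrative numpy sketch under assumed shapes (it is not Liquid AI's actual LFM2 implementation); all weight names are hypothetical.

```python
import numpy as np

def gated_short_conv(x, w_in, w_gate, w_out, kernel):
    """Illustrative gated short convolution block (not the real LFM2 code).

    x      : (seq_len, d_model) input activations
    w_in   : (d_model, d_model) input projection
    w_gate : (d_model, d_model) gate projection
    w_out  : (d_model, d_model) output projection
    kernel : (k, d_model) short causal depthwise filter, k is small (e.g. 3)
    """
    gate = 1.0 / (1.0 + np.exp(-(x @ w_gate)))  # sigmoid gate per channel
    h = (x @ w_in) * gate                       # gated input projection
    k = kernel.shape[0]
    # Causal depthwise convolution: left-pad with k-1 zeros so position t
    # only sees positions t-k+1 .. t -- no future leakage, O(k) per token.
    padded = np.pad(h, ((k - 1, 0), (0, 0)))
    out = np.zeros_like(h)
    for i in range(k):
        out += kernel[i] * padded[i : i + h.shape[0]]
    return out @ w_out
```

Because the receptive field per layer is just k tokens, the per-token cost is constant in sequence length, which is one reason such blocks profile well on CPUs and mobile hardware compared with attention's quadratic (or window-sized) mixing.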

Key Takeaways

  1. Embedding layers in distilled small models waste 29-63% of total parameters; redesigning architecture to minimize embeddings increases effective reasoning capacity without adding memory overhead.
  2. On-device profiling reveals gated short convolutions outperform sliding window attention and gated linear attention by 3-4x on CPU/mobile; prioritize real hardware benchmarks over theoretical optimization.
  3. Small models (350M-24B parameters) require task-specific optimization rather than general-purpose design; focus on single capabilities like summarization or tool use to maximize performance within memory constraints.
  4. The 350M-parameter LFM2.5 is trained on 28 trillion tokens using a hybrid architecture of short convolutions and GQA; for edge deployment it scales better than Chinchilla laws predict.
  5. Latency sensitivity in edge models demands architectural choices that prioritize throughput; test inference performance across target hardware (mobile CPUs, GPUs) before finalizing model design.
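The first takeaway, that embeddings can dominate a small model's parameter budget, is easy to see with back-of-the-envelope arithmetic. The numbers below are illustrative assumptions, not Liquid AI's actual configuration:

```python
def embedding_fraction(vocab_size, d_model, total_params):
    """Fraction of a model's parameter budget spent on (tied) token embeddings."""
    return vocab_size * d_model / total_params

# Hypothetical 350M-parameter model with a 128k vocabulary and
# 1024-dim embeddings: 128_000 * 1024 ~ 131M parameters go to the
# embedding table alone -- over a third of the whole budget.
share = embedding_fraction(128_000, 1024, 350_000_000)
print(f"{share:.0%}")  # prints "37%"
```

Every parameter reclaimed from the embedding table can instead go into the transformer/convolution stack, which is where the reasoning capacity lives; this is the motivation for shrinking embeddings rather than scaling the whole model down uniformly.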

Transcript Excerpt

Hi everyone, my name is Maxime Labonne. In this presentation I want to talk about the lessons I've learned pre-training small models. For context, I work at Liquid AI as head of pre-training. At Liquid, we mostly focus on edge models for on-device deployment. And as you can see here, we have models from 350 million parameters to 24 billion parameters, so this is very, very small. And yesterday we released our new VLM, 450M, and the week before we released the new version of the 350M mode...