2026 is the year of fine-tuned small models

I've been writing quite a bit about AI the last few years.

First, I talked about what LLMs are in the first place: really big Markov chains that have hit a threshold where they appear to be reasoning, with enough fidelity that it makes no sense to argue about whether they are really "thinking" or not. A computer that can reason about the data it is processing is a brand new thing, and it's going to become part of all software.

Then I wrote about what I've learned about writing LLM applications. My key takeaway here is that LLMs are good at transforming text into less text. If you ask them to generate more text than you gave them they are usually pretty bad at that task. But the number of applications where you want to turn a lot of text into less text is truly enormous.

Finally, I talked about AI's probable effect on programming jobs, in particular web development jobs because those are the ones I care about most. I think AI-assisted development is going to create a huge amount of new software, which is good, and a whole new breed of highly-assisted software developers, which is also good. I don't think current software developers will lose out on jobs as a result: the demand for software is insatiable and there are more than enough jobs to go around.

On the basis that anything I end up discussing more than three times in real life I should blog about, the next thing I should write about is the current state of the industry and where I think it's about to go. Take this with a heaping spoonful of salt, because I'm not notably good at predictions, but this is what I've got:

My big bet for 2026 is that companies searching for margin and seeing diminishing improvements in the frontier models will start training and fine tuning small models again.

— Laurie Voss (@seldo.com) October 16, 2025 at 5:07 PM

Current state: diminishing returns

The industry right now has a few categories of players that it's worth enumerating:

  • Frontier model labs: companies like OpenAI, Anthropic, Google, xAI, Alibaba and DeepSeek are building the very best models that exist, defining what LLMs are capable of, in fierce competition with each other. These models are proprietary and only those companies and their partners can run them.
  • Open-source models: some of the same companies above, and also Meta, are releasing open-source (or at least open-weights) models that anyone can run. This leads to...
  • Inference providers: a whole bunch of companies like Together AI, Replicate, Modal, Fireworks, Groq, BentoML, Koyeb and more that will host the open-source models and run them on your behalf.
  • Application companies: an absolute blizzard of companies that are using AI models to build more-or-less domain-specific applications. They can build on top of frontier models, or open source models hosted by the inference providers.

Until recently, any application company wanting to stay ahead of its competitors needed to be building on top of the frontier models. The models were advancing very quickly and all your competitors would switch to them as soon as they became available. This was troublesome for the application companies because the frontier models were expensive, and because everybody was using the same models it was a big challenge to differentiate your product from your competitors' -- UX and prompt engineering were your only, narrow, moats.

My thesis is that that's changing. It's very hard to measure objectively, because the frontier models release benchmarks comparing themselves to each other but change the benchmarks frequently, so there is no single benchmark that I've found that compares, say, GPT 3 to GPT 5's performance. But the "vibe" is that while the jump in performance from GPT 2 to 3 was enormous, the jump from 3 to 4 was less big, and the jump to 5 barely noticeable, with similar progressions for Claude and other frontier models.

Meanwhile, open-weights models are catching up. That's a little easier to demonstrate, as this post and its accompanying graph above show. You can get pretty great results running an open model more cheaply on one of the inference providers than going to a frontier model. And if you've got basic inference needs, the models themselves are getting smaller, which makes them cheaper still.

Next: the search for differentiation (and margins)

In a world where hundreds of application companies are fighting for customers but switching to the latest frontier model no longer brings meaningful differentiation, my thesis is that companies will begin to search for differentiation using fine-tuning.

Fine-tuning has always been an available path, of course, but there was no point spending millions creating a fine-tuned model (and hiring the experts to do it) if it was going to be made obsolete months later by the latest frontier release. But things are changing. Fine-tuning is getting orders of magnitude cheaper, and services are emerging that will train models for you, meaning you don't necessarily need to hire AI researchers to do it.

There's evidence of this already happening from Airbnb and others, with my favorite example being Cursor: different parts of your interaction are handled by very specific, small, fine-tuned models.

This has two advantages for companies doing it:

  1. Differentiation: if your model is trained on your data, then (assuming you have more data than your competition) it can be better than your competitors', letting you compete on more than just UX and prompt engineering around the same frontier model.
  2. Margin: if your models are small, they can be cheaper to run. As the AI bubble deflates, revenue and margins will matter more, and companies will be trying to do more with less.

This is already happening, and my bet is that 2026 will see a lot more of it.

So what should I do?

It's no use making a prediction if it doesn't lead to some kind of action, of course. If you're at a frontier model lab, I have no advice for you other than "take money off the table while your valuation is still insane". If you're at an inference provider, I think you're in for a good year. If you're at an application company, I think now's the time to start building up your dataset and looking at smaller models.