A New Hope for SLMs
April 23, 2026

Jack O'Brien
Co-Founder & CEO
Last week Claude Opus 4.7 became the most powerful model ever released to the public. This week it still holds that title, but it doesn’t seem to matter very much.

Anthropic’s release emphasized gains on the “most difficult tasks” like International Math Olympiad questions. How much of daily work needs that skill?
The threshold for what makes an AI model “powerful enough to do valuable work” was crossed a long time ago: first by the frontier models within Anthropic and OpenAI, and soon after by the big open source models like Kimi and Mistral.
This week we crossed another threshold with the release of Qwen 3.6 27B. Now small language models (SLMs), the kind that can live on a single GPU, can do what we thought only the largest closed models could. Most importantly, that isn’t passing a benchmark for the sake of passing a benchmark. Claude 4.5 in November was a stepwise change in what’s possible with reasoning and coding models, and now small language models can do that level of valuable work.
SLMs aren’t toys anymore, and the AI landscape is about to change dramatically.
The Frontier Models Lead and the Open Models Follow
Since 2023, the closed source models from Anthropic and OpenAI have pushed the frontier of what’s possible and rightfully dominated the headlines. They created the first model to hold a conversation, the first to one-shot a web app, the first to build a financial model, and the first to find thousands of zero-day security flaws. Then, inevitably, large open source models replicate that functionality 6-9 months later.
The fast-follow open source models are in the 400B-1T parameter range, not far off from their closed source counterparts. With enough data from the closed models (some of which is extracted in nefarious ways), it makes sense that large open models would emerge to copy frontier performance.
Almost surprisingly, though, small models in the 10B-100B range are only another 6-12 months behind. This leads to three zones of ability: the frontier zone, the open model zone, and the SLMs.

The frontier zone gets the headlines, but the open source zone is growing. Where the open source zone used to be for tinkerers, its models can now handle valuable work: they can write expert-level code, reason at a PhD level, create financial models, and more.
If the frontier models vanished, the vast majority of AI work would have a simple open source replacement.
The Difficulty Four Square
The three zones of competency are pushing upward with no end in sight, so where does this lead us? There are an infinite number of frontier problems to solve, but only so many that matter to the majority of the planet on a day-to-day basis.
We think that very soon, especially after the Qwen 27B model and expected Nvidia Nemotron improvements, models will be explained by this four square.

To unpack this chart a little further:
- The best closed source models will solve new frontier challenges where training data can be easily obtained and enormous clusters are needed to make a breakthrough. Think coding, spreadsheets, and anything you can pay Mercor to collect data for. Today, Claude Mythos is the shining example in this quadrant, but in the long term we expect this to be models capable of expert-level AI research and beyond-human capability for frontier scientific research.
- For frontier challenges where training data cannot be easily obtained (e.g., biotech, manufacturing, hedge funds), heavily post-trained open source models will power the most challenging tasks. These systems will be based on open source models, but unrecognizable due to their understanding of unique data.
- All other work will be completed by open source models, because they’ll be the cheapest and fastest to run with no loss in accuracy. They’ll power research, coding, workflows, coordination, back office work, monitoring, and everything else we’re building chatbots and agents to handle today.
- Closed source models for less difficult tasks won’t exist. Why pay a premium for tasks that are easily solvable by cheaper models? Why pay at all when you can run those models on your laptop for free? ChatGPT and Claude will have some stickiness as products, but they won’t make money on the inference.
So what’s x?
In 2024, “x”, the percentage of truly difficult tasks in the chart above, was probably around 80%. That meant that outside of the most basic tasks like autocomplete, you needed a frontier model or heavily post-trained system to do valuable work. In 2025, x moved closer to 50% with big launches from Qwen, Kimi, and Mistral.
Today x is moving closer to 5%. And it will only get smaller from here.

The 2 Trillion Dollar Problem
For OpenAI and Anthropic this presents a problem. People won’t need their models when powerful alternatives exist that are cheaper, private, and customizable, so how do they justify their massive valuations? I see two routes:
- Continue to own the frontier. There are incredible models yet to be built to help us understand the universe and push the frontier of science! These models will undoubtedly be valuable, but larger classes of models beyond Mythos will only be relevant, available, and affordable to a small group of researchers, companies, and governments.
- Own the app layer. Use their temporary frontier advantage in coding and sheer volume of GPUs to own the app layer: build chatbots, workflow tools, design software, and Excel plugins. Once the rest of the models catch up, be the familiar product layer people can’t live without, with the compute necessary to serve it.
This appears to be the approach the labs are taking today.
The Moment for Subconscious
When a 27B model can do frontier work, the bottleneck stops being the model and starts being the flywheel around it. How can you take that model, customize it, host it, and rely on it more quickly?
Raw small models, even capable ones, aren't agents on their own. They have small context windows, lose the thread during reasoning tasks, and burn tokens re-reading their own history. Passing a benchmark is not the same as running reliably in production for an hour.
Subconscious closes this gap. We post-train small open models to reason with awareness of their own context, then run them on an inference runtime that aggressively compresses the KV cache as the agent works with complex tools and modalities. The result: agents that stay on task across hundreds of reasoning steps, on models small enough to live on a single GPU.
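To make the idea of cache compression concrete, here is a toy sketch of one well-known strategy: keep a few early "attention sink" tokens plus a sliding window of the most recent steps, and evict the middle. This is only an illustration of the general technique, not the Subconscious runtime; the function name and parameters are hypothetical.

```python
# Toy sketch of KV-cache compression via attention sinks + a sliding window.
# Illustrative only: this is not the Subconscious runtime, and the names and
# defaults here are invented for the example.

def compress_kv_cache(entries, num_sink=4, window=8):
    """Keep the first `num_sink` cache entries (attention sinks) plus the
    most recent `window` entries; evict everything in between.

    `entries` stands in for per-token cache records (e.g. key/value pairs).
    Returns the compressed list.
    """
    if len(entries) <= num_sink + window:
        return list(entries)  # nothing to evict yet
    return list(entries[:num_sink]) + list(entries[-window:])

# Example: a 20-step agent trace compresses to 4 sinks + 8 recent steps,
# bounding cache growth no matter how long the agent runs.
trace = [f"step-{i}" for i in range(20)]
compressed = compress_kv_cache(trace)
assert len(compressed) == 12
assert compressed[:4] == ["step-0", "step-1", "step-2", "step-3"]
assert compressed[-1] == "step-19"
```

The point of the sketch is the shape of the trade-off: a fixed-size cache means memory and latency stop growing with trace length, at the cost of discarding middle context the model may later need, which is exactly why the post-training and the runtime have to be designed together.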
Our March benchmarks showed frontier or near-frontier performance on agentic tasks with 4x less compute and a 10-20x larger effective context window.
As “x” shrinks toward 5%, the winners in AI will no longer be whoever builds the biggest model, but whoever makes capable open models production-ready fastest for the 95% of work that doesn’t need a frontier system. Companies want agents they can rely on, at a cost they can actually forecast, running on infrastructure they control.

Small models just became capable enough. We make them reliable enough to ship.
We just kicked off our post-training of the Qwen 3.6 27B models, and we’re excited to share results and access soon.