Introducing TIM-Qwen3.6-27B
May 21, 2026

Jack O'Brien
Co-Founder & CEO
Today we're releasing TIM-Qwen3.6-27B, our latest post-trained small language model running on our TIMRUN inference system, available via the OpenAI chat completions and Anthropic messages API formats. You can sign up at get an API key today.
Last month we wrote that open models had crossed the threshold of being capable enough to do real work, on par with the most expensive models for the tasks most people are using AI for daily. This release uses our novel inference runtime and post-training advancements to close that gap even further.
One unified system, not two
When launching a language model, the model and the inference runtime usually get built by different teams, with the runtime built to serve generic inference. We've taken the opposite approach, designing TIMRUN and our post-training around each other from day one. The runtime handles memory and context aggressively during long agent runs, and the model is trained to lean into how it works. The numbers below are what falls out when those two pieces are built together.
With our unique inference runtime and post-training process, we take models like Qwen, Nemotron, Gemma, and Kimi and significantly improve their capabilities.
What TIMRUN does
TIMRUN now supports our Subconscious Cache system which enables nearly lossless context engineering and extremely efficient caching for batched agent inference. In a long-horizon agentic tasks, TIMRUN compress the cached latent states on the fly so the working context doesn’t fill with tokens the model is no longer interested in.
The downstream effects:
- Large context extension. Effective context windows of up to 10x the base model's nominal limit, without quality drop off.
- Support more concurrent requests. Roughly 3x the concurrent requests on the same hardware. Better bang for your GPU.
- Sustained high throughput. Where general-purpose engines slow down or fall over under long-context concurrency, TIMRUN stays fast over time.
- Better reasoning on the work that matters. Gains on SWE-bench, computer use, workflow automation, and long-context reasoning tasks.
We benchmarked TIMRUN against SGLang (an open-source version of OpenAI and SpaceXAI’s inference runtime) on long-context agent workloads. At 131K tokens of context and concurrency 12 on 2 H100s, half of SGLang's requests OOM. At 16 concurrent requests SGLang fails entirely. TIMRUN scales smoothly to concurrency 24 on the same hardware, with 49% lower latency and 3.09x peak system throughput in the range where both engines actually run.
A more detailed technical report on the runtime and efficiency metrics behind TIM-Qwen3.6-27B is coming in the next few weeks.
What this unlocks at the edge
Our system is enables better performance and concurrency in the cloud, but it unlocks never before possible functionality at the edge.
Memory and compute are the binding constraints on edge inference, so our ability to compress context aggressively without losing reasoning quality changes the math on edge AI. The same device can hold a larger model, complete longer context tasks, or do work that simply wasn't possible before. We're seeing this play out in active POCs, and we'll have more to share soon.
We’re currently running our system on workstations like the Nvidia DGX Spark, laptops like MacBooks and Lenovos, and even mobile devices like iPhones and Samsung Galaxy phones.
Schema shift to OpenAI and Anthropic compatibility
The previous Subconscious SDK required unique inputs and generated a bespoke reasoning tree response. That was a necessary approach for the earlier iterations of our integrated model and runtime, but incompatible with the world developers expect for agentic systems.
The new API is now simply chat completions and messages compatible, thanks to breakthroughs we’ve made in our runtime development. If you have code that already uses the OpenAI or Claude SDK, you can point it at our endpoint and try TIM-Qwen3.6-27B in a few minutes.
Our previous SDK is deprecated as of today.
What's next
Next week during Boston Tech Week we're running a huge hackathon at Wayfair HQ with BaseTen and Cloudflare. TIM-Qwen3.6-27B is what teams will be building on top of. We expect most to use our cloud hosted API, but we plan to allow some teams early access to run our systems locally on their laptops and mobile devices.
Expect a more detailed technical report on our latest TIMRUN runtime and performance benchmarks in the next several weeks.
If you want to try our system yourself, head to subconscious.dev.