Question 1

What is TIMRUN?

Accepted Answer

TIMRUN is our specialized inference runtime designed for agent workloads. It caches and compresses tokens aggressively during processing directly on the GPU. As a result, our system extends the context window of the models it serves by 10x, enables 2.3x concurrent workloads running on the same hardware, and sustains 3.5x faster token throughput where general-purpose runtimes slow down. TIMRUN can serve LLMs, SLMs, and multimodal modals.

Question 2

Can TIMRUN run my model?

Accepted Answer

Most likely. TIMRUN is compatible with LLMs and Multimodal models. Our team has experimented extensively with the Qwen, GLM, Nemotron, and Kimi models, and many more open and closed models are compatible.

Question 3

What is Subconscious GLM-5.2?

Accepted Answer

Subconscious GLM-5.2 is the open-source GLM-5.2 model served on our TIMRUN runtime, built for agentic coding. It is available via an OpenAI and Anthropic compatible API.

Question 4

What is TIM-Qwen3.6-27B?

Accepted Answer

TIM-Qwen3.6-27B is our post-trained small language model running on our TIMRUN inference system. We took the already powerful Qwen3.6 27B model and significantly improved its capabilities with TIMRUN and our post-training process. We offer this system via an OpenAI completions and Anthropic messages compatible API.

Question 5

Is the API compatible with OpenAI and Anthropic SDKs?

Accepted Answer

Yes. The API supports OpenAI chat completions and Anthropic messages formats. If you have code that already uses the OpenAI or Claude SDK, you can point it at our endpoint and try our system with 3 lines of code.

Question 6

Can I use the API with OpenCode, Claude Code, and Codex?

Accepted Answer

Yes. Our API is compatible with any tool that uses the OpenAI chat completions or Anthropic messages format. Our documentation at docs.subconscious.dev has pointers to get you started.

Question 7

Can I use the API with OpenClaw and Hermes?

Accepted Answer

Yes. Our API is compatible with any tool that uses the OpenAI chat completions or Anthropic messages format. Our documentation at docs.subconscious.dev has pointers to get you started.

Question 8

Can I use the API with LangChain, CrewAI, Mastra, or n8n?

Accepted Answer

Yes. Any framework that uses the OpenAI completions or Anthropic API works with Subconscious. Swap in our base URL and API key and you are up and running.

Question 9

Can I run TIMRUN on edge devices?

Accepted Answer

Yes. TIMRUN compresses context aggressively without losing reasoning quality, which changes the math on edge AI. With TIMRUN, the same device can run a larger model, complete longer context tasks, or do work that simply was not possible before. We are currently running on workstations like the Nvidia DGX Spark, laptops, and even mobile devices like iPhones and Samsung Galaxy phones. Sign up for our platform and head to the local devices tab to learn more.

Question 10

Are your models open source?

Accepted Answer

Yes. We open source our post-trained models on Hugging Face at huggingface.co/SubconsciousDev. We do not, however, open source our proprietary inference runtime TIMRUN.

Question 11

Do you support enterprise and dedicated API deployments?

Accepted Answer

Yes. We offer dedicated GPU infrastructure with no rate limits, optional post-training on your tools and data, and custom SLAs. Sign up for our platform and head to the dedicated endpoint tab to get started.

Question 12

Do you pretrain models?

Accepted Answer

No. Subconscious is a runtime optimization and post-training company. We develop our TIMRUN runtime and TIM family of post-trained models. We take open models and post-train them to improve their reasoning ability on policy with our TIMRUN runtime. For specific customers, we help them post-train models for their unique data and tooling.

Inference Systems Designed for Agents

Run agents in your cloud

Run agents with our API

Use open models, enhanced for agentic workloads

Measurable improvements with our TIMRUN runtime

Power your agents with less GPUs and get better performance.

Power coding agents with our API.

Power agentic products with our API.

Run capable agents on edge devices, for the first time.

A drop-in replacement for vLLM and SGLang.

Longer runs and more concurrency on the same GPUs.

Handle millions of tokens with context management at runtime.

With the highly efficient Subconscious Cache, save 10x on cost at scale.

Run 2.3x as many workloads on the same compute footprint.

3.5x faster token throughput down deep reasoning chains.

Integrate in three lines of code.

Frequently asked questions

Make your GPUs go further.

Subconscious