DOS AI serves high-quality open-source LLMs via an OpenAI-compatible API. Self-hosted models run on dedicated RTX Pro 6000 GPUs with 96 GB VRAM in Asia-Southeast 1. Cloud models are served via partner providers for maximum coverage.
Use dos-auto as the model ID to let DOS AI automatically select the best model for each request. Smart routing uses a 15-dimension classifier to analyze your prompt and route to the optimal model based on task complexity, cost, and latency.
response = client.chat.completions.create(
model="dos-auto", # Smart routing picks the best model
messages=[{"role": "user", "content": "..."}],
)| Model | Provider | Context | Input | Output | Model ID |
|---|---|---|---|---|---|
| Qwen3.5-35B-A3B | Alibaba | 128K | $0.15 / 1M | $0.15 / 1M | dos-ai |
| Model | Provider | Context | Input | Output | Model ID |
|---|---|---|---|---|---|
| Llama 4 Maverick 17B-128E | Meta / DeepInfra | 1M | $0.17 / 1M | $0.66 / 1M | llama-4-maverick |
| Llama 4 Scout 17B-16E | Meta / DeepInfra | 640K | $0.11 / 1M | $0.38 / 1M | llama-4-scout |
| DeepSeek V3 | DeepSeek | 128K | $0.25 / 1M | $0.25 / 1M | deepseek-v3 |
| Llama 3.3 70B | Meta | 128K | $0.20 / 1M | $0.20 / 1M | llama-3.3-70b |
| Llama 3.1 8B | Meta | 128K | $0.05 / 1M | $0.05 / 1M | llama-3.1-8b |
All prices are in USD. The catalog is DB-driven -- new models are added regularly. Check
GET /v1/catalogor the dashboard for the latest list. See Pricing for billing details.
| Model | Provider | Dimensions | Model ID |
|---|---|---|---|
| Qwen3-Embedding-4B AWQ | Alibaba / Self-hosted | 2560 | qwen3-embedding-4b |
Alibaba's Mixture-of-Experts model with 35 billion total parameters and 3 billion active parameters per forward pass. This architecture delivers excellent quality at remarkably low cost and latency, making it our recommended default model for most use cases.
- Best for: General-purpose chat, code generation, reasoning, multilingual tasks
- Strengths: Outstanding cost-efficiency, fast response times, strong multilingual support (especially CJK languages)
- Model ID:
dos-ai
Meta's latest Mixture-of-Experts model with 17 billion active parameters and 128 experts. Strong reasoning and multilingual capabilities with an industry-leading 1 million token context window.
- Best for: Complex reasoning, long-context analysis, multilingual tasks
- Strengths: Massive context window, strong benchmark scores, efficient MoE architecture
- Model ID:
llama-4-maverick
Meta's efficient MoE model with 17 billion active parameters and 16 experts. Fast and cost-effective for everyday tasks with a 640K context window.
- Best for: Everyday tasks, fast responses, cost-sensitive workloads
- Strengths: Good balance of speed and quality, large context window
- Model ID:
llama-4-scout
DeepSeek's latest Mixture-of-Experts model, known for strong performance across coding, math, and reasoning benchmarks.
- Best for: Code generation, mathematical reasoning, structured output
- Strengths: Competitive benchmark scores, good at structured/JSON output, strong code capabilities
- Model ID:
deepseek-v3
Meta's 70-billion-parameter dense model. Offers top-tier reasoning and instruction-following capabilities.
- Best for: Complex reasoning, long-form content, detailed analysis
- Strengths: Strong English performance, excellent instruction following, robust safety tuning
- Model ID:
llama-3.3-70b
Meta's efficient 8-billion-parameter model. An excellent choice when you need fast, affordable responses and the task does not require the full capability of a larger model.
- Best for: Simple tasks, high-throughput workloads, prototyping, cost-sensitive applications
- Strengths: Very low latency, lowest cost, suitable for classification and extraction tasks
- Model ID:
llama-3.1-8b
| Use Case | Recommended Model | Why |
|---|---|---|
| Let DOS AI decide | dos-auto |
Smart routing picks the best model per request |
| General assistant / chatbot | Qwen3.5-35B-A3B | Best balance of quality, speed, and cost |
| Long-context analysis (100K+ tokens) | Llama 4 Maverick | 1M context window, strong reasoning |
| Complex reasoning / analysis | Llama 3.3 70B | Dense model, top reasoning capability |
| Code generation / math | DeepSeek V3 | Top coding and math benchmark scores |
| High-volume / low-cost tasks | Llama 3.1 8B | Fastest and cheapest option |
| Multilingual (CJK languages) | Qwen3.5-35B-A3B | Superior CJK language performance |
You can retrieve the current list of available models programmatically:
curl https://api.dos.ai/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"For the full retail catalog with pricing and metadata:
curl https://api.dos.ai/v1/catalog \
-H "Authorization: Bearer YOUR_API_KEY"See the Models API reference for the full response schema.