DOS AI enforces rate limits to ensure fair usage and platform stability for all users.
| Metric | Limit |
|---|---|
| Requests per minute | 60 |
| Window type | Sliding window (60 seconds) |
The rate limiter uses a sliding window algorithm. This means the limit is calculated over a rolling 60-second period, not fixed calendar minutes.
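
To make the rolling-window behavior concrete, here is a minimal client-side sketch of one common way to implement a sliding window: keep the timestamps of recent requests and discard those older than the window. The class name `SlidingWindowLimiter`, its parameters, and the injectable `clock` are assumptions for illustration only; the server-side implementation is not documented here and may differ.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side model of a sliding-window limit."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window can be tested without waiting
        self.timestamps = deque()  # send times of requests still inside the window

    def try_acquire(self):
        """Record a request and return True if it fits in the rolling window."""
        now = self.clock()
        # Drop requests that have aged out of the rolling window.
        while self.timestamps and now - self.timestamps[0] >= self.window_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) < self.max_requests:
            self.timestamps.append(now)
            return True
        return False
```

Calling `try_acquire()` before each request and waiting when it returns `False` keeps a client under the limit in this model. Because the window rolls, capacity frees up 60 seconds after each individual request rather than at the top of each minute.
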
Every API response includes headers that report your current rate limit status:
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum requests allowed in the current window |
| `X-RateLimit-Remaining` | Requests remaining in the current window |

```http
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
Content-Type: application/json
```

Use these headers to proactively manage your request rate and avoid hitting the limit.
When you exceed the rate limit, the API returns a 429 Too Many Requests response:

```json
{
  "error": {
    "message": "Rate limit exceeded. Please wait before making another request.",
    "type": "rate_limit_error",
    "code": 429
  }
}
```

The request is not processed and no credits are charged.
The recommended strategy is exponential backoff with jitter. This progressively increases wait time between retries and adds randomness to prevent thundering herd problems.

```python
import random
import time

import requests


def call_with_backoff(url, headers, data, max_retries=5):
    """POST to the API, retrying 429 responses with exponential backoff and jitter."""
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=data)

        if response.status_code == 200:
            return response.json()

        if response.status_code == 429:
            base_delay = 2 ** attempt  # 1, 2, 4, 8, 16 seconds
            # Add up to 50% random extra delay so clients do not retry in lockstep.
            jitter = random.uniform(0, base_delay * 0.5)
            wait_time = base_delay + jitter
            print(f"Rate limited. Retrying in {wait_time:.1f}s...")
            time.sleep(wait_time)
            continue

        # Other 4xx/5xx errors are raised immediately rather than retried.
        response.raise_for_status()

    raise Exception("Max retries exceeded")
```

```javascript
async function callWithBackoff(url, options, maxRetries = 5) {
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    const response = await fetch(url, options);

    if (response.ok) {
      return await response.json();
    }

    if (response.status === 429) {
      const baseDelay = Math.pow(2, attempt) * 1000; // 1s, 2s, 4s, 8s, 16s
      // Add up to 50% random extra delay so clients do not retry in lockstep.
      const jitter = Math.random() * baseDelay * 0.5;
      const waitTime = baseDelay + jitter;
      console.log(`Rate limited. Retrying in ${(waitTime / 1000).toFixed(1)}s...`);
      await new Promise((resolve) => setTimeout(resolve, waitTime));
      continue;
    }

    // Any other error status is not retried.
    throw new Error(`API error: ${response.status}`);
  }

  throw new Error("Max retries exceeded");
}
```

Use the rate limit headers to throttle requests before hitting the limit:

```python
import time

import requests


def call_with_throttle(url, headers, data):
    """POST to the API, then pause if the remaining quota is running low."""
    response = requests.post(url, headers=headers, json=data)
    # Fall back to the documented limit of 60 if the header is missing.
    remaining = int(response.headers.get("X-RateLimit-Remaining", 60))

    if remaining < 5:
        time.sleep(2)
    elif remaining < 10:
        time.sleep(0.5)

    return response.json()
```

- Batch your work -- Pace requests evenly rather than sending them all at once.
- Use streaming -- Streaming responses (`stream: true`) count as a single request regardless of response length.
- Cache responses -- Avoid making the same API call repeatedly.
- Monitor usage -- Check the `X-RateLimit-Remaining` header and slow down as you approach the limit.
- Use fewer, larger requests -- One comprehensive prompt is better than multiple small ones.
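
As one way to act on the caching advice above, the sketch below keeps responses in memory, keyed by a hash of the request payload, so an identical call is sent only once. The `ResponseCache` class and the `send` callable are hypothetical names introduced for illustration and are not part of any DOS AI SDK. The sketch also assumes that reusing an earlier response is acceptable for your use case, which may not hold for requests that are expected to vary between calls.

```python
import hashlib
import json


class ResponseCache:
    """Hypothetical in-memory cache keyed by request payload."""

    def __init__(self, send):
        self.send = send  # callable that performs the real API call
        self.store = {}

    def _key(self, payload):
        # sort_keys makes logically equal payloads hash to the same key.
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def call(self, payload):
        """Return the cached response, calling the API only on a cache miss."""
        key = self._key(payload)
        if key not in self.store:
            self.store[key] = self.send(payload)
        return self.store[key]
```

In practice `send` could wrap a function such as `call_with_backoff`, so that only cache misses count against the rate limit. A production cache would usually also need an expiry policy and a size bound, both omitted here for brevity.
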
For organizations with high-volume needs, we offer custom limits:
- Higher request-per-minute caps based on your workload
- Burst allowances for predictable traffic spikes
- Dedicated capacity with guaranteed throughput
Contact support@dos.ai to discuss custom rate limits for your organization.