Commit aacbe1f

feat: add talos docs (#2516)
1 parent ef7e725 commit aacbe1f

167 files changed

Lines changed: 14015 additions & 45 deletions

docs/talos/CLAUDE.md

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
# Documentation Instructions

## JSON Processing

Use `jq` instead of `python3` for all JSON operations in code examples:

- **Pretty-print:** `| jq .` not `| python3 -m json.tool`
- **Extract required fields:** `| jq -er '.field'` (the `-e` flag exits non-zero on `null` or `false`, so `set -e` aborts the snippet instead
  of silently exporting the literal string `null`).
- **Extract optional fields:** `| jq -r '.field'` is fine when the field may legitimately be missing.

**Never write curl output to temporary files.** Capture responses in shell variables instead. File-based operations fail when
`/tmp` doesn't exist or isn't writable.
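
The following is a minimal sketch of how the two extraction forms behave. It uses a hard-coded sample response rather than a live API call; the payload and the field names `owner` and `secret` are assumptions made for illustration only.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical response body; a real snippet would capture this from curl.
RESPONSE='{"key_id": "abc123", "name": "my-key"}'

# Required field: -e makes jq exit non-zero when the result is null or false.
KEY_ID=$(echo "$RESPONSE" | jq -er '.key_id')
echo "key_id=$KEY_ID"

# Optional field: without -e, a missing field prints the literal string "null".
OWNER=$(echo "$RESPONSE" | jq -r '.owner')
echo "owner=$OWNER"

# Missing required field: the `if` guard keeps `set -e` from aborting this demo.
if echo "$RESPONSE" | jq -er '.secret' >/dev/null; then
  MISSING_STATUS="present"
else
  MISSING_STATUS="missing"
fi
echo "secret is $MISSING_STATUS"
```

Outside of an `if` guard, the failing `jq -er` call would end the snippet, which is the behavior the rule above relies on.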

## Passing state between doctest blocks

Doctest runs each code block in a fresh `bash -eu -o pipefail` subprocess and auto-captures the exported environment after each
successful block. To make a value available to the next block, just `export` it; no manual write to `$DOCTEST_ENV_FILE` is
needed.

```bash
# Good: variable-based, exported for the next block, asserts the field is present
RESPONSE=$(curl -s -X POST "$URL/v2alpha1/admin/issuedApiKeys" \
  -H "Content-Type: application/json" \
  -d '{"name": "my-key"}')
echo "$RESPONSE" | jq .
# Assign first, then export: `export VAR=$(...)` would hide a jq failure from `set -e`
KEY_ID=$(echo "$RESPONSE" | jq -er '.key_id')
export KEY_ID

# Bad: file-based
curl -s ... -o /tmp/response.json
jq . /tmp/response.json
KEY_ID=$(jq -r '.key_id' /tmp/response.json)
rm -f /tmp/response.json

# Bad: redirecting to $DOCTEST_ENV_FILE (legacy; auto-capture handles this now)
KEY_ID=$(echo "$RESPONSE" | jq -r '.key_id')
echo "export KEY_ID=$KEY_ID" >> "$DOCTEST_ENV_FILE"
```

## API Field Documentation

Integration guides under `integrate/` must NOT duplicate API field tables, error code tables, or enum tables. These are maintained
in the canonical references:

- **Field tables** -> auto-generated API reference at `reference/api/*.api.mdx`
- **Error codes** -> `reference/error-codes.md`

### What belongs in integration guides

- **Workflow and examples**: curl commands, step-by-step instructions, the "how" and "why"
- **Brief inline mentions**: 1-3 sentences highlighting the most important fields (e.g., "The response includes a `secret` field
  -- store it securely")
- **Conceptual comparisons**: tables comparing patterns, trade-offs, or usage scenarios (e.g., JWT vs macaroon)
- **Operational constraints**: limits, cache control headers, retry strategies
- **Links to reference**: always link to the canonical source for complete field/error details

### What does NOT belong in integration guides

- Full request/response field tables (use API reference link instead)
- Error code enum tables (use error codes reference link instead)
- Query parameter tables (use API reference link instead)
- Revocation reason enum tables (use API reference link instead)

### Link format

**All links MUST be relative links to markdown/mdx files with the file extension.** Never use absolute links (starting with `/`)
or links without a file extension. Hash (`#`) anchors are allowed after the file extension.

- Links to `.md` files: `[text](../reference/error-codes.md#section)`
- Links to `.api.mdx` files: `[text](../reference/api/admin-issue-api-key.api.mdx)`
- Links to directory index pages: `[text](../operate/cache/index.md)` (never `../operate/cache/`)
- Links within the same directory: `[text](./sibling-page.md)`

```text
# Good: relative links with file extensions
For the complete field reference, see the [IssueAPIKey API reference](../reference/api/admin-issue-api-key.api.mdx).
For the full list of error codes, see the [error codes reference](../reference/error-codes.md#verification-error-codes).

# Bad: absolute links without file extensions
For the complete field reference, see the [IssueAPIKey API reference](/reference/api/admin-issue-api-key).
For the full list of error codes, see the [error codes reference](/reference/error-codes#verification-error-codes).
```
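
The link rules above could be checked mechanically. The sketch below is one possible approach, not part of the docs tooling: `is_valid_doc_link` is a hypothetical helper, and it assumes every link starts with an explicit `./` or `../` prefix, as in the examples above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds when a link target follows the rules above.
is_valid_doc_link() {
  local target="$1"
  # Must be relative (./ or ../), end in .md or .mdx, with an optional #anchor.
  local pattern='^\.\.?/([^#[:space:]]+/)*[^#/[:space:]]+\.mdx?(#[^[:space:]]+)?$'
  [[ "$target" =~ $pattern ]]
}

for link in \
  "../reference/error-codes.md#verification-error-codes" \
  "./sibling-page.md" \
  "/reference/error-codes#verification-error-codes" \
  "../operate/cache/"; do
  if is_valid_doc_link "$link"; then
    echo "ok   $link"
  else
    echo "bad  $link"
  fi
done
```

The first two targets are accepted; the absolute link and the bare directory link are rejected.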

### API reference URL pattern

API reference pages are `.api.mdx` files at `reference/api/{plane}-{method}.api.mdx` where:

- `{plane}` is `admin` or `data`
- `{method}` is the kebab-case method name (e.g., `issue-api-key`, `verify-api-key`)

The API overview page is `reference/api/ory-talos-api.info.mdx`.
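
As an illustration of the pattern, the sketch below assembles a reference path from a plane and a method name. `api_reference_path` is a hypothetical helper; it only builds the string and does not check that the page exists.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: build the path of an API reference page.
api_reference_path() {
  local plane="$1" method="$2"
  case "$plane" in
    admin | data) ;;
    *) echo "unknown plane: $plane" >&2; return 1 ;;
  esac
  # Accept only kebab-case method names such as issue-api-key.
  local kebab='^[a-z0-9]+(-[a-z0-9]+)*$'
  if [[ ! "$method" =~ $kebab ]]; then
    echo "method must be kebab-case: $method" >&2
    return 1
  fi
  echo "reference/api/${plane}-${method}.api.mdx"
}

api_reference_path admin issue-api-key
```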

### Notes and callouts

Separate the opening `:::note` line and the closing `:::` line from the callout body with a blank line on each side, or the
callout will get formatted incorrectly.

**Incorrect:**

```md
:::note Internal package The Go client is in an `internal/` package and cannot be imported by external Go modules. :::
```

**Correct:**

```md
:::note

Internal package The Go client is in an `internal/` package and cannot be imported by external Go modules.

:::
```
Lines changed: 215 additions & 0 deletions
@@ -0,0 +1,215 @@
---
id: talos-architecture
title: Ory Talos architecture
sidebar_label: Ory Talos architecture
---

# Architecture

Talos separates API key management into two planes.

## Admin plane

The admin plane handles all key management and verification operations: key issuance, rotation, revocation, token derivation,
JWKS, and verification (single and batch). It is exposed only to internal services and clients with admin credentials.

Endpoints: `/v2alpha1/admin/`, including `/v2alpha1/admin/apiKeys:verify` and `/v2alpha1/admin/apiKeys:batchVerify`.

For low-latency verification close to clients, deploy the commercial [edge proxy](../operate/deploy/edge-proxy.md) as a sidecar.
The proxy caches admin verify responses locally, so applications get sub-millisecond cache hits without exposing the admin plane
publicly.

## Data plane

The data plane handles self-service operations that credential holders perform with proof of possession of the credential itself;
no admin authentication is required.

Endpoints: `POST /v2alpha1/apiKeys:selfRevoke`

## Verification flow

```
Client --> Verifier --> Cache (hit?) --> Database --> Response
                          |                           ^
                          +-- cache hit ---------------+
```

1. Client sends credential to `POST /v2alpha1/admin/apiKeys:verify`
2. Talos identifies the credential type (generated, imported, JWT, macaroon)
3. For generated keys, the UUID is extracted from the token identifier
4. For imported keys, a tenant-scoped SHA-512/256 hash is computed
5. Database lookup (or cache hit) returns key metadata
6. Response includes key status, owner, scopes, and metadata

## Deployment topologies

| Topology     | Edition    | Description                                                          |
| ------------ | ---------- | -------------------------------------------------------------------- |
| Single-node  | OSS        | One process serves both planes                                       |
| Split planes | Commercial | Admin and data planes as separate deployments                        |
| Edge proxy   | Commercial | Sidecar proxy at the edge that caches admin verify responses locally |

Both planes share the same database. Verification uses caching (memory or Redis) to minimize database load.

## Ports

| Port | Purpose            |
| ---- | ------------------ |
| 4420 | HTTP API (default) |
| 4422 | Prometheus metrics |

## Design philosophy

### Separation of concerns

The system is divided into distinct layers:

- **Admin plane**: Management operations (CRUD for keys, rotation, import, token derivation)
- **Data plane**: High-throughput verification operations
- **Persistence layer**: Database abstraction with pluggable drivers
- **Cache layer**: Performance optimization with multiple backends

This separation allows independent scaling of components, different SLOs for different operations (admin targets \<100ms p99, data
plane targets \<3ms p99), and clear boundaries between responsibilities.

### Production-first design

- Hard isolation between admin and data operations
- Metrics, traces, and structured logs are emitted by default
- Graceful degradation when the database or cache backend is unavailable
- Zero-downtime deployments via rolling updates and stateless verification

### Performance characteristics

- Self-contained tokens (JWT/macaroon) enable stateless verification
- HMAC-SHA256 keeps the revocation check on the order of microseconds; bcrypt would cap a single core at roughly 10 verifications
  per second
- LRU caching for hot paths
- Minimal allocations in the verification path

## System architecture

```
      Clients (CLI, SDK, HTTP)
                 |
                 v
+----------------------------------+
|    HTTP Server (grpc-gateway)    |
|            Port: 4420            |
+----------------------------------+
                 |
                 v
+----------------------------------+
|            Middleware            |
|    Logging, Metrics, Tracing     |
+----------------------------------+
                 |
           +-----+----------+
           |                |
           v                v
     +-----------+    +-----------+
     |   Admin   |    |   Data    |
     |   Plane   |    |   Plane   |
     |  <100ms   |    | <3ms p99  |
     +-----------+    +-----------+
           |                |
           v                v
+----------------------------------+
|          Service Layer           |
|    Business logic, Validation    |
+----------------------------------+
                 |
           +-----+----------+
           |                |
           v                v
     +-----------+    +-----------+
     | Persist.  |    |   Cache   |
     |  SQLite   |    |  Memory   |
     | PG/MySQL  |    |    LRU    |
     |   CRDB    |    |   Redis   |
     +-----------+    +-----------+
```

All requests enter through a single HTTP server built on grpc-gateway (port 4420) and pass through middleware for logging,
metrics, and tracing before being routed to the appropriate plane.

## Component overview

### HTTP server

The API layer uses grpc-gateway for HTTP/JSON routing with protobuf-based schemas. It serves both planes through a single port,
handles CORS and compression, and exposes OpenAPI documentation.

### Service layer

Business logic is split between the admin plane service (key lifecycle, import, token derivation, input validation) and the data
plane verifier (token parsing, signature verification, revocation checking, cache management). The verifier is optimized for the
hot path with minimal allocations.

### Persistence

Database access uses sqlc-generated type-safe queries with pluggable drivers:

- **SQLite** -- OSS edition, zero-config, suitable for millions of keys
- **PostgreSQL** -- production workloads
- **MySQL** -- production workloads
- **CockroachDB** -- distributed deployments

Schema changes are managed through versioned migrations using golang-migrate.

### Cache

The cache layer reduces database load on the verification path:

- **Memory LRU** (OSS) -- local to each instance, configurable size limits
- **Redis** (Commercial) -- distributed, supports cluster and sentinel modes
- **Hierarchical L1+L2** (Commercial) -- memory for speed, Redis for shared state

### Crypto

Talos supports multiple JWT signing algorithms and a separate API key hashing mechanism:

- **JWT signing algorithms**
  - `Ed25519 (EdDSA)` -- default, fastest signing and smallest keys
  - `RSA-2048/4096 (RS256)` -- legacy compatibility
- **API key hashing**
  - `HMAC-SHA256` -- used for API key revocation checks (\<1ms with constant-time comparison)

The JWT signing algorithm is determined per JWK by its `alg` field, so one JWKS can contain keys for multiple signing algorithms
at the same time.
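
To illustrate the per-key `alg` field, the sketch below lists the algorithm declared by each key in a JWKS document. The JWKS is a hand-written sample in the standard JWK Set layout with made-up `kid` values and elided key material; it is not output from a Talos instance.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical JWKS with elided key material; the kid values are made up.
JWKS='{
  "keys": [
    {"kty": "OKP", "crv": "Ed25519", "alg": "EdDSA", "kid": "ed-1", "x": "..."},
    {"kty": "RSA", "alg": "RS256", "kid": "rsa-1", "n": "...", "e": "AQAB"}
  ]
}'

# List each key ID with the algorithm declared in its `alg` field.
ALGS=$(echo "$JWKS" | jq -er '.keys[] | "\(.kid) \(.alg)"')
echo "$ALGS"
```

A verifier would typically pick the key by the `kid` in the token header and then use that key's `alg`, which is why one set can mix EdDSA and RS256 keys.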

### Observability

Built-in instrumentation across three pillars:

- **Metrics** -- Prometheus exposition on port 4422 with request latency histograms and error rate counters
- **Tracing** -- OpenTelemetry with W3C Trace Context propagation, configurable sampling, OTLP and Jaeger exporters
- **Logging** -- structured JSON logging via slog with correlation IDs and contextual fields

## Scalability

### Small (\<1k RPS)

A single Talos instance handles both planes with SQLite and an in-memory LRU cache. No external dependencies required.

- OSS edition sufficient
- 1 CPU, 512MB RAM
- Cost: $5-10/month

### Medium (10-50k RPS)

Separate admin and data plane deployments behind a load balancer. PostgreSQL replaces SQLite for durability. Redis provides shared
caching across data plane instances.

- Commercial edition
- Auto-scaling for data plane
- Cost: $100-500/month

### Large (200k+ RPS)

A cluster of 10-50+ stateless data plane instances with auto-scaling, backed by a distributed Redis cache and PostgreSQL with read
replicas and connection pooling. Supports multi-region deployment.

- Commercial edition
- Regional data plane deployment
- Cost: $1-5k/month

docs/talos/concepts/caching.md

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
---
id: caching-consistency
title: Ory Talos caching and consistency
sidebar_label: Caching and consistency
---

# Caching and consistency

Talos caches verification results to reduce database load and improve latency. The OSS edition ships a no-op cache; in-memory and
Redis backends are commercial-only. See [Caching](../operate/cache/index.md) for backend selection.

## How it works

When caching is enabled, the first verification request for a key hits the database. Subsequent requests within the cache TTL are
served from cache without a database lookup.

## Cache types

| Type   | Scope       | Use case                            |
| ------ | ----------- | ----------------------------------- |
| Memory | Per-process | Single node or per-instance caching |
| Redis  | Shared      | Multi-instance deployments          |

## Eventual consistency

Caching introduces eventual consistency for revocation:

1. Admin revokes a key via `POST /v2alpha1/admin/apiKeys/{key_id}:revoke`
2. The revocation takes effect in the database immediately
3. Cached verification results for that key remain valid until the cache entry expires
4. After TTL expiry, the next verification hits the database and returns `is_active: false`
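
The sequence above can be simulated in a few lines of shell. This is a toy model under stated assumptions, not Talos code: the database and cache are associative arrays (bash 4 or later), the clock is a plain variable, and `verify` is a hypothetical function.

```bash
#!/usr/bin/env bash
set -euo pipefail

TTL=300                        # cache TTL in seconds (5m)
NOW=0                          # simulated clock, in seconds
declare -A DB=([key-1]=true)   # stand-in for the database: key -> is_active
declare -A CACHE_VALUE=()      # cached is_active per key
declare -A CACHE_EXPIRY=()     # expiry time per cached key

# Sets RESULT to "<is_active> (<source>)" for the given key.
verify() {
  local key="$1"
  local expiry="${CACHE_EXPIRY[$key]:-0}"
  if (( NOW < expiry )); then
    RESULT="${CACHE_VALUE[$key]} (cache)"
    return 0
  fi
  # Cache miss or expired entry: read the database and refresh the cache.
  CACHE_VALUE[$key]="${DB[$key]}"
  CACHE_EXPIRY[$key]=$(( NOW + TTL ))
  RESULT="${DB[$key]} (database)"
}

NOW=0;   verify key-1; echo "t=${NOW}s   $RESULT"   # first request hits the database
NOW=60;  verify key-1; echo "t=${NOW}s  $RESULT"    # served from cache
DB[key-1]=false                                     # admin revokes the key at t=60s
NOW=120; verify key-1; echo "t=${NOW}s $RESULT"     # stale: cache still says active
NOW=300; verify key-1; echo "t=${NOW}s $RESULT"     # entry expired, database says inactive
```

In this model the revocation becomes visible at most one TTL after the entry was cached, which is the window the TTL setting controls.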

## Cache bypass

To force a database lookup (bypassing cache), include the `Cache-Control: no-cache` header:

```bash
curl -X POST http://localhost:4420/v2alpha1/admin/apiKeys:verify \
  -H "Content-Type: application/json" \
  -H "Cache-Control: no-cache" \
  -d '{"credential": "..."}'
```

See the [quickstart revocation check](../quickstart/index.mdx) and the [curl SDK reference](../integrate/sdk/curl.md) for tested
examples using cache bypass.

## TTL guidelines

| TTL   | Trade-off                                         |
| ----- | ------------------------------------------------- |
| `1m`  | Fast revocation propagation, higher database load |
| `5m`  | Balanced (recommended default)                    |
| `30m` | Low database load, slower revocation propagation  |
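
A rough way to compare these settings is to estimate the database load they imply. The sketch below assumes a single shared cache and a hypothetical population of keys that are all verified continuously, so each key is re-fetched at most once per TTL; real traffic will usually produce fewer lookups.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Upper bound on database lookups per second when every key stays hot:
# each cached key is re-fetched at most once per TTL.
db_lookups_per_second() {
  local hot_keys="$1" ttl_seconds="$2"
  echo $(( hot_keys / ttl_seconds ))
}

HOT_KEYS=100000   # hypothetical number of continuously used keys
for ttl in 60 300 1800; do
  echo "ttl=${ttl}s -> ~$(db_lookups_per_second "$HOT_KEYS" "$ttl") lookups/s"
done
```

With per-instance memory caches, the bound applies to each instance separately.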

See [Cache operations guide](../operate/cache/index.md) for configuration details.
