Commit 6cbadb9

docs: refine v2 docs for accuracy against implementation
Audit pass over the recipe-driven 2.0 docs (English + Japanese). Cross-checks every substantive claim in the modified pages against the recotem source tree, correcting mismatches in the serving API reference, security threat model, environment-variable scoping, batch artifact swap flow, and the overview page.

Notable fixes:
- serving-api.md: response examples use the actual 12-hex request_id format; /v1/recipes/{name} field table marks the optional fields nullable and distinguishes recipe_hash (bare 64-hex) from config_digest (sha256:<hex>); UNKNOWN_USER example matches the string the server actually emits; stub filtering wording covers both artifact- and YAML-load failures.
- security.md: threat-model row lists the full cloud-prefix blacklist.
- environment-variables.md / recipe-reference.md: RECOTEM_METADATA_FIELD_DENY is applied at metadata-index load, so it covers batch-recommend* too.
- guide/batch.md: load errors surface on /v1/health/details, not /v1/health.
- guide/index.md: a recipe now exposes a set of HTTP endpoints, not one.
1 parent 396b197 commit 6cbadb9

24 files changed

Lines changed: 1070 additions & 467 deletions

docs/deployment/docker.md

Lines changed: 7 additions & 7 deletions
@@ -69,7 +69,7 @@ services:
 healthcheck:
 test:
 - "CMD-SHELL"
-- "python -c \"import sys, urllib.request; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8080/health', timeout=5).status == 200 else 1)\""
+- "python -c \"import sys, urllib.request; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8080/v1/health', timeout=5).status == 200 else 1)\""
 interval: 30s
 timeout: 10s
 retries: 3
@@ -122,7 +122,7 @@ Named Docker volumes (as in `compose.yaml`) are pre-created with the right owner

 ### Image-level HEALTHCHECK

-The Dockerfile declares its own `HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3` that probes the public `/health` endpoint with `urllib.request.urlopen(f'http://127.0.0.1:{RECOTEM_PORT}/health', timeout=3)` (so it picks up an overridden `RECOTEM_PORT`). For one-shot `train` containers this fires after the process has already exited and causes no spurious failures. The Compose-level healthcheck shown in the annotated example also targets `/health` and overrides the image default for the `serve` service orchestrators should rely on the HTTP 200 response from `/health`.
+The Dockerfile declares its own `HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3` that probes `urllib.request.urlopen(f'http://127.0.0.1:{RECOTEM_PORT}/health', timeout=3)` (so it picks up an overridden `RECOTEM_PORT`). Note: the image-default probe targets `/health` (no `/v1` prefix); because the v1 router is mounted at `/v1` the public health endpoint is `/v1/health`. The Compose-level healthcheck shown in the annotated example overrides the image default for the `serve` service and targets `/v1/health`; orchestrators should rely on the HTTP 200 response from `/v1/health`. For one-shot `train` containers the image healthcheck fires after the process has already exited and causes no spurious failures.

 ### Reverse proxy binding

@@ -177,18 +177,18 @@ See [cron / systemd Deployment](./cron-systemd) for host-based scheduling patter
 | `RECOTEM_ENV` | no | `""` | `--insecure-no-auth` permitted when set to `development`, `dev`, or `test`; `--dev-allow-unsigned` permitted only when set to `development`. |
 | `RECOTEM_ARTIFACT_ROOT` | no | `""` | If set, local `output.path` must resolve under this directory (symlink-escape guard) |
 | `RECOTEM_LOCK_DIR` | no | `""` | Override directory for per-recipe training lock files. Needed when `output.path` is a remote URI (lock files must be host-local). Falls back to a temp dir under the system temp directory. |
-| `RECOTEM_METADATA_FIELD_DENY` | no | `""` | Comma-separated column names stripped from `/predict` responses after the metadata join |
-| `RECOTEM_METRICS_ENABLED` | no | `""` | Set to `1`/`true`/`yes`/`on` to enable the Prometheus `/metrics` endpoint. Requires `recotem[metrics]` extra. |
+| `RECOTEM_METADATA_FIELD_DENY` | no | `""` | Comma-separated column names stripped from `/v1/recipes/{name}:recommend` and `:recommend-related` responses after the metadata join |
+| `RECOTEM_METRICS_ENABLED` | no | `""` | Set to `1`/`true`/`yes`/`on` to enable the Prometheus `/v1/metrics` endpoint. Requires `recotem[metrics]` extra. |
 | `RECOTEM_STARTUP_PARALLELISM` | no | `""` (auto) | Number of parallel threads used to load artifacts at startup. Default is `min(len(recipes), 8)`. Clamped 1–32. Set to `1` for sequential loading (useful for memory-constrained environments or debugging). |

 *`auto` switches to `console` for an interactive TTY and `json` otherwise.
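
The default-and-clamp rule described for `RECOTEM_STARTUP_PARALLELISM` can be sketched roughly as follows. This is an illustration under stated assumptions, not the recotem implementation: the function name `resolve_startup_parallelism` is hypothetical, and the handling of non-numeric values is left out because the table does not describe it.

```python
def resolve_startup_parallelism(raw: str, recipe_count: int) -> int:
    """Return the loader thread count for a raw env value (hypothetical helper)."""
    if raw.strip() == "":
        # Empty means "auto": min(len(recipes), 8), as the table row states.
        value = min(recipe_count, 8)
    else:
        # Assumption: the value is a plain integer string.
        value = int(raw)
    # Clamp to the documented 1-32 range.
    return max(1, min(32, value))
```

Under this sketch, an empty value with 20 recipes resolves to 8 threads, and an explicit `64` is clamped down to 32.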

 ## Health check

-The `/health` endpoint is unauthenticated and safe for container probes:
+The `/v1/health` endpoint is unauthenticated and safe for container probes:

 ```bash
-curl http://localhost:8080/health
+curl http://localhost:8080/v1/health
 ```

 ```json
@@ -201,4 +201,4 @@ curl http://localhost:8080/health

 `status` is `degraded` (HTTP 503) if any recipe failed to load. Use a Kubernetes readiness probe or Docker HEALTHCHECK targeting this endpoint — see [Serving API](../serving-api) for the full response contract.

-For per-recipe detail including `kid`, `trained_at`, and `best_class`, use the authenticated `/health/details` endpoint.
+For per-recipe detail including `kid`, `trained_at`, and `best_class`, use the authenticated `/v1/health/details` endpoint.
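
A standalone probe equivalent to the Compose one-liner might look like the sketch below. It assumes only what the docs state: `/v1/health` returns HTTP 200 when healthy and 503 when degraded. The function name `probe_health` and its `opener` parameter (added so the function can be exercised without a running server) are hypothetical.

```python
import urllib.error
import urllib.request


def probe_health(base_url: str, timeout: float = 5.0,
                 opener=urllib.request.urlopen) -> int:
    """Return 0 if GET {base_url}/v1/health answers HTTP 200, else 1."""
    try:
        with opener(f"{base_url}/v1/health", timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for a 503 response,
        # and URLError/OSError when the server cannot be reached.
        return 1
```

A container entrypoint could call `sys.exit(probe_health("http://127.0.0.1:8080"))` to turn the result into a process exit code.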

docs/environment-variables.md

Lines changed: 4 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -72,9 +72,9 @@ These variables control the runtime environment, graceful shutdown, and log outp

 | Variable | Default | Scope | Clamping | Description |
 |---|---|---|---|---|
-| `RECOTEM_ENV` | (empty) | serve || Deployment environment tag. `--insecure-no-auth` is permitted only when set to `development`, `dev`, or `test`. `--dev-allow-unsigned` is permitted only when set to `development`. When set to `production`, `prod`, or `staging`, the `/docs`, `/redoc`, and `/openapi.json` endpoints are disabled (requests return 404). |
+| `RECOTEM_ENV` | (empty) | serve || Deployment environment tag. `--insecure-no-auth` is permitted only when set to `development`, `dev`, or `test`. `--dev-allow-unsigned` is permitted only when set to `development`. The `/docs`, `/redoc`, and `/openapi.json` endpoints are fail-secure: they are enabled only when this variable is one of `development`, `dev`, or `test`; for any other value (including unset, `production`, `prod`, `staging`, or a custom tag) those paths return 404. |
 | `RECOTEM_DRAIN_SECONDS` | `30` | serve | [1, 300] | SIGTERM graceful drain window in seconds. In-flight requests are given this window to complete before uvicorn closes remaining connections. For Kubernetes, set `terminationGracePeriodSeconds` to at least `RECOTEM_DRAIN_SECONDS + 5`. |
-| `RECOTEM_LOG_FORMAT` | `auto` | both || Log output format. `auto` uses JSON when stdout is not a TTY, console otherwise. `json` forces structured JSON. `console` forces human-readable output. |
+| `RECOTEM_LOG_FORMAT` | `auto` | both || Log output format. `auto` uses JSON when `stderr` is not a TTY, console otherwise. `json` forces structured JSON. `console` forces human-readable output. |
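
The fail-secure gating described for `RECOTEM_ENV` amounts to an allowlist check: anything not explicitly listed is treated as locked down. A minimal sketch, assuming exact, case-sensitive matching (the docs do not say whether matching is case-insensitive) and using hypothetical function names:

```python
from typing import Optional

# Values for which /docs, /redoc and /openapi.json are served, per the table.
DOCS_ENABLED_ENVS = frozenset({"development", "dev", "test"})


def docs_enabled(env_value: Optional[str]) -> bool:
    """True only for allowlisted values; unset or unknown tags disable docs."""
    return (env_value or "") in DOCS_ENABLED_ENVS


def dev_allow_unsigned_permitted(env_value: Optional[str]) -> bool:
    """--dev-allow-unsigned is documented as permitted only for 'development'."""
    return env_value == "development"
```

The design point is that a typo or a custom tag such as `qa` falls through to the secure default (docs return 404) rather than accidentally enabling them.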

 ## Operational

@@ -84,8 +84,8 @@ These variables configure storage paths, locking, metadata field filtering, and
 |---|---|---|---|---|
 | `RECOTEM_ARTIFACT_ROOT` | (empty) | train || If set, local `output.path` values in recipes must lie under this directory. Symlink escapes are rejected. Use this to confine where train processes can write artifacts on the host. |
 | `RECOTEM_LOCK_DIR` | (empty) | train || Override directory for per-recipe training lock files. Local `output.path` values always lock at `<output_path>.lock`. Remote `output.path` values (`s3://`, `gs://`, etc.) require a host-local lock file; if `RECOTEM_LOCK_DIR` is unset they fall back to `<tempdir>/recotem-locks/`. Note: `flock` is host-local — for cross-host single-writer guarantees use scheduler-level mutex (Kubernetes `concurrencyPolicy: Forbid`, etc.). |
-| `RECOTEM_METADATA_FIELD_DENY` | (empty) | serve || Comma-separated list of column names stripped from `/predict` responses after the item-metadata join. Matching is case-insensitive — `"Internal_ID"` in the metadata is stripped if `"internal_id"` is in the deny list. Use this to keep PII columns out of API responses. |
-| `RECOTEM_METRICS_ENABLED` | (unset) | serve || Truthy values: `1`, `true`, `yes`, `on`. Enables the Prometheus `/metrics` endpoint. Requires the `recotem[metrics]` extra (`pip install "recotem[metrics]"`). The endpoint is opt-in and off by default. |
+| `RECOTEM_METADATA_FIELD_DENY` | (empty) | serve || Comma-separated list of column names dropped from the item-metadata index at load time, so they never appear on any recommendation response (`:recommend`, `:recommend-related`, and `:batch-recommend*` when `include_metadata=true`). Matching is case-insensitive — `"Internal_ID"` in the metadata is stripped if `"internal_id"` is in the deny list. Use this to keep PII columns out of API responses. |
+| `RECOTEM_METRICS_ENABLED` | (unset) | serve || Truthy values: `1`, `true`, `yes`, `on`. Enables the Prometheus `/v1/metrics` endpoint. Requires the `recotem[metrics]` extra (`pip install "recotem[metrics]"`). The endpoint is opt-in and off by default. |
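
Because the deny list is applied when the metadata index is loaded, filtering happens once rather than per response. The sketch below illustrates the case-insensitive match on plain dictionaries; the helper names are hypothetical, and trimming whitespace around each entry is an assumption the docs do not confirm.

```python
from typing import Any, Dict, List, Set


def parse_deny_list(raw: str) -> Set[str]:
    """Split a comma-separated value into lowercased column names."""
    # Assumption: surrounding whitespace and empty entries are ignored.
    return {name.strip().lower() for name in raw.split(",") if name.strip()}


def strip_denied_fields(rows: List[Dict[str, Any]],
                        deny: Set[str]) -> List[Dict[str, Any]]:
    """Drop every column whose lowercased name is in the deny set."""
    return [
        {key: value for key, value in row.items() if key.lower() not in deny}
        for row in rows
    ]
```

With `RECOTEM_METADATA_FIELD_DENY="internal_id"`, a metadata column named `Internal_ID` is removed from the index under this sketch, so no endpoint that reads the index can return it.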

 ## Data source

docs/index.md

Lines changed: 13 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ title: Architecture

 # Architecture

-Recotem is a recipe-driven recommender system: a single YAML file (the _recipe_) defines the data source, training configuration, and artifact destination. One recipe produces one trained model and one `/predict/{name}` HTTP endpoint.
+Recotem is a recipe-driven recommender system: a single YAML file (the _recipe_) defines the data source, training configuration, and artifact destination. One recipe produces one trained model and a set of `/v1/recipes/{name}:<verb>` HTTP endpoints.

 ## System overview

@@ -26,10 +26,10 @@ Recotem is a recipe-driven recommender system: a single YAML file (the _recipe_)
 │ │ │
 │ ├── HMAC verify │
 │ ├── deserialize payload │
-│ └── FastAPI /predict/{name} │
-│ │ │
-│ ▼ │
-│ API client request │
+│ └── FastAPI /v1/recipes/{name}:recommend
+│ │
+│ ▼
+│ API client request
 └─────────────────────────────────────┘
 ```

@@ -40,7 +40,10 @@ Recotem is a recipe-driven recommender system: a single YAML file (the _recipe_)
 A recipe is the single source of truth for a model:

 ```
-1 recipe YAML → 1 trained artifact → 1 /predict/{name} endpoint
+1 recipe YAML → 1 trained artifact → /v1/recipes/{name}:recommend
+/v1/recipes/{name}:recommend-related
+/v1/recipes/{name}:batch-recommend
+/v1/recipes/{name}:batch-recommend-related
 ```
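
The one-recipe-to-four-endpoints mapping shown above can be expressed as a small helper. This is only an illustration of the URL shape; `recipe_endpoints` is a hypothetical name, and the sketch assumes the recipe name is already URL-safe.

```python
from typing import List

# The four verbs listed in the mapping above.
VERBS = (
    "recommend",
    "recommend-related",
    "batch-recommend",
    "batch-recommend-related",
)


def recipe_endpoints(name: str) -> List[str]:
    """Return the endpoint paths exposed for one recipe name."""
    return [f"/v1/recipes/{name}:{verb}" for verb in VERBS]
```

For a recipe named `movies`, the first path produced is `/v1/recipes/movies:recommend`.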

 The recipe captures:
@@ -75,8 +78,8 @@ magic | version | reserved | kid | hmac | header_json | payload
 |-------|------------------|-------------|
 | Operator | Recipe YAML, signing keys, env vars, `RECOTEM_SIGNING_KEYS` | Fully trusted |
 | Training host | Reads source data, writes signed artifact | Trusted (operator-controlled) |
-| Serving host | Reads artifact directory, serves `/predict` | Trusted (operator-controlled) |
-| API client | Sends `/predict` requests with an API key | Untrusted user input |
+| Serving host | Reads artifact directory, serves `/v1/recipes/{name}:<verb>` | Trusted (operator-controlled) |
+| API client | Sends `/v1/recipes/{name}:<verb>` requests with an API key | Untrusted user input |
 | Artifact file | Immutable signed binary; any tamper fails HMAC | Authenticated by HMAC |

 Recipes can reference environment variables for dynamic values (via `${RECOTEM_RECIPE_*}` expansion). The expansion mechanism is restricted to that prefix and never applied inside `source.query` or `source.query_parameters` to foreclose SQL injection.
@@ -90,7 +93,7 @@ The serving process polls the recipes directory for artifact file changes. When
 3. Atomically replace the in-memory model reference.
 4. The previous model is evicted; all subsequent requests use the new model.

-Hot-swap is **recipe-scoped**: updating artifact `A` does not affect the in-flight model for recipe `B`. The serving process never restarts. If HMAC verification or deserialization of the new artifact fails, the previous model continues serving and the failure is recorded in `/health` and in the `recotem_artifact_load_failures_total` Prometheus metric (when metrics are enabled).
+Hot-swap is **recipe-scoped**: updating artifact `A` does not affect the in-flight model for recipe `B`. The serving process never restarts. If HMAC verification or deserialization of the new artifact fails, the previous model continues serving and the failure is recorded in `/v1/health` and in the `recotem_artifact_load_failures_total` Prometheus metric (when metrics are enabled).

 The watcher poll interval is configured by `RECOTEM_WATCH_INTERVAL` (default 5 s, clamped to 1–30 s).
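
The recipe-scoped swap with fallback on failure can be sketched as below. This is a simplified model of the behaviour described above, not recotem's code: `ModelRegistry`, `try_swap`, and the `load` callable (standing in for HMAC verification plus deserialization) are all hypothetical names.

```python
import threading
from typing import Any, Callable, Dict


class ModelRegistry:
    """Holds one model reference per recipe and swaps them independently."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._models: Dict[str, Any] = {}
        self.load_failures: Dict[str, int] = {}

    def get(self, recipe: str) -> Any:
        with self._lock:
            return self._models[recipe]

    def try_swap(self, recipe: str, load: Callable[[], Any]) -> bool:
        """Load a new model; replace the old one only if loading succeeds."""
        try:
            # Verification and deserialization run outside the lock, so
            # requests keep using the current model in the meantime.
            new_model = load()
        except Exception:
            # Keep the previous model and record the failure.
            with self._lock:
                self.load_failures[recipe] = self.load_failures.get(recipe, 0) + 1
            return False
        with self._lock:
            # Single reference replacement; the old model is dropped.
            self._models[recipe] = new_model
        return True
```

A failed `try_swap` for recipe `A` leaves both `A`'s previous model and recipe `B` untouched, which mirrors the recipe-scoped guarantee stated above.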

@@ -123,7 +126,7 @@ This separation means:
 | Command | Purpose |
 |---------|---------|
 | `recotem train <recipe.yaml>` | Fetch data, run Optuna search, train best model, sign artifact |
-| `recotem serve --recipes <dir>` | Start FastAPI `/predict` server with hot-swap |
+| `recotem serve --recipes <dir>` | Start FastAPI `/v1/recipes` server with hot-swap |
 | `recotem inspect <artifact>` | Read and verify artifact header (no payload deserialization) |
 | `recotem validate <recipe.yaml>` | Validate recipe schema and probe data-source connectivity |
 | `recotem schema` | Emit JSON Schema for the Recipe model (IDE integration) |
