1 change: 0 additions & 1 deletion .vitepress/config.ts
@@ -119,7 +119,6 @@ function v2DocsSidebar(lang: 'en' | 'ja'): DefaultTheme.SidebarItem[] {
{ text: 'CSV / Parquet', link: `${prefix}/data-sources/csv` },
{ text: 'BigQuery', link: `${prefix}/data-sources/bigquery` },
{ text: 'SQL', link: `${prefix}/data-sources/sql` },
-{ text: 'GA4', link: `${prefix}/data-sources/ga4` },
{ text: lang === 'ja' ? 'プラグイン' : 'Plugins', link: `${prefix}/data-sources/plugins` },
],
},
131 changes: 0 additions & 131 deletions docs/data-sources/ga4.md

This file was deleted.

14 changes: 7 additions & 7 deletions docs/deployment/docker.md
@@ -69,7 +69,7 @@ services:
healthcheck:
test:
- "CMD-SHELL"
-- "python -c \"import sys, urllib.request; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8080/health', timeout=5).status == 200 else 1)\""
+- "python -c \"import sys, urllib.request; sys.exit(0 if urllib.request.urlopen('http://127.0.0.1:8080/v1/health', timeout=5).status == 200 else 1)\""
interval: 30s
timeout: 10s
retries: 3
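
Unrolled into a function, the Compose probe's one-liner behaves roughly like the minimal sketch below. `probe` is a hypothetical helper name, not part of recotem. There is one difference. `urllib.request.urlopen` raises `urllib.error.HTTPError` for non-2xx replies such as a 503. The one-liner therefore exits non-zero through an uncaught exception, which prints a traceback into the healthcheck log, and never reaches its `else 1` branch. The sketch catches that case explicitly.

```python
import urllib.request


def probe(url: str, timeout: float = 5.0) -> int:
    """Return a container-healthcheck exit code: 0 if `url` answers HTTP 200, else 1."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 0 if resp.status == 200 else 1
    except OSError:
        # urllib.error.URLError (connection refused, DNS failure) and its subclass
        # HTTPError (e.g. a 503 "degraded" reply) both derive from OSError,
        # as do socket timeouts.
        return 1
```

A script can call it as `sys.exit(probe("http://127.0.0.1:8080/v1/health"))`. Docker then receives the exit status it expects: 0 for healthy, 1 for unhealthy.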
@@ -122,7 +122,7 @@ Named Docker volumes (as in `compose.yaml`) are pre-created with the right owner

### Image-level HEALTHCHECK

-The Dockerfile declares its own `HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3` that probes the public `/health` endpoint with `urllib.request.urlopen(f'http://127.0.0.1:{RECOTEM_PORT}/health', timeout=3)` (so it picks up an overridden `RECOTEM_PORT`). For one-shot `train` containers this fires after the process has already exited and causes no spurious failures. The Compose-level healthcheck shown in the annotated example also targets `/health` and overrides the image default for the `serve` service orchestrators should rely on the HTTP 200 response from `/health`.
+The Dockerfile declares its own `HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3` that probes `urllib.request.urlopen(f'http://127.0.0.1:{RECOTEM_PORT}/health', timeout=3)` (so it picks up an overridden `RECOTEM_PORT`). Note: the image-default probe targets `/health` (no `/v1` prefix); because the v1 router is mounted at `/v1` the public health endpoint is `/v1/health`. The Compose-level healthcheck shown in the annotated example overrides the image default for the `serve` service and targets `/v1/health`; orchestrators should rely on the HTTP 200 response from `/v1/health`. For one-shot `train` containers the image healthcheck fires after the process has already exited and causes no spurious failures.

### Reverse proxy binding

@@ -177,18 +177,18 @@ See [cron / systemd Deployment](./cron-systemd) for host-based scheduling patter
| `RECOTEM_ENV` | no | `""` | `--insecure-no-auth` permitted when set to `development`, `dev`, or `test`; `--dev-allow-unsigned` permitted only when set to `development`. |
| `RECOTEM_ARTIFACT_ROOT` | no | `""` | If set, local `output.path` must resolve under this directory (symlink-escape guard) |
| `RECOTEM_LOCK_DIR` | no | `""` | Override directory for per-recipe training lock files. Needed when `output.path` is a remote URI (lock files must be host-local). Falls back to a temp dir under the system temp directory. |
-| `RECOTEM_METADATA_FIELD_DENY` | no | `""` | Comma-separated column names stripped from `/predict` responses after the metadata join |
-| `RECOTEM_METRICS_ENABLED` | no | `""` | Set to `1`/`true`/`yes`/`on` to enable the Prometheus `/metrics` endpoint. Requires `recotem[metrics]` extra. |
+| `RECOTEM_METADATA_FIELD_DENY` | no | `""` | Comma-separated column names stripped from `/v1/recipes/{name}:recommend` and `:recommend-related` responses after the metadata join |
+| `RECOTEM_METRICS_ENABLED` | no | `""` | Set to `1`/`true`/`yes`/`on` to enable the Prometheus `/v1/metrics` endpoint. Requires `recotem[metrics]` extra. |
| `RECOTEM_STARTUP_PARALLELISM` | no | `""` (auto) | Number of parallel threads used to load artifacts at startup. Default is `min(len(recipes), 8)`. Clamped 1–32. Set to `1` for sequential loading (useful for memory-constrained environments or debugging). |

*`auto` switches to `console` for an interactive TTY and `json` otherwise.
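
The variable table encodes several parsing rules, and the minimal sketch below restates them in code:

- the truthy spellings `1`/`true`/`yes`/`on`;
- the `RECOTEM_ENV` values that permit `--insecure-no-auth` and `--dev-allow-unsigned`;
- the `min(len(recipes), 8)` default for `RECOTEM_STARTUP_PARALLELISM`, clamped to 1–32;
- the comma-separated, case-insensitive `RECOTEM_METADATA_FIELD_DENY` list.

All function names are hypothetical, and this is not recotem's implementation. The following details are assumptions: lower-casing the flag values, stripping whitespace, and matching `RECOTEM_ENV` exactly.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping

# Documented truthy spellings for boolean variables such as RECOTEM_METRICS_ENABLED.
TRUTHY = frozenset({"1", "true", "yes", "on"})
# RECOTEM_ENV values under which each dev-only CLI flag is permitted.
INSECURE_NO_AUTH_ENVS = frozenset({"development", "dev", "test"})
DEV_ALLOW_UNSIGNED_ENVS = frozenset({"development"})


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """True when `name` holds a truthy value (case folding here is an assumption)."""
    return env.get(name, "").strip().lower() in TRUTHY


def flag_permitted(env: Mapping[str, str], allowed: frozenset[str]) -> bool:
    """True when RECOTEM_ENV exactly matches one of the allowed tags."""
    return env.get("RECOTEM_ENV", "") in allowed


def startup_parallelism(env: Mapping[str, str], n_recipes: int) -> int:
    """Default min(n_recipes, 8); explicit or default values are clamped to 1..32."""
    raw = env.get("RECOTEM_STARTUP_PARALLELISM", "").strip()
    requested = int(raw) if raw else min(n_recipes, 8)  # non-integers raise ValueError
    return max(1, min(requested, 32))


def parse_deny_list(raw: str) -> frozenset[str]:
    """Split a comma-separated deny list into lower-cased column names."""
    return frozenset(part.strip().lower() for part in raw.split(",") if part.strip())


def strip_denied(row: Mapping[str, object], deny: Iterable[str]) -> dict[str, object]:
    """Drop metadata columns whose names match the deny list case-insensitively."""
    denied = {name.lower() for name in deny}
    return {key: value for key, value in row.items() if key.lower() not in denied}
```

In a real process, each helper would be passed `os.environ`. For example, `env_flag(os.environ, "RECOTEM_METRICS_ENABLED")` decides whether the metrics endpoint is enabled.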

## Health check

-The `/health` endpoint is unauthenticated and safe for container probes:
+The `/v1/health` endpoint is unauthenticated and safe for container probes:

```bash
-curl http://localhost:8080/health
+curl http://localhost:8080/v1/health
```
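
A readiness check sometimes needs to tell a degraded service (HTTP 503 with `status` set to `degraded`) apart from an unreachable one. `urlopen` raises `HTTPError` for the 503, but the exception still carries the response body, so the check can read the `status` field even on an error response. The minimal sketch below does this with a hypothetical helper, `health_status`. It relies only on the `status` field and the 200/503 codes described for `/v1/health`; the rest of the response body is not assumed.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request


def health_status(url: str, timeout: float = 5.0) -> tuple[int, str | None]:
    """Return (HTTP status code, JSON `status` field) from a health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            code, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:
        # urlopen raises on non-2xx; a 503 "degraded" reply still has a readable body.
        code, body = err.code, err.read()
    try:
        status = json.loads(body).get("status")
    except (ValueError, AttributeError):
        status = None  # body was empty, not JSON, or not a JSON object
    return code, status
```

Connection errors such as a refused connection still propagate as `urllib.error.URLError`. A caller can therefore treat `(200, ...)` as ready, `(503, "degraded")` as up but not ready, and an exception as down.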

```json
@@ -201,4 +201,4 @@ curl http://localhost:8080/health

`status` is `degraded` (HTTP 503) if any recipe failed to load. Use a Kubernetes readiness probe or Docker HEALTHCHECK targeting this endpoint — see [Serving API](../serving-api) for the full response contract.

-For per-recipe detail including `kid`, `trained_at`, and `best_class`, use the authenticated `/health/details` endpoint.
+For per-recipe detail including `kid`, `trained_at`, and `best_class`, use the authenticated `/v1/health/details` endpoint.
11 changes: 5 additions & 6 deletions docs/environment-variables.md
@@ -72,9 +72,9 @@ These variables control the runtime environment, graceful shutdown, and log outp

| Variable | Default | Scope | Clamping | Description |
|---|---|---|---|---|
-| `RECOTEM_ENV` | (empty) | serve | — | Deployment environment tag. `--insecure-no-auth` is permitted only when set to `development`, `dev`, or `test`. `--dev-allow-unsigned` is permitted only when set to `development`. When set to `production`, `prod`, or `staging`, the `/docs`, `/redoc`, and `/openapi.json` endpoints are disabled (requests return 404). |
+| `RECOTEM_ENV` | (empty) | serve | — | Deployment environment tag. `--insecure-no-auth` is permitted only when set to `development`, `dev`, or `test`. `--dev-allow-unsigned` is permitted only when set to `development`. The `/docs`, `/redoc`, and `/openapi.json` endpoints are fail-secure: they are enabled only when this variable is one of `development`, `dev`, or `test`; for any other value (including unset, `production`, `prod`, `staging`, or a custom tag) those paths return 404. |
| `RECOTEM_DRAIN_SECONDS` | `30` | serve | [1, 300] | SIGTERM graceful drain window in seconds. In-flight requests are given this window to complete before uvicorn closes remaining connections. For Kubernetes, set `terminationGracePeriodSeconds` to at least `RECOTEM_DRAIN_SECONDS + 5`. |
-| `RECOTEM_LOG_FORMAT` | `auto` | both | — | Log output format. `auto` uses JSON when stdout is not a TTY, console otherwise. `json` forces structured JSON. `console` forces human-readable output. |
+| `RECOTEM_LOG_FORMAT` | `auto` | both | — | Log output format. `auto` uses JSON when `stderr` is not a TTY, console otherwise. `json` forces structured JSON. `console` forces human-readable output. |

## Operational

@@ -84,19 +84,18 @@ These variables configure storage paths, locking, metadata field filtering, and
|---|---|---|---|---|
| `RECOTEM_ARTIFACT_ROOT` | (empty) | train | — | If set, local `output.path` values in recipes must lie under this directory. Symlink escapes are rejected. Use this to confine where train processes can write artifacts on the host. |
| `RECOTEM_LOCK_DIR` | (empty) | train | — | Override directory for per-recipe training lock files. Local `output.path` values always lock at `<output_path>.lock`. Remote `output.path` values (`s3://`, `gs://`, etc.) require a host-local lock file; if `RECOTEM_LOCK_DIR` is unset they fall back to `<tempdir>/recotem-locks/`. Note: `flock` is host-local — for cross-host single-writer guarantees use scheduler-level mutex (Kubernetes `concurrencyPolicy: Forbid`, etc.). |
-| `RECOTEM_METADATA_FIELD_DENY` | (empty) | serve | — | Comma-separated list of column names stripped from `/predict` responses after the item-metadata join. Matching is case-insensitive — `"Internal_ID"` in the metadata is stripped if `"internal_id"` is in the deny list. Use this to keep PII columns out of API responses. |
-| `RECOTEM_METRICS_ENABLED` | (unset) | serve | — | Truthy values: `1`, `true`, `yes`, `on`. Enables the Prometheus `/metrics` endpoint. Requires the `recotem[metrics]` extra (`pip install "recotem[metrics]"`). The endpoint is opt-in and off by default. |
+| `RECOTEM_METADATA_FIELD_DENY` | (empty) | serve | — | Comma-separated list of column names dropped from the item-metadata index at load time, so they never appear on any recommendation response (`:recommend`, `:recommend-related`, and `:batch-recommend*` when `include_metadata=true`). Matching is case-insensitive — `"Internal_ID"` in the metadata is stripped if `"internal_id"` is in the deny list. Use this to keep PII columns out of API responses. |
+| `RECOTEM_METRICS_ENABLED` | (unset) | serve | — | Truthy values: `1`, `true`, `yes`, `on`. Enables the Prometheus `/v1/metrics` endpoint. Requires the `recotem[metrics]` extra (`pip install "recotem[metrics]"`). The endpoint is opt-in and off by default. |

## Data source

-These variables tune behaviour of specific data sources. They are read only by `recotem train` and only when the corresponding source is used. See the [Data sources](./data-sources/) reference for full context.
+These variables tune behaviour of specific data sources. They are read only by `recotem train` and only when the corresponding source is used. See the [Data sources](./recipe-reference#source) reference for full context.

| Variable | Default | Scope | Clamping | Description |
|---|---|---|---|---|
| `RECOTEM_BQ_REQUIRE_STORAGE_API` | (unset) | train | — | Truthy values: `1`, `true`, `yes`, `on`. When set, the BigQuery source raises `DataSourceError` (exit 3) instead of silently falling back to the slower REST API when the BigQuery Storage Read API fails (e.g. missing `bigquery.readSessions.create` IAM permission). Use this to surface IAM gaps rather than accepting degraded throughput. |
| `RECOTEM_MAX_SQL_ROWS` | `50_000_000` | train | [1_000, 500_000_000] | Hard cap on the number of rows returned by the SQL data source. Exceeding the cap raises `DataSourceError` (exit 3). Caps **row count**, not DataFrame resident memory — see [SQL source — memory bound caveat](./data-sources/sql#memory-bound-caveat). |
| `RECOTEM_SQL_ALLOW_PRIVATE` | (unset) | train | — | Truthy values: `1`, `true`, `yes`, `on`. Opts the SQL source into accepting private/loopback DSN hosts (default deny, for SSRF). Covers every driver-routing form — netloc, `?host=`, `?hostaddr=`, `?service=`, `?unix_socket=`, absolute-path host, and network DSNs with no host info — all default-deny without this flag. Also disables the DNS-rebinding re-check before each probe/fetch — opting in means trusting the host end-to-end. |
-| `RECOTEM_GA4_MAX_PAGES` | `500` | train | [1, 10_000] | Hard ceiling on GA4 Data API pagination loops. Reached when a property is too large for the default; raise after confirming quota. |

## Recipe expansion
