Access-log rotation, record truncation, and log-shipping configs
- _configure_access_log() in vgi_rpc/rpc/__init__.py wires
RotatingFileHandler / TimedRotatingFileHandler / plain FileHandler
based on max_bytes / when / fallback. Path supports {pid} and
{server_id} placeholders; parent dirs are auto-created.
- VgiAccessLogFormatter (logging_utils.py) now caps each emitted line
at max_record_bytes, sheds optional fields when over, and finally
collapses to a "record_too_large" sentinel when nothing else fits.
original_request_bytes records the dropped request_data length.
- access_log.schema.json adds the truncated and original_request_bytes
fields and relaxes the "method_type=unary requires request_data"
rule when truncated is set.
- New tests/test_access_log_rotation.py covering placeholder
substitution, rotation, and per-record truncation.
- Docs: access-log-spec.md describes the truncation contract;
access-log.md is rewritten around the new rotation/truncation flow;
docs/log-shipping/ ships Fluent Bit and Vector configs for S3, GCS,
and Azure as starter recipes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CLAUDE.md (1 addition, 1 deletion)
@@ -64,7 +64,7 @@ The full process before committing code is
-**`introspect.py`** — Introspection support. Provides the built-in `__describe__` RPC method, `MethodDescription`, `ServiceDescription`, `build_describe_batch`, `parse_describe_batch`, `compute_protocol_hash`, and `introspect()`. Enabled on `RpcServer` via `enable_describe=True`. The wire format is `DESCRIBE_VERSION = "4"` — slim 8-column schema (`name`, `method_type`, `has_return`, `params_schema_ipc`, `result_schema_ipc`, `has_header`, `header_schema_ipc`, `is_exchange`). Python-flavoured fields (`doc`, `param_types_json`, `param_defaults_json`, `param_docs_json`) were dropped in v4 so the wire stays language-neutral; the Protocol source class is the source of truth for human-readable type names, defaults, and docstrings. The response batch's custom metadata also carries `vgi_rpc.protocol_hash` — a SHA-256 hex digest over the canonical describe payload that uniquely identifies the protocol contract within a process and is stable across runs/builds for the same Protocol.
-
-**`access_log.schema.json`** + **`access_log_conformance.py`** — Cross-language access-log spec. Every conformant server emits one JSON record per RPC call on the `vgi_rpc.access` logger; the schema is enforced by `vgi-rpc-test --access-log <path>`. Always-required fields include `protocol_hash` (the hash from `__describe__`) so consumers reading archived JSONL can decide whether a cached schema decoder still applies; `protocol_version` (operator-supplied free-form label) is optional. See `docs/access-log-spec.md` for the full contract and `docs/porting-guide.md` for the cross-language conformance status of Go/TypeScript/Java/Rust ports.
+
- **`access_log.schema.json`** + **`access_log_conformance.py`** — Cross-language access-log spec. **Access logging is an HTTP-transport concern only.** Pipe, subprocess, shared-memory pipe, and Unix-socket transports do not emit access logs — those transports run trusted, co-located worker processes where per-call audit logging adds no value. The `--access-log` flag and `_configure_access_log` helper are only meaningful when serving over HTTP (e.g. `vgi-rpc-test --http --access-log ...` or `serve_http(..., access_log=...)`). Every conformant HTTP server emits one JSON record per RPC call on the `vgi_rpc.access` logger; the schema is enforced by `vgi-rpc-test --access-log <path>`. Always-required fields include `protocol_hash` (the hash from `__describe__`) so consumers reading archived JSONL can decide whether a cached schema decoder still applies; `protocol_version` (operator-supplied free-form label) is optional. Records carry `truncated: true` (or `truncated: "record_too_large"`) when the formatter shed fields to stay under `--access-log-max-record-bytes` (default 1 MiB). Rotation is via `--access-log-max-bytes` (size) or `--access-log-when` (time); paths support `{pid}` and `{server_id}` placeholders so multiple HTTP workers in one container don't collide. Reference shipper configs (Vector + Fluent Bit, S3/GCS/Azure) live under `docs/log-shipping/`. See `docs/access-log-spec.md` for the full contract and `docs/porting-guide.md` for the cross-language conformance status of Go/TypeScript/Java/Rust ports.
-**`shm.py`** — Shared memory transport support. Provides `ShmAllocator`, `ShmSegment`, and pointer batch helpers for zero-copy Arrow IPC batch transfer between co-located processes. Used by `ShmPipeTransport`.
docs/access-log-spec.md
Downstream log shippers (Vector's `file` source, Fluent Bit's `tail` input) impose a per-line ceiling: Vector's `max_line_bytes` defaults to 100 KiB and Fluent Bit's `Buffer_Max_Size` defaults to 32 KiB. Lines longer than the shipper's ceiling never reach the destination: Vector discards them, and Fluent Bit either skips them (with `Skip_Long_Lines On`) or stops monitoring the file.

To stay compatible, an emitter MAY enforce a per-record byte cap. When it does, it MUST shed fields in this order and signal the truncation via top-level keys:

1. Drop `request_data` and add `original_request_bytes` (integer, character length of the dropped field). Set `truncated: true`.
2. Replace `claims` with `{}`. Keep `truncated: true`.
3. If the record still exceeds the cap, emit a sentinel form: keep all always-required envelope fields plus `error_message` (when `status == "error"`) and set `truncated: "record_too_large"`. All other optional fields are dropped.

`error_message` MUST NOT be truncated, because operators rely on the full server-side message for debugging. The Python reference implementation uses a default cap of 1 048 576 bytes (1 MiB), configurable via `--access-log-max-record-bytes` or the env var `VGI_RPC_ACCESS_LOG_MAX_RECORD_BYTES`. Pair the cap with shipper configs that raise their per-line limits to match (Vector's `max_line_bytes`, Fluent Bit's `Buffer_Max_Size`).

| Field | Type | Condition |
|---|---|---|
| `truncated` | boolean or `"record_too_large"` | Present iff field-shedding was applied. `true` = at least one optional field dropped. `"record_too_large"` = sentinel form; all optional fields other than `error_message` dropped. |
| `original_request_bytes` | integer | Present when `request_data` was dropped due to truncation. Reports the character length of the dropped string. |

A `unary` record carrying `truncated` is NOT required to also carry `request_data`; the schema relaxes that rule when truncation is signalled.
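The shedding order above can be sketched in a few lines of Python. This is an illustration of the contract, not the code in `VgiAccessLogFormatter`: the function names and the `ASSUMED_REQUIRED_FIELDS` tuple are hypothetical (only `protocol_hash` is named as always-required in this text), and `request_data` is assumed to be a string.

```python
import json

DEFAULT_MAX_RECORD_BYTES = 1_048_576  # 1 MiB, the reference default quoted above

# ASSUMPTION: placeholder envelope. The authoritative always-required list
# lives in access_log.schema.json; only protocol_hash is named in this text.
ASSUMED_REQUIRED_FIELDS = ("timestamp", "method", "method_type", "status", "protocol_hash")


def encoded_size(record: dict) -> int:
    """Bytes used by the record as one UTF-8 JSON line, newline included."""
    return len(json.dumps(record).encode("utf-8")) + 1


def shed_fields(record: dict, max_record_bytes: int = DEFAULT_MAX_RECORD_BYTES) -> dict:
    """Return a record that respects the cap, shedding fields in spec order."""
    if encoded_size(record) <= max_record_bytes:
        return record  # under the cap: emit unchanged, no `truncated` key

    out = dict(record)  # shallow copy so the caller's record is not mutated

    # Step 1: drop request_data and report its character length.
    if "request_data" in out:
        out["original_request_bytes"] = len(out.pop("request_data"))
        out["truncated"] = True
        if encoded_size(out) <= max_record_bytes:
            return out

    # Step 2: replace claims with an empty object.
    if out.get("claims"):
        out["claims"] = {}
        out["truncated"] = True
        if encoded_size(out) <= max_record_bytes:
            return out

    # Step 3: sentinel form. error_message is kept whole, so the sentinel
    # itself may still exceed the cap when the message is very long.
    sentinel = {key: out[key] for key in ASSUMED_REQUIRED_FIELDS if key in out}
    if out.get("status") == "error" and "error_message" in out:
        sentinel["error_message"] = out["error_message"]
    sentinel["truncated"] = "record_too_large"
    return sentinel
```

Measuring the encoded line rather than the in-memory dict keeps the cap comparable to the shipper's per-line limit, which is also expressed in bytes.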
## 5c. Encoding & atomicity

- One JSON object per line, terminated by `\n`. UTF-8 encoded. No literal newlines inside field values (the standard `json.dumps` escapes them).
- A single emitter process appending via the stdlib `logging.FileHandler` is thread-safe (the handler holds a lock) and atomic on Linux.
- **Two processes writing to the same access-log file is unsupported.** Concurrent appends from multiple processes can interleave, and concurrent rotation will race. Run one access-log file per process by using `{pid}` and/or `{server_id}` placeholders in the path. The Python reference implementation expands these placeholders in `--access-log` paths automatically.
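As a rough illustration of the one-file-per-process rule, placeholder expansion could look like the sketch below. The helper name `expand_access_log_path` and its signature are hypothetical, and the way `_configure_access_log` obtains the server ID is not shown in this text. Creating the parent directory mirrors the commit note that parent dirs are auto-created.

```python
import os
from pathlib import Path


def expand_access_log_path(template: str, server_id: str) -> Path:
    """Substitute {pid} and {server_id}, then ensure the parent directory exists.

    Hypothetical helper: str.replace is used instead of str.format so that any
    other braces in the path are left untouched rather than raising KeyError.
    """
    expanded = template.replace("{pid}", str(os.getpid()))
    expanded = expanded.replace("{server_id}", server_id)
    path = Path(expanded)
    path.parent.mkdir(parents=True, exist_ok=True)
    return path
```

With a template such as `logs/access-{server_id}-{pid}.jsonl`, two HTTP workers in the same container resolve to different files because their process IDs differ.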
## 5d. Rotation

Implementations MAY rotate the access log via rename (e.g. `access.jsonl` → `access.jsonl.1`). Both `logging.handlers.RotatingFileHandler` (size-based) and `TimedRotatingFileHandler` (time-based) in Python's stdlib implement this correctly, and Vector and Fluent Bit are designed to follow rename-rotated files. **Do not truncate-in-place**, because shippers will lose their read position.

The Python reference implementation exposes:

- `--access-log-max-bytes N` / `VGI_RPC_ACCESS_LOG_MAX_BYTES`: size-based rotation when > 0.
- `--access-log-backup-count N` / `VGI_RPC_ACCESS_LOG_BACKUP_COUNT`: number of rotated files retained (default 5).
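A minimal sketch of how these options might map onto stdlib handlers follows. The function name `build_access_log_handler` and its parameters are hypothetical; the precedence (size first, then time, then a plain file) follows the commit message's "max_bytes / when / fallback" description and is otherwise an assumption about `_configure_access_log`.

```python
import logging
import logging.handlers


def build_access_log_handler(path, max_bytes=0, when=None, backup_count=5):
    """Pick a rename-rotating handler from the options, else a plain FileHandler."""
    if max_bytes > 0:
        # Size-based: renames access.jsonl to access.jsonl.1 once the cap is hit.
        return logging.handlers.RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backup_count, encoding="utf-8"
        )
    if when:
        # Time-based: `when` takes stdlib values such as "midnight" or "H".
        return logging.handlers.TimedRotatingFileHandler(
            path, when=when, backupCount=backup_count, encoding="utf-8"
        )
    # Fallback: no rotation, append-only.
    return logging.FileHandler(path, encoding="utf-8")
```

Both rotating handlers rotate by renaming the active file, which is the behaviour the shippers are expected to follow.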
## 6. Extra fields

Implementations MAY add fields beyond those defined here. Validators MUST NOT reject records carrying unknown fields (`additionalProperties: true`). Conformance is measured by what the schema requires, not by what it forbids.
@@ -133,3 +168,4 @@ The exit code is `0` if every record passes, `1` if any record fails, `2` if the