|
8 | 8 |
|
9 | 9 | ## Current Tasks |
10 | 10 |
|
11 | | -### Kafka Transport Metrics Parity with gRPC `[NEXT]` |
| 11 | +### Core Pillars Implementation `[NEXT]` |
| 12 | + |
| 13 | +Full plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` |
| 14 | + |
| 15 | +**Phase 1: OTel Tracing Auto-Propagation** |
| 16 | +- [ ] Auto-initialise OTel layer in logger when `otel` feature + `OTEL_EXPORTER_OTLP_ENDPOINT` set |
| 17 | +- [ ] gRPC trace context propagation (tonic interceptors, `traceparent` header) |
| 18 | +- [ ] Kafka trace context propagation (message headers) |
| 19 | +- [ ] HTTP client trace context injection |
| 20 | +- [ ] HTTP server trace context extraction |
| 21 | + |
| 22 | +**Phase 2: Unified HealthState** |
| 23 | +- [ ] `src/health/` module with global `HealthRegistry` singleton |
| 24 | +- [ ] `HealthComponent` trait — modules register at construction |
| 25 | +- [ ] Wire transport, circuit breaker, config reloader into registry |
| 26 | +- [ ] `/readyz` aggregates from `HealthRegistry::is_healthy()` |
| 27 | +- [ ] `/health/detailed` JSON endpoint with per-component status |
| 28 | + |
| 29 | +**Phase 3: Unified Graceful Shutdown** |
| 30 | +- [ ] `src/shutdown/` module with global `CancellationToken` |
| 31 | +- [ ] SIGTERM/SIGINT → `token.cancel()` → all modules drain |
| 32 | +- [ ] Wire http-server, tiered-sink, config-reloader, gRPC transport |
| 33 | + |
| 34 | +**Phase 4: New Transports** |
| 35 | +- [ ] File transport (NDJSON, wraps existing `NdjsonWriter`) |
| 36 | +- [ ] Pipe transport (stdin/stdout, newline-delimited) |
| 37 | +- [ ] HTTP transport (POST to endpoint, uses `HttpClient`) |
| 38 | +- [ ] Redis/Valkey Streams transport (`XADD`/`XREADGROUP`/`XACK`) |
| 39 | + |
| 40 | +**Phase 5: DLQ Transport Integration** |
| 41 | +- [ ] DLQ Kafka backend uses `Box<dyn Transport>` instead of raw producer |
| 42 | +- [ ] DLQ can write to any transport (file, HTTP, Redis, Kafka) |
| 43 | + |
| 44 | +**Phase 6: Always-On Defaults** |
| 45 | +- [ ] Make config, logger, metrics, health, shutdown default features |
| 46 | +- [ ] Downstream dfe-* app remediation (remove boilerplate) |
| 47 | +- [ ] Audit hyperi-pylib and write alignment plan |
12 | 48 |
|
13 | | -gRPC transport (v1.19.7) auto-emits `dfe_transport_*` metrics. Kafka transport does not — two gaps: |
14 | | - |
15 | | -1. **Add metrics instrumentation to `KafkaTransport::send()`** — emit `dfe_transport_sent_total{transport="kafka"}`, `dfe_transport_send_errors_total{transport="kafka"}`, `dfe_transport_backpressured_total{transport="kafka"}`, and send duration histogram, matching gRPC parity. |
16 | | - |
17 | | -2. **Wire `StatsContext` into `KafkaTransport`** — currently `new_with_context()` is a stub that ignores the context. The struct uses `DefaultConsumerContext` / plain `FutureProducer`, so `statistics.interval.ms` callbacks (set to 1000ms by all profiles) go to a no-op. Need to either make the struct generic over context (complicates `Transport` trait impl) or use `StatsContext` by default when the `metrics` feature is enabled. This enables `rdkafka_broker_rtt_avg_seconds`, `rdkafka_global_msg_cnt`, `rdkafka_topic_partition_consumer_lag`, etc. |
18 | | - |
19 | | -Downstream impact: dfe-fetcher, dfe-receiver, dfe-loader all use `KafkaTransport` and would get these for free once wired. |
20 | | - |
21 | | -### Gap Analysis P2 — HTTP Client, Database URLs, Cache |
22 | | - |
23 | | -- [ ] HTTP client module with retry middleware (reqwest + reqwest-middleware + reqwest-retry) |
24 | | - - Wrap reqwest with exponential backoff, configurable timeouts |
25 | | - - Auto-register config via `unmarshal_key_registered` |
26 | | - - Metrics integration (request count, duration, errors) |
27 | | -- [ ] Database URL builders (PostgreSQL, ClickHouse, Redis) |
28 | | - - Build connection strings from env vars with standard prefixes |
29 | | - - `SensitiveString` for password fields |
30 | | -- [ ] Cache module with disk/memory backends |
31 | | - - Consolidate secrets cache pattern into reusable module |
32 | | - - TTL, stale-while-revalidate, size bounds |
| 49 | +--- |
33 | 50 |
|
34 | 51 | ### Completed Recent |
35 | 52 |
|
36 | | -- [x] **Config registry** (v1.19.3-v1.19.5) — auto-registering reflectable config, `/config` admin endpoint, `SensitiveString`, heuristic redaction, change notification, `ConfigReloader` hook |
37 | | -- [x] **CEL expression profile** (v1.19.2) — `matches()` blocked by default, `ProfileConfig` with per-category overrides, string literal false-positive prevention |
38 | | -- [x] **Config cascade wiring** (v1.19.2) — expression, memory, version_check, scaling, grpc, secrets auto-read from figment cascade |
| 53 | +- [x] **Universal metrics instrumentation** (v1.19.8) — tiered-sink, spool, dlq, cache, http-client, secrets all auto-emit Prometheus metrics via global singleton. Core pillar design decision documented in CLAUDE.md. |
| 54 | +- [x] **Kafka transport metrics + StatsContext** (v1.19.8) — `KafkaTransport` always uses `StatsContext` for consumer and producer. `dfe_transport_*` metrics on `send()`. `rdkafka_*` metrics auto-emitted. Zero downstream code changes. |
| 55 | +- [x] **gRPC transport metrics** (v1.19.7) — `dfe_transport_*` metrics on send/recv. Server push handler uses `try_send` with backpressure status codes. |
| 56 | +- [x] **HTTP client module** (v1.19.6) — reqwest + reqwest-middleware + reqwest-retry, exponential backoff, config cascade |
| 57 | +- [x] **Database URL builders** (v1.19.6) — PostgreSQL, ClickHouse, Redis/Valkey, MongoDB. Display trait redacts passwords. |
| 58 | +- [x] **Cache module** (v1.19.6) — moka-backed concurrent in-memory cache, per-source TTL, source isolation |
| 59 | +- [x] **Dependency update** (v1.19.6) — all deps to latest, cargo-audit ignores for transitive advisories |
| 60 | +- [x] **Config registry** (v1.19.3-v1.19.5) — auto-registering reflectable config, `/config` admin endpoint, `SensitiveString`, heuristic redaction, change notification |
| 61 | +- [x] **CEL expression profile** (v1.19.2) — `matches()` blocked by default, `ProfileConfig` with per-category overrides |
| 62 | +- [x] **Config cascade wiring** (v1.19.2) — expression, memory, version_check, scaling, grpc, secrets auto-read from cascade |
39 | 63 | - [x] **MemoryGuard underflow fix** (v1.19.1) — `fetch_sub` replaced with `fetch_update` + `saturating_sub` |
40 | | -- [x] **Test restructure** (v1.19.1) — `tests/integration/`, `tests/e2e/`, `tests/common/` per testing standard |
| 64 | +- [x] **Test restructure** (v1.19.1) — `tests/integration/`, `tests/e2e/`, `tests/common/` |
41 | 65 | - [x] **hyperi-ci release-merge** — CLI command replaces per-project workflow files |
42 | | -- [x] **Rust edition 2024** — migrated from 2021; `temp-env` replaces unsafe `set_var`/`remove_var` in tests across 6 files |
43 | | -- [x] **async-trait removal** — public traits (`Sink`, `Transport`, `SecretProvider`) now use `fn ... -> impl Future + Send` (Rust 1.75+ native) |
44 | | -- [x] **kafka_config module** — `config_from_file`, 7 named profiles, `merge_with_overrides`; librdkafka settings loaded from config git dir (only cascade exception) |
45 | | -- [x] **File output sink** — `src/io/`, `src/output/`, `output-file` feature |
46 | | -- [x] **CLI module** — CommonArgs, StandardCommand, DfeApp trait (`cli` feature) |
47 | | -- [x] **Top module** — ratatui TUI dashboard, Prometheus parser, oneshot mode (`top` feature) |
48 | | -- [x] **CI gating fix** — Semantic Release now gated on CI success via workflow_run |
49 | | - |
50 | | ---- |
51 | | - |
52 | | -## Completed |
53 | | - |
54 | | -- [x] Vector compat integration tests — 6 tests using real Vector binary + VectorCompatClient (fetch-vector.sh + YAML) |
55 | | -- [x] vault_env integration tests fixed — clear_vault_env() prevents VAULT_TOKEN leakage |
56 | | -- [x] Dependency update sweep — all crates to latest, tonic/prost 0.14 migration (v1.8.4) |
57 | | -- [x] Stale hs-rustlib removed from JFrog hypersec-cargo-local and hyperi-cargo-local |
58 | | -- [x] MaskingLayer fixed — writer-based redaction for both JSON and text formats (v1.8.4) |
59 | | -- [x] Logger output capturing tests — 10 tests (JSON, text, filtering, masking) |
60 | | -- [x] Coloured log output — custom FormatEvent with owo-colors colour scheme |
61 | | -- [x] Metrics graceful shutdown tests — 4 tests (shutdown, rapid cycle, render after stop, concurrent) |
62 | | -- [x] gRPC transport integration tests — 8 tests (send/recv, ordering, large payload, compression) |
63 | | -- [x] gRPC transport with Vector wire protocol compatibility (v1.8.0) |
64 | | - - tonic-based gRPC replacing Zenoh transport |
65 | | - - DFE native proto (`dfe.transport.v1`) + vendored Vector proto |
66 | | - - Vector compat source/sink for migration from Vector pipelines |
67 | | - - build.rs for conditional proto code generation |
68 | | -- [x] Zenoh transport removed — replaced by gRPC (v1.8.0) |
69 | | -- [x] Version check module — startup check against releases.hyperi.io (v1.7.0) |
70 | | -- [x] Deployment validation module — Helm chart and Dockerfile contract checks (v1.7.0) |
71 | | -- [x] CI: ARC self-hosted runners enabled (v1.7.1–v1.8.3) |
72 | | -- [x] Clippy/formatting fixes — approx_constant lint, dprint float formatting (v1.8.1–v1.8.3) |
73 | | -- [x] Package rename: hs-rustlib -> hyperi-rustlib, published v1.4.3 to JFrog |
74 | | -- [x] Rebrand: HyperSec -> HyperI across source, docs, configs, workflows |
75 | | -- [x] Registry migration: hypersec registry -> hyperi registry |
76 | | -- [x] Submodule URLs: hypersec-io -> hyperi-io |
77 | | -- [x] CI config: .hypersec-ci.yaml -> .hyperi-ci.yaml |
78 | | -- [x] Directory-config store with git2 integration (v1.4.0) |
79 | | -- [x] OpenTelemetry metrics support (v1.4.0) |
80 | | -- [x] Secrets management module (OpenBao/Vault, AWS) (v1.3.x) |
81 | | -- [x] HTTP server module (axum-based) (v1.2.0) |
82 | | -- [x] Transport module (Kafka/Memory abstraction) |
83 | | -- [x] TieredSink module (disk spillover with circuit breaker) |
84 | | -- [x] Spool module (disk-backed queue) |
85 | | -- [x] Configuration module (7-layer cascade with figment) |
86 | | -- [x] Logger module (structured JSON, RFC3339, masking) |
87 | | -- [x] Metrics module (Prometheus + process/container) |
88 | | -- [x] Environment detection module |
89 | | -- [x] Runtime paths module (XDG + container awareness) |
90 | | -- [x] Dependency audit (serde_yml -> serde-yaml-ng, queue-file -> yaque, once_cell -> LazyLock) |
91 | | -- [x] Config cascade alignment with hyperi-pylib unified spec (v1.6.0) |
92 | | - - load_home_dotenv default false, app_name support, container/user config paths |
93 | | - - Created docs/CONFIG-CASCADE.md |
94 | | - - PG layer documented as built-for-not-with (YAML gitops covers current needs) |
95 | 66 |
|
96 | 67 | --- |
97 | 68 |
|
98 | | -## Backlog (P1 - Config Registry) |
99 | | - |
100 | | -### Reflectable Config Registry |
101 | | - |
102 | | -Central registry where every module registers its config section at startup. |
103 | | -Currently modules independently call `unmarshal_key()` — no visibility into |
104 | | -what config keys exist, their types, defaults, or descriptions. |
105 | | - |
106 | | -**Goal:** Any DFE app can list/dump/expose all available config sections. |
107 | | - |
108 | | -- [x] Auto-registration via `unmarshal_key_registered` — records `(key, type_name, defaults, effective)` in global registry. Zero code changes in downstream apps. |
109 | | -- [x] `registry::sections()` — list all registered sections |
110 | | -- [x] `registry::dump_effective()` — JSON map of effective values |
111 | | -- [x] `registry::dump_defaults()` — JSON map of defaults (via `T::default()`) |
112 | | -- [x] Heuristic auto-redaction (password, secret, token, key, credential, auth, private, cert, encryption) |
113 | | -- [x] `#[serde(skip_serializing)]` as additional layer for fields that should never appear |
114 | | -- [x] expression, memory, version_check, scaling, grpc, secrets wired with `from_cascade()` auto-register |
115 | | -- [x] Modules without defaults (tiered_sink, http_server, kafka, spool, dlq) use `unmarshal_key_registered` from downstream apps |
116 | | -- [x] `/config` admin endpoint (opt-in via `enable_config_endpoint`) — returns redacted effective + defaults JSON |
117 | | -- [x] Change notification (opt-in) — `registry::on_change(key, callback)` + `registry::update()` |
118 | | - - Modules that need hot-reload subscribe; others keep `OnceLock` (init-once) |
119 | | -- [x] `ConfigReloader.with_registry_update(key)` connects hot-reload to registry |
120 | | -- [x] `SensitiveString` type — compile-time safe, `Serialize` always redacts |
121 | | -- [x] 19 registry + 12 sensitive string tests covering all redaction guarantees |
122 | | -- [ ] Migrate all dfe-* and hyperi-* apps to `unmarshal_key_registered` pattern |
123 | | -- [ ] Align hyperi-pylib with same registry pattern |
124 | | - |
125 | | ---- |
126 | | - |
127 | | -## Backlog (P2 - from Gap Analysis) |
128 | | - |
129 | | -### Phase 1 - Core Enterprise |
130 | | - |
131 | | -- [ ] Database URL builders module (PostgreSQL, Redis) |
132 | | -- [ ] HTTP client module with retry middleware (reqwest-retry) |
| 69 | +## Backlog |
133 | 70 |
|
134 | 71 | ### Secrets Providers |
135 | 72 |
|
136 | | -- [ ] GCP Secret Manager provider (`secrets-gcp` feature, `google-cloud-secretmanager` crate) |
137 | | -- [ ] Azure Key Vault provider (`secrets-azure` feature, `azure_security_keyvault` crate) |
| 73 | +- [ ] GCP Secret Manager provider (`secrets-gcp` feature) |
| 74 | +- [ ] Azure Key Vault provider (`secrets-azure` feature) |
138 | 75 |
|
139 | | -### Phase 2 - Enhanced Features |
| 76 | +### Kafka — Opinionated SASL-SCRAM Named Constructors |
140 | 77 |
|
141 | | -- [ ] Cache module with disk/Redis backing |
142 | | -- [ ] CLI framework helpers (wrap Clap) |
| 78 | +- [ ] `KafkaConfig::external_sasl_scram(brokers, username, password)` — SASL_SSL + SCRAM-SHA-512 |
| 79 | +- [ ] `KafkaConfig::internal_sasl_scram(brokers, username, password)` — SASL_PLAINTEXT + SCRAM-SHA-512 |
143 | 80 |
|
144 | | -### Phase 3 - Advanced |
| 81 | +### Other |
145 | 82 |
|
146 | | -- [ ] Standalone Kafka client (if transport layer insufficient) |
147 | 83 | - [ ] PII anonymiser (evaluate Rust libraries) |
148 | 84 | - [ ] Python bindings for ClickHouse client (PyO3) |
149 | 85 |
|
150 | | -### Kafka — Opinionated SASL-SCRAM Named Constructors |
151 | | - |
152 | | -- [ ] Add `KafkaConfig::external_sasl_scram(brokers, username, password)` — SASL_SSL + SCRAM-SHA-512 |
153 | | -- [ ] Add `KafkaConfig::internal_sasl_scram(brokers, username, password)` — SASL_PLAINTEXT + SCRAM-SHA-512 |
154 | | -- [ ] Encodes the decision once: SCRAM works unchanged on Apache Kafka, AutoMQ, MSK, Confluent Cloud |
155 | | -- [ ] Remove per-project manual assembly of protocol + sasl + tls fields in dfe-loader, dfe-receiver |
156 | | - |
157 | 86 | --- |
158 | 87 |
|
159 | 88 | ## Notes |
160 | 89 |
|
161 | 90 | - Use `CARGO_BUILD_JOBS=2` for all cargo commands |
162 | | -- Transport backends: Kafka, gRPC (native + Vector compat), Memory (Zenoh removed in v1.8.0) |
| 91 | +- Transport backends: Kafka, gRPC (native + Vector compat), Memory |
| 92 | +- Core pillars plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` |
163 | 93 | - See docs/GAP_ANALYSIS.md for detailed comparison with hyperi-pylib |
164 | | -- See docs/CLICKHOUSE_PYTHON_BINDINGS.md for Python binding proposal |
0 commit comments