|
1 | 1 | # TODO - hyperi-rustlib |
2 | 2 |
|
3 | | -**Project Goal:** Rust shared library equivalent to hyperi-pylib (Python) and hyperi-golib (Go) |
| 3 | +**Project Goal:** Opinionated Rust framework for high-throughput data pipelines at PB scale |
4 | 4 |
|
5 | | -**Target:** Production-ready library for HyperI Rust applications |
| 5 | +**Target:** Production-ready library with auto-wiring config, logging, metrics, tracing, health, and graceful shutdown |
6 | 6 |
|
7 | 7 | --- |
8 | 8 |
|
9 | 9 | ## Current Tasks |
10 | 10 |
|
11 | | -### Kafka Transport Metrics Parity with gRPC `[NEXT]` |
| 11 | +### v2.0.0 Release `[NEXT]` |
12 | 12 |
|
13 | | -gRPC transport (v1.19.7) auto-emits `dfe_transport_*` metrics. Kafka transport does not — two gaps: |
| 13 | +All core pillar work is done. Need to: |
| 14 | +- [ ] Release-merge to release branch (feat!: breaking change → v2.0.0) |
| 15 | +- [ ] Verify crates.io publication |
| 16 | +- [ ] Docs consolidation (TRANSPORT.md, CORE-PILLARS.md, per-feature docs) |
| 17 | +- [ ] Add Redis vs Kafka comparison table to transport docs |
14 | 18 |
|
15 | | -1. **Add metrics instrumentation to `KafkaTransport::send()`** — emit `dfe_transport_sent_total{transport="kafka"}`, `dfe_transport_send_errors_total{transport="kafka"}`, `dfe_transport_backpressured_total{transport="kafka"}`, and send duration histogram, matching gRPC parity. |
| 19 | +### DLQ Transport Integration |
16 | 20 |
|
17 | | -2. **Wire `StatsContext` into `KafkaTransport`** — currently `new_with_context()` is a stub that ignores the context. The struct uses `DefaultConsumerContext` / plain `FutureProducer`, so `statistics.interval.ms` callbacks (set to 1000ms by all profiles) go to a no-op. Need to either make the struct generic over context (complicates `Transport` trait impl) or use `StatsContext` by default when the `metrics` feature is enabled. This enables `rdkafka_broker_rtt_avg_seconds`, `rdkafka_global_msg_cnt`, `rdkafka_topic_partition_consumer_lag`, etc. |
| 21 | +- [ ] DLQ Kafka backend uses `Box<dyn TransportSender>` / `AnySender` instead of raw producer |
| 22 | +- [ ] DLQ can write to any transport (file, HTTP, Redis, Kafka) |
18 | 23 |
|
19 | | -Downstream impact: dfe-fetcher, dfe-receiver, dfe-loader all use `KafkaTransport` and would get these for free once wired. |
| 24 | +### Identity / Auth Module (Discussion) |
20 | 25 |
|
21 | | -### Gap Analysis P2 — HTTP Client, Database URLs, Cache |
| 26 | +- [ ] Token validation middleware (JWT/OIDC) for gRPC interceptor + axum middleware |
| 27 | +- [ ] Service identity (service name + instance ID for mTLS, audit logs) |
| 28 | +- [ ] Break-glass: static bearer token from secrets module |
| 29 | +- [ ] Design decision: dfe-engine as SSoT, rustlib validates tokens only |
22 | 30 |
|
23 | | -- [ ] HTTP client module with retry middleware (reqwest + reqwest-middleware + reqwest-retry) |
24 | | - - Wrap reqwest with exponential backoff, configurable timeouts |
25 | | - - Auto-register config via `unmarshal_key_registered` |
26 | | - - Metrics integration (request count, duration, errors) |
27 | | -- [ ] Database URL builders (PostgreSQL, ClickHouse, Redis) |
28 | | - - Build connection strings from env vars with standard prefixes |
29 | | - - `SensitiveString` for password fields |
30 | | -- [ ] Cache module with disk/memory backends |
31 | | - - Consolidate secrets cache pattern into reusable module |
32 | | - - TTL, stale-while-revalidate, size bounds |
| 31 | +### Downstream Remediation |
33 | 32 |
|
34 | | -### Completed Recent |
35 | | - |
36 | | -- [x] **Config registry** (v1.19.3-v1.19.5) — auto-registering reflectable config, `/config` admin endpoint, `SensitiveString`, heuristic redaction, change notification, `ConfigReloader` hook |
37 | | -- [x] **CEL expression profile** (v1.19.2) — `matches()` blocked by default, `ProfileConfig` with per-category overrides, string literal false-positive prevention |
38 | | -- [x] **Config cascade wiring** (v1.19.2) — expression, memory, version_check, scaling, grpc, secrets auto-read from figment cascade |
39 | | -- [x] **MemoryGuard underflow fix** (v1.19.1) — `fetch_sub` replaced with `fetch_update` + `saturating_sub` |
40 | | -- [x] **Test restructure** (v1.19.1) — `tests/integration/`, `tests/e2e/`, `tests/common/` per testing standard |
41 | | -- [x] **hyperi-ci release-merge** — CLI command replaces per-project workflow files |
42 | | -- [x] **Rust edition 2024** — migrated from 2021; `temp-env` replaces unsafe `set_var`/`remove_var` in tests across 6 files |
43 | | -- [x] **async-trait removal** — public traits (`Sink`, `Transport`, `SecretProvider`) now use `fn ... -> impl Future + Send` (Rust 1.75+ native) |
44 | | -- [x] **kafka_config module** — `config_from_file`, 7 named profiles, `merge_with_overrides`; librdkafka settings loaded from config git dir (only cascade exception) |
45 | | -- [x] **File output sink** — `src/io/`, `src/output/`, `output-file` feature |
46 | | -- [x] **CLI module** — CommonArgs, StandardCommand, DfeApp trait (`cli` feature) |
47 | | -- [x] **Top module** — ratatui TUI dashboard, Prometheus parser, oneshot mode (`top` feature) |
48 | | -- [x] **CI gating fix** — Semantic Release now gated on CI success via workflow_run |
| 33 | +- [ ] Migrate dfe-loader to v2.0.0 (transport factory, remove boilerplate) |
| 34 | +- [ ] Migrate dfe-receiver to v2.0.0 (RoutedSender, transport factory) |
| 35 | +- [ ] Migrate dfe-archiver to v2.0.0 |
| 36 | +- [ ] Migrate dfe-fetcher to v2.0.0 |
| 37 | +- [ ] Migrate dfe-transform-wasm to v2.0.0 |
| 38 | +- [ ] Migrate dfe-transform-vrl to v2.0.0 |
| 39 | +- [ ] Audit hyperi-pylib and write alignment plan |
49 | 40 |
|
50 | 41 | --- |
51 | 42 |
|
52 | | -## Completed |
53 | | - |
54 | | -- [x] Vector compat integration tests — 6 tests using real Vector binary + VectorCompatClient (fetch-vector.sh + YAML) |
55 | | -- [x] vault_env integration tests fixed — clear_vault_env() prevents VAULT_TOKEN leakage |
56 | | -- [x] Dependency update sweep — all crates to latest, tonic/prost 0.14 migration (v1.8.4) |
57 | | -- [x] Stale hs-rustlib removed from JFrog hypersec-cargo-local and hyperi-cargo-local |
58 | | -- [x] MaskingLayer fixed — writer-based redaction for both JSON and text formats (v1.8.4) |
59 | | -- [x] Logger output capturing tests — 10 tests (JSON, text, filtering, masking) |
60 | | -- [x] Coloured log output — custom FormatEvent with owo-colors colour scheme |
61 | | -- [x] Metrics graceful shutdown tests — 4 tests (shutdown, rapid cycle, render after stop, concurrent) |
62 | | -- [x] gRPC transport integration tests — 8 tests (send/recv, ordering, large payload, compression) |
63 | | -- [x] gRPC transport with Vector wire protocol compatibility (v1.8.0) |
64 | | - - tonic-based gRPC replacing Zenoh transport |
65 | | - - DFE native proto (`dfe.transport.v1`) + vendored Vector proto |
66 | | - - Vector compat source/sink for migration from Vector pipelines |
67 | | - - build.rs for conditional proto code generation |
68 | | -- [x] Zenoh transport removed — replaced by gRPC (v1.8.0) |
69 | | -- [x] Version check module — startup check against releases.hyperi.io (v1.7.0) |
70 | | -- [x] Deployment validation module — Helm chart and Dockerfile contract checks (v1.7.0) |
71 | | -- [x] CI: ARC self-hosted runners enabled (v1.7.1–v1.8.3) |
72 | | -- [x] Clippy/formatting fixes — approx_constant lint, dprint float formatting (v1.8.1–v1.8.3) |
73 | | -- [x] Package rename: hs-rustlib -> hyperi-rustlib, published v1.4.3 to JFrog |
74 | | -- [x] Rebrand: HyperSec -> HyperI across source, docs, configs, workflows |
75 | | -- [x] Registry migration: hypersec registry -> hyperi registry |
76 | | -- [x] Submodule URLs: hypersec-io -> hyperi-io |
77 | | -- [x] CI config: .hypersec-ci.yaml -> .hyperi-ci.yaml |
78 | | -- [x] Directory-config store with git2 integration (v1.4.0) |
79 | | -- [x] OpenTelemetry metrics support (v1.4.0) |
80 | | -- [x] Secrets management module (OpenBao/Vault, AWS) (v1.3.x) |
81 | | -- [x] HTTP server module (axum-based) (v1.2.0) |
82 | | -- [x] Transport module (Kafka/Memory abstraction) |
83 | | -- [x] TieredSink module (disk spillover with circuit breaker) |
84 | | -- [x] Spool module (disk-backed queue) |
85 | | -- [x] Configuration module (7-layer cascade with figment) |
86 | | -- [x] Logger module (structured JSON, RFC3339, masking) |
87 | | -- [x] Metrics module (Prometheus + process/container) |
88 | | -- [x] Environment detection module |
89 | | -- [x] Runtime paths module (XDG + container awareness) |
90 | | -- [x] Dependency audit (serde_yml -> serde-yaml-ng, queue-file -> yaque, once_cell -> LazyLock) |
91 | | -- [x] Config cascade alignment with hyperi-pylib unified spec (v1.6.0) |
92 | | - - load_home_dotenv default false, app_name support, container/user config paths |
93 | | - - Created docs/CONFIG-CASCADE.md |
94 | | - - PG layer documented as built-for-not-with (YAML gitops covers current needs) |
| 43 | +### Completed This Session |
| 44 | + |
| 45 | +- [x] **Transport trait split** — `Transport` split into `TransportBase` + `TransportSender` + `TransportReceiver` with blanket `Transport` impl |
| 46 | +- [x] **Transport factory** — `AnySender` enum dispatch from config, `RoutedSender` for per-key dispatch (receiver/fetcher only) |
| 47 | +- [x] **File transport** — NDJSON with position tracking, commit persistence, rotation |
| 48 | +- [x] **Pipe transport** — stdin/stdout for Unix pipeline composition |
| 49 | +- [x] **HTTP transport** — POST send + embedded axum receive (bidirectional) |
| 50 | +- [x] **Redis/Valkey Streams transport** — XADD/XREADGROUP/XACK with consumer groups |
| 51 | +- [x] **HealthRegistry** — global singleton, modules auto-register health check closures, `/readyz` aggregates, `/health/detailed` JSON |
| 52 | +- [x] **Shutdown manager** — global CancellationToken, SIGTERM/SIGINT handler, modules listen on token |
| 53 | +- [x] **OTel trace propagation** — W3C traceparent auto-injected/extracted in gRPC, Kafka, HTTP transports |
| 54 | +- [x] **Universal metrics** — all modules auto-emit Prometheus metrics via global recorder |
| 55 | +- [x] **Logging + config cascade** — added to all new transports |
| 56 | +- [x] **Health wiring** — Kafka, gRPC, CircuitBreaker, HttpServer auto-register |
| 57 | +- [x] **Shutdown wiring** — HttpServer, TieredSink drainer, ConfigReloader listen on global token |
| 58 | +- [x] **KafkaTransport StatsContext** — always-on, rdkafka_* metrics auto-emitted |
| 59 | +- [x] **gRPC transport metrics** — dfe_transport_* parity with Kafka |
| 60 | +- [x] **HTTP client, database URL builders, cache modules** (v1.19.6) |
| 61 | +- [x] **Config registry, SensitiveString, /config endpoint** (v1.19.3-v1.19.5) |
| 62 | +- [x] **Dependency updates, cargo-audit ignores** (v1.19.6) |
95 | 63 |
|
96 | 64 | --- |
97 | 65 |
|
98 | | -## Backlog (P1 - Config Registry) |
99 | | - |
100 | | -### Reflectable Config Registry |
101 | | - |
102 | | -Central registry where every module registers its config section at startup. |
103 | | -Currently modules independently call `unmarshal_key()` — no visibility into |
104 | | -what config keys exist, their types, defaults, or descriptions. |
105 | | - |
106 | | -**Goal:** Any DFE app can list/dump/expose all available config sections. |
107 | | - |
108 | | -- [x] Auto-registration via `unmarshal_key_registered` — records `(key, type_name, defaults, effective)` in global registry. Zero code changes in downstream apps. |
109 | | -- [x] `registry::sections()` — list all registered sections |
110 | | -- [x] `registry::dump_effective()` — JSON map of effective values |
111 | | -- [x] `registry::dump_defaults()` — JSON map of defaults (via `T::default()`) |
112 | | -- [x] Heuristic auto-redaction (password, secret, token, key, credential, auth, private, cert, encryption) |
113 | | -- [x] `#[serde(skip_serializing)]` as additional layer for fields that should never appear |
114 | | -- [x] expression, memory, version_check, scaling, grpc, secrets wired with `from_cascade()` auto-register |
115 | | -- [x] Modules without defaults (tiered_sink, http_server, kafka, spool, dlq) use `unmarshal_key_registered` from downstream apps |
116 | | -- [x] `/config` admin endpoint (opt-in via `enable_config_endpoint`) — returns redacted effective + defaults JSON |
117 | | -- [x] Change notification (opt-in) — `registry::on_change(key, callback)` + `registry::update()` |
118 | | - - Modules that need hot-reload subscribe; others keep `OnceLock` (init-once) |
119 | | -- [x] `ConfigReloader.with_registry_update(key)` connects hot-reload to registry |
120 | | -- [x] `SensitiveString` type — compile-time safe, `Serialize` always redacts |
121 | | -- [x] 19 registry + 12 sensitive string tests covering all redaction guarantees |
122 | | -- [ ] Migrate all dfe-* and hyperi-* apps to `unmarshal_key_registered` pattern |
123 | | -- [ ] Align hyperi-pylib with same registry pattern |
124 | | - |
125 | | ---- |
126 | | - |
127 | | -## Backlog (P2 - from Gap Analysis) |
128 | | - |
129 | | -### Phase 1 - Core Enterprise |
130 | | - |
131 | | -- [ ] Database URL builders module (PostgreSQL, Redis) |
132 | | -- [ ] HTTP client module with retry middleware (reqwest-retry) |
| 66 | +## Backlog |
133 | 67 |
|
134 | 68 | ### Secrets Providers |
135 | 69 |
|
136 | | -- [ ] GCP Secret Manager provider (`secrets-gcp` feature, `google-cloud-secretmanager` crate) |
137 | | -- [ ] Azure Key Vault provider (`secrets-azure` feature, `azure_security_keyvault` crate) |
| 70 | +- [ ] GCP Secret Manager provider (`secrets-gcp` feature) |
| 71 | +- [ ] Azure Key Vault provider (`secrets-azure` feature) |
138 | 72 |
|
139 | | -### Phase 2 - Enhanced Features |
| 73 | +### Kafka — Opinionated SASL-SCRAM Named Constructors |
140 | 74 |
|
141 | | -- [ ] Cache module with disk/Redis backing |
142 | | -- [ ] CLI framework helpers (wrap Clap) |
| 75 | +- [ ] `KafkaConfig::external_sasl_scram(brokers, username, password)` |
| 76 | +- [ ] `KafkaConfig::internal_sasl_scram(brokers, username, password)` |
143 | 77 |
|
144 | | -### Phase 3 - Advanced |
| 78 | +### Other |
145 | 79 |
|
146 | | -- [ ] Standalone Kafka client (if transport layer insufficient) |
147 | 80 | - [ ] PII anonymiser (evaluate Rust libraries) |
148 | 81 | - [ ] Python bindings for ClickHouse client (PyO3) |
149 | 82 |
|
150 | | -### Kafka — Opinionated SASL-SCRAM Named Constructors |
151 | | - |
152 | | -- [ ] Add `KafkaConfig::external_sasl_scram(brokers, username, password)` — SASL_SSL + SCRAM-SHA-512 |
153 | | -- [ ] Add `KafkaConfig::internal_sasl_scram(brokers, username, password)` — SASL_PLAINTEXT + SCRAM-SHA-512 |
154 | | -- [ ] Encodes the decision once: SCRAM works unchanged on Apache Kafka, AutoMQ, MSK, Confluent Cloud |
155 | | -- [ ] Remove per-project manual assembly of protocol + sasl + tls fields in dfe-loader, dfe-receiver |
156 | | - |
157 | 83 | --- |
158 | 84 |
|
159 | 85 | ## Notes |
160 | 86 |
|
161 | 87 | - Use `CARGO_BUILD_JOBS=2` for all cargo commands |
162 | | -- Transport backends: Kafka, gRPC (native + Vector compat), Memory (Zenoh removed in v1.8.0) |
163 | | -- See docs/GAP_ANALYSIS.md for detailed comparison with hyperi-pylib |
164 | | -- See docs/CLICKHOUSE_PYTHON_BINDINGS.md for Python binding proposal |
| 88 | +- Transport backends: Kafka, gRPC, Memory, File, Pipe, HTTP, Redis/Valkey |
| 89 | +- Core pillars plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` |
| 90 | +- Two deployment modes: Kafka-mediated (persistence) vs direct gRPC (low latency) |
| 91 | +- Routed transport is receiver/fetcher only — all other stages are 1:1 |
| 92 | +- Breaking change: `feat!:` commit triggers v2.0.0 via semantic-release |
0 commit comments