From aa2afbc4e20fd6ab0089662524ca800d6e4d527d Mon Sep 17 00:00:00 2001 From: Derek Date: Thu, 26 Mar 2026 14:53:38 +1100 Subject: [PATCH 1/7] chore: save session state [skip ci] --- STATE.md | 28 +++++++++ TODO.md | 187 +++++++++++++++++-------------------------------------- 2 files changed, 86 insertions(+), 129 deletions(-) diff --git a/STATE.md b/STATE.md index 99ee2ab..080b47e 100644 --- a/STATE.md +++ b/STATE.md @@ -108,6 +108,34 @@ CARGO_BUILD_JOBS=2 cargo clippy --- +## Core Pillars (Non-Negotiable Design Decision) + +Every module in hyperi-rustlib MUST auto-integrate with the core infrastructure +pillars using the **global singleton pattern**. Services get observability for +free — no handles passed, no opt-in, no extra code in downstream apps. + +| Pillar | Singleton | Module | Pattern | +|--------|-----------|--------|---------| +| Config | `OnceLock` | `config` | `T::from_cascade()` reads from global figment | +| Logging | Global `tracing` subscriber | `logger` | `tracing::info!()` macros — always available | +| Metrics | Global `metrics` recorder | `metrics` | `metrics::counter!()` macros — no-op if no recorder | +| Tracing | Global OTel subscriber | `otel` | Trace context auto-propagated (planned: always-on) | +| Health | Global `HealthState` | `http-server` | Unified readiness flag (planned: auto-wired) | +| Shutdown | `CancellationToken` | (planned) | Unified graceful shutdown (planned: auto-wired) | + +**Rule:** When adding ANY new module or feature to rustlib: +1. If it has configurable behaviour → load from cascade via `from_cascade()` +2. If it does I/O or processing → add `#[cfg(feature = "metrics")]` counters/gauges/histograms +3. If it can fail or has interesting state → add `tracing::` log calls +4. 
If it affects service health → report into unified `HealthState` + +**The goal:** A DFE app that does `MetricsManager::new("dfe_loader")` + +`logger::setup_default()` + `config::setup()` at startup gets full +observability across every rustlib feature it uses — transport, tiered-sink, +spool, cache, secrets, HTTP client, DLQ — with zero additional wiring. + +--- + ## Decisions - **Dynamic linking for C deps** — rdkafka, libgit2, zstd, zlib, openssl all link against system libs via pkg-config. Eliminates ~30min C++ build for rdkafka. aws-lc-sys is the one exception (AWS SDK hardcodes it, no opt-out). diff --git a/TODO.md b/TODO.md index c33c189..df51114 100644 --- a/TODO.md +++ b/TODO.md @@ -8,157 +8,86 @@ ## Current Tasks -### Kafka Transport Metrics Parity with gRPC `[NEXT]` +### Core Pillars Implementation `[NEXT]` + +Full plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` + +**Phase 1: OTel Tracing Auto-Propagation** +- [ ] Auto-initialise OTel layer in logger when `otel` feature + `OTEL_EXPORTER_OTLP_ENDPOINT` set +- [ ] gRPC trace context propagation (tonic interceptors, `traceparent` header) +- [ ] Kafka trace context propagation (message headers) +- [ ] HTTP client trace context injection +- [ ] HTTP server trace context extraction + +**Phase 2: Unified HealthState** +- [ ] `src/health/` module with global `HealthRegistry` singleton +- [ ] `HealthComponent` trait — modules register at construction +- [ ] Wire transport, circuit breaker, config reloader into registry +- [ ] `/readyz` aggregates from `HealthRegistry::is_healthy()` +- [ ] `/health/detailed` JSON endpoint with per-component status + +**Phase 3: Unified Graceful Shutdown** +- [ ] `src/shutdown/` module with global `CancellationToken` +- [ ] SIGTERM/SIGINT → `token.cancel()` → all modules drain +- [ ] Wire http-server, tiered-sink, config-reloader, gRPC transport + +**Phase 4: New Transports** +- [ ] File transport (NDJSON, wraps existing `NdjsonWriter`) +- [ ] Pipe transport (stdin/stdout, 
newline-delimited) +- [ ] HTTP transport (POST to endpoint, uses `HttpClient`) +- [ ] Redis/Valkey Streams transport (`XADD`/`XREADGROUP`/`XACK`) + +**Phase 5: DLQ Transport Integration** +- [ ] DLQ Kafka backend uses `Box` instead of raw producer +- [ ] DLQ can write to any transport (file, HTTP, Redis, Kafka) + +**Phase 6: Always-On Defaults** +- [ ] Make config, logger, metrics, health, shutdown default features +- [ ] Downstream dfe-* app remediation (remove boilerplate) +- [ ] Audit hyperi-pylib and write alignment plan -gRPC transport (v1.19.7) auto-emits `dfe_transport_*` metrics. Kafka transport does not — two gaps: - -1. **Add metrics instrumentation to `KafkaTransport::send()`** — emit `dfe_transport_sent_total{transport="kafka"}`, `dfe_transport_send_errors_total{transport="kafka"}`, `dfe_transport_backpressured_total{transport="kafka"}`, and send duration histogram, matching gRPC parity. - -2. **Wire `StatsContext` into `KafkaTransport`** — currently `new_with_context()` is a stub that ignores the context. The struct uses `DefaultConsumerContext` / plain `FutureProducer`, so `statistics.interval.ms` callbacks (set to 1000ms by all profiles) go to a no-op. Need to either make the struct generic over context (complicates `Transport` trait impl) or use `StatsContext` by default when the `metrics` feature is enabled. This enables `rdkafka_broker_rtt_avg_seconds`, `rdkafka_global_msg_cnt`, `rdkafka_topic_partition_consumer_lag`, etc. - -Downstream impact: dfe-fetcher, dfe-receiver, dfe-loader all use `KafkaTransport` and would get these for free once wired. 
- -### Gap Analysis P2 — HTTP Client, Database URLs, Cache - -- [ ] HTTP client module with retry middleware (reqwest + reqwest-middleware + reqwest-retry) - - Wrap reqwest with exponential backoff, configurable timeouts - - Auto-register config via `unmarshal_key_registered` - - Metrics integration (request count, duration, errors) -- [ ] Database URL builders (PostgreSQL, ClickHouse, Redis) - - Build connection strings from env vars with standard prefixes - - `SensitiveString` for password fields -- [ ] Cache module with disk/memory backends - - Consolidate secrets cache pattern into reusable module - - TTL, stale-while-revalidate, size bounds +--- ### Completed Recent -- [x] **Config registry** (v1.19.3-v1.19.5) — auto-registering reflectable config, `/config` admin endpoint, `SensitiveString`, heuristic redaction, change notification, `ConfigReloader` hook -- [x] **CEL expression profile** (v1.19.2) — `matches()` blocked by default, `ProfileConfig` with per-category overrides, string literal false-positive prevention -- [x] **Config cascade wiring** (v1.19.2) — expression, memory, version_check, scaling, grpc, secrets auto-read from figment cascade +- [x] **Universal metrics instrumentation** (v1.19.8) — tiered-sink, spool, dlq, cache, http-client, secrets all auto-emit Prometheus metrics via global singleton. Core pillar design decision documented in CLAUDE.md. +- [x] **Kafka transport metrics + StatsContext** (v1.19.8) — `KafkaTransport` always uses `StatsContext` for consumer and producer. `dfe_transport_*` metrics on `send()`. `rdkafka_*` metrics auto-emitted. Zero downstream code changes. +- [x] **gRPC transport metrics** (v1.19.7) — `dfe_transport_*` metrics on send/recv. Server push handler uses `try_send` with backpressure status codes. +- [x] **HTTP client module** (v1.19.6) — reqwest + reqwest-middleware + reqwest-retry, exponential backoff, config cascade +- [x] **Database URL builders** (v1.19.6) — PostgreSQL, ClickHouse, Redis/Valkey, MongoDB. 
Display trait redacts passwords. +- [x] **Cache module** (v1.19.6) — moka-backed concurrent in-memory cache, per-source TTL, source isolation +- [x] **Dependency update** (v1.19.6) — all deps to latest, cargo-audit ignores for transitive advisories +- [x] **Config registry** (v1.19.3-v1.19.5) — auto-registering reflectable config, `/config` admin endpoint, `SensitiveString`, heuristic redaction, change notification +- [x] **CEL expression profile** (v1.19.2) — `matches()` blocked by default, `ProfileConfig` with per-category overrides +- [x] **Config cascade wiring** (v1.19.2) — expression, memory, version_check, scaling, grpc, secrets auto-read from cascade - [x] **MemoryGuard underflow fix** (v1.19.1) — `fetch_sub` replaced with `fetch_update` + `saturating_sub` -- [x] **Test restructure** (v1.19.1) — `tests/integration/`, `tests/e2e/`, `tests/common/` per testing standard +- [x] **Test restructure** (v1.19.1) — `tests/integration/`, `tests/e2e/`, `tests/common/` - [x] **hyperi-ci release-merge** — CLI command replaces per-project workflow files -- [x] **Rust edition 2024** — migrated from 2021; `temp-env` replaces unsafe `set_var`/`remove_var` in tests across 6 files -- [x] **async-trait removal** — public traits (`Sink`, `Transport`, `SecretProvider`) now use `fn ... 
-> impl Future + Send` (Rust 1.75+ native) -- [x] **kafka_config module** — `config_from_file`, 7 named profiles, `merge_with_overrides`; librdkafka settings loaded from config git dir (only cascade exception) -- [x] **File output sink** — `src/io/`, `src/output/`, `output-file` feature -- [x] **CLI module** — CommonArgs, StandardCommand, DfeApp trait (`cli` feature) -- [x] **Top module** — ratatui TUI dashboard, Prometheus parser, oneshot mode (`top` feature) -- [x] **CI gating fix** — Semantic Release now gated on CI success via workflow_run - ---- - -## Completed - -- [x] Vector compat integration tests — 6 tests using real Vector binary + VectorCompatClient (fetch-vector.sh + YAML) -- [x] vault_env integration tests fixed — clear_vault_env() prevents VAULT_TOKEN leakage -- [x] Dependency update sweep — all crates to latest, tonic/prost 0.14 migration (v1.8.4) -- [x] Stale hs-rustlib removed from JFrog hypersec-cargo-local and hyperi-cargo-local -- [x] MaskingLayer fixed — writer-based redaction for both JSON and text formats (v1.8.4) -- [x] Logger output capturing tests — 10 tests (JSON, text, filtering, masking) -- [x] Coloured log output — custom FormatEvent with owo-colors colour scheme -- [x] Metrics graceful shutdown tests — 4 tests (shutdown, rapid cycle, render after stop, concurrent) -- [x] gRPC transport integration tests — 8 tests (send/recv, ordering, large payload, compression) -- [x] gRPC transport with Vector wire protocol compatibility (v1.8.0) - - tonic-based gRPC replacing Zenoh transport - - DFE native proto (`dfe.transport.v1`) + vendored Vector proto - - Vector compat source/sink for migration from Vector pipelines - - build.rs for conditional proto code generation -- [x] Zenoh transport removed — replaced by gRPC (v1.8.0) -- [x] Version check module — startup check against releases.hyperi.io (v1.7.0) -- [x] Deployment validation module — Helm chart and Dockerfile contract checks (v1.7.0) -- [x] CI: ARC self-hosted runners enabled 
(v1.7.1–v1.8.3) -- [x] Clippy/formatting fixes — approx_constant lint, dprint float formatting (v1.8.1–v1.8.3) -- [x] Package rename: hs-rustlib -> hyperi-rustlib, published v1.4.3 to JFrog -- [x] Rebrand: HyperSec -> HyperI across source, docs, configs, workflows -- [x] Registry migration: hypersec registry -> hyperi registry -- [x] Submodule URLs: hypersec-io -> hyperi-io -- [x] CI config: .hypersec-ci.yaml -> .hyperi-ci.yaml -- [x] Directory-config store with git2 integration (v1.4.0) -- [x] OpenTelemetry metrics support (v1.4.0) -- [x] Secrets management module (OpenBao/Vault, AWS) (v1.3.x) -- [x] HTTP server module (axum-based) (v1.2.0) -- [x] Transport module (Kafka/Memory abstraction) -- [x] TieredSink module (disk spillover with circuit breaker) -- [x] Spool module (disk-backed queue) -- [x] Configuration module (7-layer cascade with figment) -- [x] Logger module (structured JSON, RFC3339, masking) -- [x] Metrics module (Prometheus + process/container) -- [x] Environment detection module -- [x] Runtime paths module (XDG + container awareness) -- [x] Dependency audit (serde_yml -> serde-yaml-ng, queue-file -> yaque, once_cell -> LazyLock) -- [x] Config cascade alignment with hyperi-pylib unified spec (v1.6.0) - - load_home_dotenv default false, app_name support, container/user config paths - - Created docs/CONFIG-CASCADE.md - - PG layer documented as built-for-not-with (YAML gitops covers current needs) --- -## Backlog (P1 - Config Registry) - -### Reflectable Config Registry - -Central registry where every module registers its config section at startup. -Currently modules independently call `unmarshal_key()` — no visibility into -what config keys exist, their types, defaults, or descriptions. - -**Goal:** Any DFE app can list/dump/expose all available config sections. - -- [x] Auto-registration via `unmarshal_key_registered` — records `(key, type_name, defaults, effective)` in global registry. Zero code changes in downstream apps. 
-- [x] `registry::sections()` — list all registered sections -- [x] `registry::dump_effective()` — JSON map of effective values -- [x] `registry::dump_defaults()` — JSON map of defaults (via `T::default()`) -- [x] Heuristic auto-redaction (password, secret, token, key, credential, auth, private, cert, encryption) -- [x] `#[serde(skip_serializing)]` as additional layer for fields that should never appear -- [x] expression, memory, version_check, scaling, grpc, secrets wired with `from_cascade()` auto-register -- [x] Modules without defaults (tiered_sink, http_server, kafka, spool, dlq) use `unmarshal_key_registered` from downstream apps -- [x] `/config` admin endpoint (opt-in via `enable_config_endpoint`) — returns redacted effective + defaults JSON -- [x] Change notification (opt-in) — `registry::on_change(key, callback)` + `registry::update()` - - Modules that need hot-reload subscribe; others keep `OnceLock` (init-once) -- [x] `ConfigReloader.with_registry_update(key)` connects hot-reload to registry -- [x] `SensitiveString` type — compile-time safe, `Serialize` always redacts -- [x] 19 registry + 12 sensitive string tests covering all redaction guarantees -- [ ] Migrate all dfe-* and hyperi-* apps to `unmarshal_key_registered` pattern -- [ ] Align hyperi-pylib with same registry pattern - ---- - -## Backlog (P2 - from Gap Analysis) - -### Phase 1 - Core Enterprise - -- [ ] Database URL builders module (PostgreSQL, Redis) -- [ ] HTTP client module with retry middleware (reqwest-retry) +## Backlog ### Secrets Providers -- [ ] GCP Secret Manager provider (`secrets-gcp` feature, `google-cloud-secretmanager` crate) -- [ ] Azure Key Vault provider (`secrets-azure` feature, `azure_security_keyvault` crate) +- [ ] GCP Secret Manager provider (`secrets-gcp` feature) +- [ ] Azure Key Vault provider (`secrets-azure` feature) -### Phase 2 - Enhanced Features +### Kafka — Opinionated SASL-SCRAM Named Constructors -- [ ] Cache module with disk/Redis backing -- [ ] CLI 
framework helpers (wrap Clap) +- [ ] `KafkaConfig::external_sasl_scram(brokers, username, password)` — SASL_SSL + SCRAM-SHA-512 +- [ ] `KafkaConfig::internal_sasl_scram(brokers, username, password)` — SASL_PLAINTEXT + SCRAM-SHA-512 -### Phase 3 - Advanced +### Other -- [ ] Standalone Kafka client (if transport layer insufficient) - [ ] PII anonymiser (evaluate Rust libraries) - [ ] Python bindings for ClickHouse client (PyO3) -### Kafka — Opinionated SASL-SCRAM Named Constructors - -- [ ] Add `KafkaConfig::external_sasl_scram(brokers, username, password)` — SASL_SSL + SCRAM-SHA-512 -- [ ] Add `KafkaConfig::internal_sasl_scram(brokers, username, password)` — SASL_PLAINTEXT + SCRAM-SHA-512 -- [ ] Encodes the decision once: SCRAM works unchanged on Apache Kafka, AutoMQ, MSK, Confluent Cloud -- [ ] Remove per-project manual assembly of protocol + sasl + tls fields in dfe-loader, dfe-receiver - --- ## Notes - Use `CARGO_BUILD_JOBS=2` for all cargo commands -- Transport backends: Kafka, gRPC (native + Vector compat), Memory (Zenoh removed in v1.8.0) +- Transport backends: Kafka, gRPC (native + Vector compat), Memory +- Core pillars plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` - See docs/GAP_ANALYSIS.md for detailed comparison with hyperi-pylib -- See docs/CLICKHOUSE_PYTHON_BINDINGS.md for Python binding proposal From 36c383d8bcdf96a120bf238cc45e629be984aa47 Mon Sep 17 00:00:00 2001 From: Derek Date: Thu, 26 Mar 2026 16:12:54 +1100 Subject: [PATCH 2/7] feat!: split Transport trait and add 4 new transports + factory BREAKING CHANGE: Transport trait split into TransportBase (close, is_healthy, name), TransportSender (send), and TransportReceiver (recv, commit, Token). Blanket Transport impl for types with both. 
New transport backends: - File: NDJSON with position tracking and commit persistence - Pipe: stdin/stdout for Unix pipeline composition - HTTP: POST to endpoint (send) + embedded axum server (receive) - Redis/Valkey Streams: XADD/XREADGROUP/XACK with consumer groups Transport factory: - AnySender: enum dispatch for runtime transport selection - AnySender::from_config(): create sender from config cascade - RoutedSender: per-key dispatch for data originators (receiver/fetcher) All transports auto-emit dfe_transport_* Prometheus metrics. 648 tests pass. --- Cargo.toml | 11 +- src/transport/factory.rs | 254 ++++++++++++ src/transport/file.rs | 500 +++++++++++++++++++++++ src/transport/grpc/mod.rs | 64 +-- src/transport/http.rs | 681 +++++++++++++++++++++++++++++++ src/transport/kafka/mod.rs | 38 +- src/transport/memory/mod.rs | 44 +- src/transport/mod.rs | 96 +++-- src/transport/pipe.rs | 333 +++++++++++++++ src/transport/redis_transport.rs | 557 +++++++++++++++++++++++++ src/transport/routed.rs | 250 ++++++++++++ src/transport/traits.rs | 84 ++-- src/transport/types.rs | 54 ++- tests/e2e/grpc_transport.rs | 2 +- tests/e2e/kafka.rs | 2 +- tests/e2e/vector_compat.rs | 2 +- 16 files changed, 2828 insertions(+), 144 deletions(-) create mode 100644 src/transport/factory.rs create mode 100644 src/transport/file.rs create mode 100644 src/transport/http.rs create mode 100644 src/transport/pipe.rs create mode 100644 src/transport/redis_transport.rs create mode 100644 src/transport/routed.rs diff --git a/Cargo.toml b/Cargo.toml index 1e74e63..3e07589 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -91,7 +91,11 @@ transport-memory = ["transport"] transport-kafka = ["transport", "rdkafka"] transport-grpc = ["transport", "dep:tonic", "dep:tonic-prost", "dep:prost", "dep:prost-types", "dep:tonic-prost-build", "dep:prost-build"] transport-grpc-vector-compat = ["transport-grpc"] -transport-all = ["transport-memory", "transport-kafka", "transport-grpc"] +transport-file = ["transport", 
"io"] +transport-pipe = ["transport"] +transport-http = ["transport", "http"] +transport-redis = ["transport", "redis"] +transport-all = ["transport-memory", "transport-kafka", "transport-grpc", "transport-file", "transport-pipe", "transport-http", "transport-redis"] # Secrets management secrets = ["tokio", "serde_json", "async-trait", "parking_lot", "base64", "dirs", "tracing"] @@ -155,7 +159,7 @@ metrics-exporter-opentelemetry = { version = ">=0.2.1, <0.3", optional = true } sysinfo = { version = ">=0.38.0, <0.39", optional = true } # Async runtime (for metrics server, http-server) -tokio = { version = ">=1.50.0, <2", features = ["rt-multi-thread", "net", "sync", "time", "macros", "signal", "fs"], optional = true } +tokio = { version = ">=1.50.0, <2", features = ["rt-multi-thread", "net", "sync", "time", "macros", "signal", "fs", "io-std", "io-util"], optional = true } # HTTP client — pinned to reqwest 0.12 until vaultrs and opentelemetry-otlp # support 0.13. reqwest-middleware 0.4 and reqwest-retry 0.7 target 0.12. @@ -183,6 +187,9 @@ rmp-serde = { version = ">=1.3.1, <2", optional = true } # Kafka transport (dynamic-linking: use system librdkafka instead of compiling C++ from source) rdkafka = { version = ">=0.39.0, <0.40", features = ["dynamic-linking"], optional = true } +# Redis/Valkey Streams transport +redis = { version = ">=1.0, <2", features = ["tokio-comp", "streams"], optional = true } + # gRPC transport (tonic + prost) tonic = { version = ">=0.14, <0.15", features = ["gzip"], optional = true } tonic-prost = { version = ">=0.14.5, <0.15", optional = true } diff --git a/src/transport/factory.rs b/src/transport/factory.rs new file mode 100644 index 0000000..2808ac7 --- /dev/null +++ b/src/transport/factory.rs @@ -0,0 +1,254 @@ +// Project: hyperi-rustlib +// File: src/transport/factory.rs +// Purpose: Transport factory — create senders from config +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! 
Transport factory for runtime transport selection. +//! +//! Creates transport senders from configuration, enabling apps to swap +//! between Kafka, gRPC, file, pipe, HTTP, or Redis via config change. +//! +//! # Usage +//! +//! ```yaml +//! # settings.yaml +//! transport: +//! output: +//! type: kafka +//! kafka: +//! brokers: ["kafka:9092"] +//! ``` +//! +//! ```rust,ignore +//! use hyperi_rustlib::transport::factory::AnySender; +//! +//! let sender = AnySender::from_config("transport.output").await?; +//! sender.send("events.land", payload).await; +//! ``` + +use super::error::{TransportError, TransportResult}; +use super::traits::{TransportBase, TransportSender}; +use super::types::{SendResult, TransportType}; + +/// Type-erased transport sender. +/// +/// Wraps any concrete transport sender behind an enum for runtime +/// dispatch. Created by the transport factory from config. +/// +/// Uses enum dispatch (not trait objects) because `TransportSender` +/// has `impl Future` return types which prevent `dyn` dispatch. 
+pub enum AnySender { + #[cfg(feature = "transport-kafka")] + Kafka(super::kafka::KafkaTransport), + + #[cfg(feature = "transport-grpc")] + Grpc(super::grpc::GrpcTransport), + + #[cfg(feature = "transport-memory")] + Memory(super::memory::MemoryTransport), + + #[cfg(feature = "transport-pipe")] + Pipe(super::pipe::PipeTransport), + + #[cfg(feature = "transport-file")] + File(super::file::FileTransport), + + #[cfg(feature = "transport-http")] + Http(super::http::HttpTransport), + + #[cfg(feature = "transport-redis")] + Redis(super::redis_transport::RedisTransport), +} + +impl TransportBase for AnySender { + async fn close(&self) -> TransportResult<()> { + match self { + #[cfg(feature = "transport-kafka")] + Self::Kafka(t) => t.close().await, + #[cfg(feature = "transport-grpc")] + Self::Grpc(t) => t.close().await, + #[cfg(feature = "transport-memory")] + Self::Memory(t) => t.close().await, + #[cfg(feature = "transport-pipe")] + Self::Pipe(t) => t.close().await, + #[cfg(feature = "transport-file")] + Self::File(t) => t.close().await, + #[cfg(feature = "transport-http")] + Self::Http(t) => t.close().await, + #[cfg(feature = "transport-redis")] + Self::Redis(t) => t.close().await, + } + } + + fn is_healthy(&self) -> bool { + match self { + #[cfg(feature = "transport-kafka")] + Self::Kafka(t) => t.is_healthy(), + #[cfg(feature = "transport-grpc")] + Self::Grpc(t) => t.is_healthy(), + #[cfg(feature = "transport-memory")] + Self::Memory(t) => t.is_healthy(), + #[cfg(feature = "transport-pipe")] + Self::Pipe(t) => t.is_healthy(), + #[cfg(feature = "transport-file")] + Self::File(t) => t.is_healthy(), + #[cfg(feature = "transport-http")] + Self::Http(t) => t.is_healthy(), + #[cfg(feature = "transport-redis")] + Self::Redis(t) => t.is_healthy(), + } + } + + fn name(&self) -> &'static str { + match self { + #[cfg(feature = "transport-kafka")] + Self::Kafka(t) => t.name(), + #[cfg(feature = "transport-grpc")] + Self::Grpc(t) => t.name(), + #[cfg(feature = "transport-memory")] + 
Self::Memory(t) => t.name(), + #[cfg(feature = "transport-pipe")] + Self::Pipe(t) => t.name(), + #[cfg(feature = "transport-file")] + Self::File(t) => t.name(), + #[cfg(feature = "transport-http")] + Self::Http(t) => t.name(), + #[cfg(feature = "transport-redis")] + Self::Redis(t) => t.name(), + } + } +} + +impl TransportSender for AnySender { + async fn send(&self, key: &str, payload: &[u8]) -> SendResult { + match self { + #[cfg(feature = "transport-kafka")] + Self::Kafka(t) => t.send(key, payload).await, + #[cfg(feature = "transport-grpc")] + Self::Grpc(t) => t.send(key, payload).await, + #[cfg(feature = "transport-memory")] + Self::Memory(t) => t.send(key, payload).await, + #[cfg(feature = "transport-pipe")] + Self::Pipe(t) => t.send(key, payload).await, + #[cfg(feature = "transport-file")] + Self::File(t) => t.send(key, payload).await, + #[cfg(feature = "transport-http")] + Self::Http(t) => t.send(key, payload).await, + #[cfg(feature = "transport-redis")] + Self::Redis(t) => t.send(key, payload).await, + } + } +} + +impl AnySender { + /// Create a sender from config cascade. + /// + /// Reads the transport config from the given key in the config + /// cascade and creates the appropriate sender. + /// + /// # Example config + /// + /// ```yaml + /// transport: + /// output: + /// type: kafka + /// kafka: + /// brokers: ["kafka:9092"] + /// ``` + /// + /// ```rust,ignore + /// let sender = AnySender::from_config("transport.output").await?; + /// ``` + pub async fn from_config(key: &str) -> TransportResult<Self> { + #[cfg(feature = "config")] + let config = { + let cfg = crate::config::try_get() + .ok_or_else(|| TransportError::Config("config not initialised".into()))?; + cfg.unmarshal_key::<super::TransportConfig>(key) + .map_err(|e| TransportError::Config(format!("failed to read {key}: {e}")))? 
+ }; + + #[cfg(not(feature = "config"))] + let config = super::TransportConfig::default(); + + Self::from_transport_config(&config).await + } + + /// Create a sender from an explicit `TransportConfig`. + pub async fn from_transport_config(config: &super::TransportConfig) -> TransportResult<Self> { + match config.transport_type { + #[cfg(feature = "transport-kafka")] + TransportType::Kafka => { + let kafka_config = config + .kafka + .as_ref() + .ok_or_else(|| TransportError::Config("kafka config missing".into()))?; + let transport = super::kafka::KafkaTransport::new(kafka_config).await?; + Ok(Self::Kafka(transport)) + } + + #[cfg(feature = "transport-grpc")] + TransportType::Grpc => { + let grpc_config = config + .grpc + .as_ref() + .ok_or_else(|| TransportError::Config("grpc config missing".into()))?; + let transport = super::grpc::GrpcTransport::new(grpc_config).await?; + Ok(Self::Grpc(transport)) + } + + #[cfg(feature = "transport-memory")] + TransportType::Memory => { + let memory_config = config.memory.clone().unwrap_or_default(); + let transport = super::memory::MemoryTransport::new(&memory_config); + Ok(Self::Memory(transport)) + } + + #[cfg(feature = "transport-pipe")] + TransportType::Pipe => { + let pipe_config = config.pipe.clone().unwrap_or_default(); + let transport = super::pipe::PipeTransport::new(&pipe_config); + Ok(Self::Pipe(transport)) + } + + #[cfg(feature = "transport-file")] + TransportType::File => { + let file_config = config + .file + .as_ref() + .ok_or_else(|| TransportError::Config("file config missing".into()))?; + let transport = super::file::FileTransport::new(file_config).await?; + Ok(Self::File(transport)) + } + + #[cfg(feature = "transport-http")] + TransportType::Http => { + let http_config = config + .http + .as_ref() + .ok_or_else(|| TransportError::Config("http config missing".into()))?; + let transport = super::http::HttpTransport::new(http_config).await?; + Ok(Self::Http(transport)) + } + + #[cfg(feature = 
"transport-redis")] + 
TransportType::Redis => { + let redis_config = config + .redis + .as_ref() + .ok_or_else(|| TransportError::Config("redis config missing".into()))?; + let transport = super::redis_transport::RedisTransport::new(redis_config).await?; + Ok(Self::Redis(transport)) + } + + // Transport types for modules not yet implemented + #[allow(unreachable_patterns)] + other => Err(TransportError::Config(format!( + "transport type '{other}' is not available (feature not enabled or not yet implemented)" + ))), + } + } +} diff --git a/src/transport/file.rs b/src/transport/file.rs new file mode 100644 index 0000000..6f02ea4 --- /dev/null +++ b/src/transport/file.rs @@ -0,0 +1,500 @@ +// Project: hyperi-rustlib +// File: src/transport/file.rs +// Purpose: NDJSON file transport +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! # File Transport +//! +//! NDJSON (newline-delimited JSON) file transport for debugging, audit +//! trails, and replay. Wraps async file I/O behind the Transport traits. +//! +//! ## Send +//! +//! Appends one NDJSON line per `send()` call to the configured file path. +//! +//! ## Receive +//! +//! Reads NDJSON lines from the file, tracking byte offset for commit. +//! Position is persisted to a `.pos` sidecar file so reads survive restarts. +//! +//! ## Example +//! +//! ```rust,ignore +//! use hyperi_rustlib::transport::file::{FileTransport, FileTransportConfig}; +//! +//! let config = FileTransportConfig { path: "/tmp/events.ndjson".into(), append: true }; +//! let transport = FileTransport::new(&config).await?; +//! transport.send("events", b"{\"msg\":\"hello\"}").await; +//! 
``` + +use super::error::{TransportError, TransportResult}; +use super::traits::{CommitToken, TransportBase, TransportReceiver, TransportSender}; +use super::types::{Message, PayloadFormat, SendResult}; +use serde::{Deserialize, Serialize}; +use std::path::{Path, PathBuf}; +use std::sync::atomic::{AtomicBool, Ordering}; +use tokio::io::{AsyncBufReadExt, AsyncSeekExt, AsyncWriteExt, BufReader}; +use tokio::sync::Mutex; + +/// Commit token for file transport. +/// +/// Contains the byte offset in the file after reading the line. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub struct FileToken { + /// Byte offset after the line was read. + pub offset: u64, +} + +impl CommitToken for FileToken {} + +impl std::fmt::Display for FileToken { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!(f, "file:{}", self.offset) + } +} + +/// Configuration for file transport. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct FileTransportConfig { + /// File path for read/write. + pub path: String, + + /// Append mode (default true for send). + #[serde(default = "default_append")] + pub append: bool, +} + +fn default_append() -> bool { + true +} + +impl Default for FileTransportConfig { + fn default() -> Self { + Self { + path: String::new(), + append: true, + } + } +} + +/// Internal state for the write side. +struct WriteState { + file: tokio::fs::File, +} + +/// Internal state for the read side. +struct ReadState { + reader: BufReader<tokio::fs::File>, + offset: u64, + line_buf: String, +} + +/// NDJSON file transport. +/// +/// Supports both send (append) and receive (sequential read with +/// position tracking). Position is persisted to a `.pos` sidecar +/// file so reads survive process restarts. +pub struct FileTransport { + config: FileTransportConfig, + writer: Mutex<Option<WriteState>>, + reader: Mutex<Option<ReadState>>, + closed: AtomicBool, + sequence: std::sync::atomic::AtomicU64, +} + +impl FileTransport { + /// Create a new file transport. 
+ /// + /// # Errors + /// + /// Returns error if the file path is empty. + pub async fn new(config: &FileTransportConfig) -> TransportResult<Self> { + if config.path.is_empty() { + return Err(TransportError::Config("file path is empty".into())); + } + + Ok(Self { + config: config.clone(), + writer: Mutex::new(None), + reader: Mutex::new(None), + closed: AtomicBool::new(false), + sequence: std::sync::atomic::AtomicU64::new(0), + }) + } + + /// Path to the `.pos` sidecar file that tracks read position. + fn pos_path(data_path: &Path) -> PathBuf { + let mut pos_path = data_path.as_os_str().to_owned(); + pos_path.push(".pos"); + PathBuf::from(pos_path) + } + + /// Load committed read position from the sidecar file. + async fn load_position(data_path: &Path) -> u64 { + let pos_path = Self::pos_path(data_path); + match tokio::fs::read_to_string(&pos_path).await { + Ok(content) => content.trim().parse::<u64>().unwrap_or(0), + Err(_) => 0, + } + } + + /// Save read position to the sidecar file. + async fn save_position(data_path: &Path, offset: u64) -> TransportResult<()> { + let pos_path = Self::pos_path(data_path); + tokio::fs::write(&pos_path, offset.to_string()) + .await + .map_err(|e| TransportError::Commit(format!("failed to write position file: {e}"))) + } + + /// Lazily open the write file handle. + async fn ensure_writer(&self) -> TransportResult<()> { + let mut guard = self.writer.lock().await; + if guard.is_none() { + let file = tokio::fs::OpenOptions::new() + .create(true) + .append(self.config.append) + .write(true) + .open(&self.config.path) + .await + .map_err(|e| { + TransportError::Connection(format!( + "failed to open '{}' for writing: {e}", + self.config.path + )) + })?; + *guard = Some(WriteState { file }); + } + Ok(()) + } + + /// Lazily open the read file handle and seek to committed position. 
+ async fn ensure_reader(&self) -> TransportResult<()> { + let mut guard = self.reader.lock().await; + if guard.is_none() { + let path = Path::new(&self.config.path); + + // If the file does not exist yet, there is nothing to read + if !path.exists() { + return Err(TransportError::Recv(format!( + "file '{}' does not exist", + self.config.path + ))); + } + + let offset = Self::load_position(path).await; + let mut file = tokio::fs::File::open(&self.config.path) + .await + .map_err(|e| { + TransportError::Connection(format!( + "failed to open '{}' for reading: {e}", + self.config.path + )) + })?; + + // Seek to committed position + file.seek(std::io::SeekFrom::Start(offset)) + .await + .map_err(|e| { + TransportError::Recv(format!("failed to seek to offset {offset}: {e}")) + })?; + + *guard = Some(ReadState { + reader: BufReader::new(file), + offset, + line_buf: String::with_capacity(4096), + }); + } + Ok(()) + } +} + +impl TransportBase for FileTransport { + async fn close(&self) -> TransportResult<()> { + self.closed.store(true, Ordering::Relaxed); + + // Flush and drop writer + if let Some(mut state) = self.writer.lock().await.take() { + let _ = state.file.flush().await; + } + + // Drop reader + let _ = self.reader.lock().await.take(); + + Ok(()) + } + + fn is_healthy(&self) -> bool { + !self.closed.load(Ordering::Relaxed) + } + + fn name(&self) -> &'static str { + "file" + } +} + +impl TransportSender for FileTransport { + async fn send(&self, _key: &str, payload: &[u8]) -> SendResult { + if self.closed.load(Ordering::Relaxed) { + return SendResult::Fatal(TransportError::Closed); + } + + if let Err(e) = self.ensure_writer().await { + return SendResult::Fatal(e); + } + + let mut guard = self.writer.lock().await; + let Some(state) = guard.as_mut() else { + return SendResult::Fatal(TransportError::Internal("writer not initialised".into())); + }; + + // Write payload + newline as a single operation + if let Err(e) = state.file.write_all(payload).await { + return 
SendResult::Fatal(TransportError::Send(format!("write failed: {e}"))); + } + if let Err(e) = state.file.write_all(b"\n").await { + return SendResult::Fatal(TransportError::Send(format!("write newline failed: {e}"))); + } + if let Err(e) = state.file.flush().await { + return SendResult::Fatal(TransportError::Send(format!("flush failed: {e}"))); + } + + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_sent_total", "transport" => "file").increment(1); + + SendResult::Ok + } +} + +impl TransportReceiver for FileTransport { + type Token = FileToken; + + async fn recv(&self, max: usize) -> TransportResult<Vec<Message<Self::Token>>> { + if self.closed.load(Ordering::Relaxed) { + return Err(TransportError::Closed); + } + + self.ensure_reader().await?; + + let mut guard = self.reader.lock().await; + let state = guard + .as_mut() + .ok_or_else(|| TransportError::Internal("reader not initialised".into()))?; + + let mut messages = Vec::with_capacity(max.min(100)); + + for _ in 0..max { + state.line_buf.clear(); + let bytes_read = state + .reader + .read_line(&mut state.line_buf) + .await + .map_err(|e| TransportError::Recv(format!("read failed: {e}")))?; + + if bytes_read == 0 { + // EOF + break; + } + + state.offset += bytes_read as u64; + + // Strip trailing newline + let line = state.line_buf.trim_end_matches('\n').trim_end_matches('\r'); + if line.is_empty() { + continue; + } + + let payload = line.as_bytes().to_vec(); + let format = PayloadFormat::detect(&payload); + let _seq = self.sequence.fetch_add(1, Ordering::Relaxed); + let timestamp_ms = chrono::Utc::now().timestamp_millis(); + + messages.push(Message { + key: None, + payload, + token: FileToken { + offset: state.offset, + }, + timestamp_ms: Some(timestamp_ms), + format, + }); + } + + #[cfg(feature = "metrics")] + if !messages.is_empty() { + metrics::counter!("dfe_transport_sent_total", "transport" => "file") + .increment(messages.len() as u64); + } + + Ok(messages) + } + + async fn commit(&self, tokens: &[Self::Token]) -> 
TransportResult<()> { + if let Some(max_token) = tokens.iter().max_by_key(|t| t.offset) { + let path = Path::new(&self.config.path); + Self::save_position(path, max_token.offset).await?; + } + Ok(()) + } +} + +#[cfg(test)] +mod tests { + use super::*; + use tempfile::TempDir; + + async fn make_transport(dir: &TempDir, filename: &str) -> FileTransport { + let path = dir.path().join(filename); + let config = FileTransportConfig { + path: path.to_str().unwrap().to_string(), + append: true, + }; + FileTransport::new(&config).await.unwrap() + } + + #[tokio::test] + async fn send_and_receive() { + let dir = TempDir::new().unwrap(); + let path = dir.path().join("test.ndjson"); + let path_str = path.to_str().unwrap().to_string(); + + // Write messages + let config = FileTransportConfig { + path: path_str.clone(), + append: true, + }; + let sender = FileTransport::new(&config).await.unwrap(); + + let r1 = sender.send("key", b"{\"msg\":\"hello\"}").await; + assert!(r1.is_ok()); + let r2 = sender.send("key", b"{\"msg\":\"world\"}").await; + assert!(r2.is_ok()); + sender.close().await.unwrap(); + + // Read messages back + let reader_config = FileTransportConfig { + path: path_str, + append: true, + }; + let reader = FileTransport::new(&reader_config).await.unwrap(); + let messages = reader.recv(10).await.unwrap(); + + assert_eq!(messages.len(), 2); + assert_eq!(messages[0].payload, b"{\"msg\":\"hello\"}"); + assert_eq!(messages[1].payload, b"{\"msg\":\"world\"}"); + + // Tokens should have increasing offsets + assert!(messages[1].token.offset > messages[0].token.offset); + } + + #[tokio::test] + async fn commit_persists_position() { + let dir = TempDir::new().unwrap(); + let path = dir.path().join("commit_test.ndjson"); + let path_str = path.to_str().unwrap().to_string(); + + // Write 3 messages + let config = FileTransportConfig { + path: path_str.clone(), + append: true, + }; + let sender = FileTransport::new(&config).await.unwrap(); + sender.send("k", b"line1").await; + 
sender.send("k", b"line2").await; + sender.send("k", b"line3").await; + sender.close().await.unwrap(); + + // Read first 2 messages and commit + let r1 = FileTransport::new(&FileTransportConfig { + path: path_str.clone(), + append: true, + }) + .await + .unwrap(); + let msgs = r1.recv(2).await.unwrap(); + assert_eq!(msgs.len(), 2); + assert_eq!(msgs[0].payload, b"line1"); + assert_eq!(msgs[1].payload, b"line2"); + + // Commit up to message 2 + let tokens: Vec<_> = msgs.iter().map(|m| m.token).collect(); + r1.commit(&tokens).await.unwrap(); + r1.close().await.unwrap(); + + // Open a new transport — should resume from committed position + let r2 = FileTransport::new(&FileTransportConfig { + path: path_str, + append: true, + }) + .await + .unwrap(); + let remaining = r2.recv(10).await.unwrap(); + assert_eq!(remaining.len(), 1); + assert_eq!(remaining[0].payload, b"line3"); + } + + #[tokio::test] + async fn close_prevents_operations() { + let dir = TempDir::new().unwrap(); + let transport = make_transport(&dir, "close_test.ndjson").await; + + transport.close().await.unwrap(); + assert!(!transport.is_healthy()); + + let result = transport.send("k", b"data").await; + assert!(result.is_fatal()); + + let result = transport.recv(1).await; + assert!(result.is_err()); + } + + #[tokio::test] + async fn file_token_display() { + let token = FileToken { offset: 42 }; + assert_eq!(format!("{token}"), "file:42"); + } + + #[tokio::test] + async fn recv_returns_empty_at_eof() { + let dir = TempDir::new().unwrap(); + let path = dir.path().join("eof_test.ndjson"); + let path_str = path.to_str().unwrap().to_string(); + + // Write one line + let config = FileTransportConfig { + path: path_str.clone(), + append: true, + }; + let transport = FileTransport::new(&config).await.unwrap(); + transport.send("k", b"only_line").await; + transport.close().await.unwrap(); + + // Read all, then read again — should get empty + let reader = FileTransport::new(&FileTransportConfig { + path: path_str, + 
append: true, + }) + .await + .unwrap(); + let msgs = reader.recv(10).await.unwrap(); + assert_eq!(msgs.len(), 1); + + let more = reader.recv(10).await.unwrap(); + assert!(more.is_empty()); + } + + #[tokio::test] + async fn empty_path_is_config_error() { + let result = FileTransport::new(&FileTransportConfig::default()).await; + assert!(result.is_err()); + } + + #[tokio::test] + async fn transport_name() { + let dir = TempDir::new().unwrap(); + let transport = make_transport(&dir, "name_test.ndjson").await; + assert_eq!(transport.name(), "file"); + } +} diff --git a/src/transport/grpc/mod.rs b/src/transport/grpc/mod.rs index 2abe904..284d9b4 100644 --- a/src/transport/grpc/mod.rs +++ b/src/transport/grpc/mod.rs @@ -20,7 +20,7 @@ //! ## Example //! //! ```rust,ignore -//! use hyperi_rustlib::transport::{GrpcTransport, GrpcConfig, Transport}; +//! use hyperi_rustlib::transport::{GrpcTransport, GrpcConfig, TransportReceiver}; //! //! // Server mode (receive from remote senders) //! let config = GrpcConfig::server("0.0.0.0:6000"); @@ -39,7 +39,7 @@ pub use config::GrpcConfig; pub use token::GrpcToken; use super::error::{TransportError, TransportResult}; -use super::traits::Transport; +use super::traits::{TransportBase, TransportReceiver, TransportSender}; use super::types::{Message, PayloadFormat, SendResult}; use std::collections::HashMap; use std::sync::Arc; @@ -49,8 +49,8 @@ use tonic::{Request, Response, Status}; /// gRPC transport for DFE inter-service communication. /// -/// Combines a tonic gRPC client (for sending) and server (for receiving) -/// behind the unified `Transport` trait. +/// Implements both `TransportSender` and `TransportReceiver`, so it also +/// satisfies the unified `Transport` trait via blanket impl. pub struct GrpcTransport { /// Client for sending (None if server-only mode). 
client: Option>, @@ -183,9 +183,33 @@ impl GrpcTransport { } } -impl Transport for GrpcTransport { - type Token = GrpcToken; +impl TransportBase for GrpcTransport { + async fn close(&self) -> TransportResult<()> { + self.closed.store(true, Ordering::Relaxed); + + // Signal server shutdown + // Note: we can't take from Option behind &self, so we use a flag + // The server task will complete when the oneshot is dropped + Ok(()) + } + + fn is_healthy(&self) -> bool { + let healthy = !self.closed.load(Ordering::Relaxed); + #[cfg(feature = "metrics")] + metrics::gauge!("dfe_transport_healthy", "transport" => "grpc").set(if healthy { + 1.0 + } else { + 0.0 + }); + healthy + } + fn name(&self) -> &'static str { + "grpc" + } +} + +impl TransportSender for GrpcTransport { async fn send(&self, key: &str, payload: &[u8]) -> SendResult { if self.closed.load(Ordering::Relaxed) { return SendResult::Fatal(TransportError::Closed); @@ -257,6 +281,10 @@ impl Transport for GrpcTransport { result } +} + +impl TransportReceiver for GrpcTransport { + type Token = GrpcToken; async fn recv(&self, max: usize) -> TransportResult<Vec<Message<Self::Token>>> { if self.closed.load(Ordering::Relaxed) { @@ -315,30 +343,6 @@ impl Transport for GrpcTransport { // Acknowledgement is implicit in the Push RPC response. 
Ok(()) } - - async fn close(&self) -> TransportResult<()> { - self.closed.store(true, Ordering::Relaxed); - - // Signal server shutdown - // Note: we can't take from Option behind &self, so we use a flag - // The server task will complete when the oneshot is dropped - Ok(()) - } - - fn is_healthy(&self) -> bool { - let healthy = !self.closed.load(Ordering::Relaxed); - #[cfg(feature = "metrics")] - metrics::gauge!("dfe_transport_healthy", "transport" => "grpc").set(if healthy { - 1.0 - } else { - 0.0 - }); - healthy - } - - fn name(&self) -> &'static str { - "grpc" - } } impl Drop for GrpcTransport { diff --git a/src/transport/http.rs b/src/transport/http.rs new file mode 100644 index 0000000..39a0b77 --- /dev/null +++ b/src/transport/http.rs @@ -0,0 +1,681 @@ +// Project: hyperi-rustlib +// File: src/transport/http.rs +// Purpose: HTTP/HTTPS transport (send via POST, receive via embedded server) +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! # HTTP Transport +//! +//! HTTP/HTTPS transport for webhook delivery and REST ingest. +//! +//! ## Send +//! +//! POSTs payload bytes to `{endpoint}/{key}` using reqwest. +//! +//! ## Receive (requires `http-server` feature) +//! +//! Starts an embedded axum HTTP server that accepts POST requests on a +//! configurable path. Incoming payloads are queued into a bounded +//! `tokio::sync::mpsc` channel. `recv()` drains from this channel. +//! +//! ## Example +//! +//! ```rust,ignore +//! use hyperi_rustlib::transport::http::{HttpTransport, HttpTransportConfig}; +//! +//! // Send-only +//! let config = HttpTransportConfig { +//! endpoint: Some("http://loader:8080/ingest".into()), +//! ..Default::default() +//! }; +//! let transport = HttpTransport::new(&config).await?; +//! transport.send("events", b"{\"msg\":\"hello\"}").await; +//! 
``` + +use super::error::{TransportError, TransportResult}; +use super::traits::{CommitToken, TransportBase, TransportReceiver, TransportSender}; +#[cfg(feature = "http-server")] +use super::types::PayloadFormat; +use super::types::{Message, SendResult}; +use serde::{Deserialize, Serialize}; +#[cfg(feature = "http-server")] +use std::sync::Arc; +#[cfg(feature = "http-server")] +use std::sync::atomic::AtomicU64; +use std::sync::atomic::{AtomicBool, Ordering}; + +/// Commit token for HTTP transport. +/// +/// HTTP is fire-and-forget from the receiver's perspective, so commit +/// is a no-op. The token provides sequence tracking and optional +/// client address for observability. +#[derive(Debug, Clone)] +pub struct HttpToken { + /// Local sequence number (monotonically increasing per transport instance). + pub seq: u64, + + /// Source client address (if available from the HTTP request). + pub source_addr: Option<String>, +} + +impl HttpToken { + /// Create a new token with sequence number. + #[must_use] + pub fn new(seq: u64) -> Self { + Self { + seq, + source_addr: None, + } + } + + /// Create a new token with sequence number and source address. + #[must_use] + pub fn with_source(seq: u64, addr: String) -> Self { + Self { + seq, + source_addr: Some(addr), + } + } +} + +impl CommitToken for HttpToken {} + +impl std::fmt::Display for HttpToken { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + match &self.source_addr { + Some(addr) => write!(f, "http:{}:{}", addr, self.seq), + None => write!(f, "http:{}", self.seq), + } + } +} + +fn default_recv_path() -> String { + "/ingest".to_string() +} + +fn default_buffer_size() -> usize { + 10_000 +} + +fn default_recv_timeout_ms() -> u64 { + 100 +} + +/// Configuration for HTTP transport. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct HttpTransportConfig { + /// Endpoint URL for sending (e.g., "http://loader:8080/ingest"). None = send disabled. 
+ #[serde(default)] + pub endpoint: Option<String>, + + /// Listen address for receiving (e.g., "0.0.0.0:8080"). None = receive disabled. + /// Requires the `http-server` feature. + #[serde(default)] + pub listen: Option<String>, + + /// Path to accept POSTs on for receive mode. Default: "/ingest". + #[serde(default = "default_recv_path")] + pub recv_path: String, + + /// Receive buffer size (bounded channel capacity). Default: 10000. + #[serde(default = "default_buffer_size")] + pub recv_buffer_size: usize, + + /// Receive timeout in milliseconds. Default: 100. + #[serde(default = "default_recv_timeout_ms")] + pub recv_timeout_ms: u64, +} + +impl Default for HttpTransportConfig { + fn default() -> Self { + Self { + endpoint: None, + listen: None, + recv_path: default_recv_path(), + recv_buffer_size: default_buffer_size(), + recv_timeout_ms: default_recv_timeout_ms(), + } + } +} + +impl HttpTransportConfig { + /// Create a send-only config pointing at the given endpoint URL. + #[must_use] + pub fn sender(endpoint: &str) -> Self { + Self { + endpoint: Some(endpoint.to_string()), + ..Default::default() + } + } + + /// Create a receive-only config listening on the given address. + #[must_use] + pub fn receiver(listen: &str) -> Self { + Self { + listen: Some(listen.to_string()), + ..Default::default() + } + } +} + +/// HTTP/HTTPS transport. +/// +/// Supports send (POST to endpoint) and receive (embedded axum server). +/// The receive side requires the `http-server` feature for axum. +pub struct HttpTransport { + /// reqwest client for sending (always available when transport-http is enabled). + client: reqwest::Client, + + /// Base URL for sending (None = send disabled). + endpoint: Option<String>, + + /// Receiver channel populated by the embedded HTTP server. + /// Only available when `http-server` feature is enabled AND `listen` is configured. 
+ #[cfg(feature = "http-server")] + receiver: Option<tokio::sync::Mutex<tokio::sync::mpsc::Receiver<Message<HttpToken>>>>, + + /// Shutdown signal for the server task. 
+ #[cfg(feature = "http-server")] + shutdown_tx: Option<tokio::sync::oneshot::Sender<()>>, + + /// Server background task handle. + #[cfg(feature = "http-server")] + _server_handle: Option<tokio::task::JoinHandle<()>>, + + /// Whether the transport is closed. + closed: AtomicBool, + + /// Receive timeout in milliseconds (used by receive side). + #[cfg(feature = "http-server")] + recv_timeout_ms: u64, +} + +impl HttpTransport { + /// Create a new HTTP transport. + /// + /// - Set `config.endpoint` to enable sending (POST to URL). + /// - Set `config.listen` to enable receiving (embedded HTTP server, requires `http-server` feature). + /// + /// # Errors + /// + /// Returns error if the listen address is invalid or the server fails to bind. + pub async fn new(config: &HttpTransportConfig) -> TransportResult<Self> { + let client = reqwest::Client::builder() + .build() + .map_err(|e| TransportError::Config(format!("failed to create HTTP client: {e}")))?; + + #[cfg(feature = "http-server")] + let (receiver, shutdown_tx, server_handle) = if let Some(listen) = &config.listen { + let addr: std::net::SocketAddr = listen + .parse() + .map_err(|e| TransportError::Config(format!("invalid listen address: {e}")))?; + + let (tx, rx) = tokio::sync::mpsc::channel(config.recv_buffer_size); + let (sd_tx, sd_rx) = tokio::sync::oneshot::channel::<()>(); + + let sequence = Arc::new(AtomicU64::new(0)); + let recv_path = config.recv_path.clone(); + + let app = build_receiver_router(tx, sequence, &recv_path); + + let listener = tokio::net::TcpListener::bind(addr).await.map_err(|e| { + TransportError::Connection(format!("failed to bind to {addr}: {e}")) + })?; + + let handle = tokio::spawn(async move { + axum::serve( + listener, + app.into_make_service_with_connect_info::<std::net::SocketAddr>(), + ) + .with_graceful_shutdown(async { + sd_rx.await.ok(); + }) + .await + .ok(); + }); + + (Some(tokio::sync::Mutex::new(rx)), Some(sd_tx), Some(handle)) + } else { + (None, None, None) + }; + + Ok(Self { + client, + endpoint: config.endpoint.clone(), + #[cfg(feature = "http-server")] + 
receiver, + #[cfg(feature = "http-server")] + shutdown_tx, + #[cfg(feature = "http-server")] + _server_handle: server_handle, + closed: AtomicBool::new(false), + #[cfg(feature = "http-server")] + recv_timeout_ms: config.recv_timeout_ms, + }) + } +} + +/// Build the axum router for the receive side. +#[cfg(feature = "http-server")] +fn build_receiver_router( + sender: tokio::sync::mpsc::Sender<Message<HttpToken>>, + sequence: Arc<AtomicU64>, + recv_path: &str, +) -> axum::Router { + use axum::routing::post; + + let state = ReceiverState { sender, sequence }; + + axum::Router::new() + .route(recv_path, post(ingest_handler)) + .with_state(state) +} + +/// Shared state for the receive handler. +#[cfg(feature = "http-server")] +#[derive(Clone)] +struct ReceiverState { + sender: tokio::sync::mpsc::Sender<Message<HttpToken>>, + sequence: Arc<AtomicU64>, +} + +/// POST handler that accepts raw bytes and queues them into the mpsc channel. +#[cfg(feature = "http-server")] +async fn ingest_handler( + axum::extract::State(state): axum::extract::State<ReceiverState>, + axum::extract::ConnectInfo(addr): axum::extract::ConnectInfo<std::net::SocketAddr>, + body: axum::body::Bytes, +) -> axum::http::StatusCode { + if body.is_empty() { + return axum::http::StatusCode::BAD_REQUEST; + } + + let seq = state.sequence.fetch_add(1, Ordering::Relaxed); + let format = PayloadFormat::detect(&body); + let timestamp_ms = chrono::Utc::now().timestamp_millis(); + + let msg = Message { + key: None, + payload: body.to_vec(), + token: HttpToken::with_source(seq, addr.to_string()), + timestamp_ms: Some(timestamp_ms), + format, + }; + + match state.sender.try_send(msg) { + Ok(()) => { + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_sent_total", "transport" => "http").increment(1); + axum::http::StatusCode::OK + } + Err(tokio::sync::mpsc::error::TrySendError::Full(_)) => { + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_backpressured_total", "transport" => "http") + .increment(1); + axum::http::StatusCode::SERVICE_UNAVAILABLE + } + 
Err(tokio::sync::mpsc::error::TrySendError::Closed(_)) => { + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_refused_total", "transport" => "http").increment(1); + axum::http::StatusCode::GONE + } + } +} + +impl TransportBase for HttpTransport { + async fn close(&self) -> TransportResult<()> { + self.closed.store(true, Ordering::Relaxed); + Ok(()) + } + + fn is_healthy(&self) -> bool { + !self.closed.load(Ordering::Relaxed) + } + + fn name(&self) -> &'static str { + "http" + } +} + +impl TransportSender for HttpTransport { + async fn send(&self, key: &str, payload: &[u8]) -> SendResult { + if self.closed.load(Ordering::Relaxed) { + return SendResult::Fatal(TransportError::Closed); + } + + let Some(base_url) = &self.endpoint else { + return SendResult::Fatal(TransportError::Config( + "no endpoint configured for sending".into(), + )); + }; + + // Build URL: {base_url}/{key} if key is non-empty, otherwise just {base_url} + let url = if key.is_empty() { + base_url.clone() + } else { + let base = base_url.trim_end_matches('/'); + let suffix = key.trim_start_matches('/'); + format!("{base}/{suffix}") + }; + + #[cfg(feature = "metrics")] + let start = std::time::Instant::now(); + + let result = match self + .client + .post(&url) + .header("content-type", "application/octet-stream") + .body(payload.to_vec()) + .send() + .await + { + Ok(resp) if resp.status().is_success() => { + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_sent_total", "transport" => "http").increment(1); + SendResult::Ok + } + Ok(resp) + if resp.status() == reqwest::StatusCode::TOO_MANY_REQUESTS + || resp.status() == reqwest::StatusCode::SERVICE_UNAVAILABLE => + { + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_backpressured_total", "transport" => "http") + .increment(1); + SendResult::Backpressured + } + Ok(resp) => { + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_send_errors_total", "transport" => "http") + .increment(1); + 
SendResult::Fatal(TransportError::Send(format!( + "HTTP {} from {}", + resp.status(), + url + ))) + } + Err(e) => { + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_send_errors_total", "transport" => "http") + .increment(1); + SendResult::Fatal(TransportError::Send(format!("HTTP request failed: {e}"))) + } + }; + + #[cfg(feature = "metrics")] + metrics::histogram!("dfe_transport_send_duration_seconds", "transport" => "http") + .record(start.elapsed().as_secs_f64()); + + result + } +} + +impl TransportReceiver for HttpTransport { + type Token = HttpToken; + + async fn recv(&self, max: usize) -> TransportResult<Vec<Message<Self::Token>>> { + if self.closed.load(Ordering::Relaxed) { + return Err(TransportError::Closed); + } + + #[cfg(feature = "http-server")] + { + let Some(receiver) = &self.receiver else { + return Err(TransportError::Config( + "no listen address configured for receiving".into(), + )); + }; + + let mut rx = receiver.lock().await; + let mut messages = Vec::with_capacity(max.min(100)); + + for _ in 0..max { + let result = if self.recv_timeout_ms == 0 { + match rx.try_recv() { + Ok(msg) => Some(msg), + Err(tokio::sync::mpsc::error::TryRecvError::Empty) => break, + Err(tokio::sync::mpsc::error::TryRecvError::Disconnected) => { + return Err(TransportError::Closed); + } + } + } else if messages.is_empty() { + // First message: wait with timeout + match tokio::time::timeout( + std::time::Duration::from_millis(self.recv_timeout_ms), + rx.recv(), + ) + .await + { + Ok(Some(msg)) => Some(msg), + Ok(None) => return Err(TransportError::Closed), + Err(_) => break, // Timeout + } + } else { + // Subsequent: non-blocking drain + match rx.try_recv() { + Ok(msg) => Some(msg), + Err(_) => break, + } + }; + + if let Some(msg) = result { + messages.push(msg); + } + } + + Ok(messages) + } + + #[cfg(not(feature = "http-server"))] + { + let _ = max; + Err(TransportError::Config( + "HTTP receive requires the 'http-server' feature".into(), + )) + } + } + + async fn commit(&self, 
_tokens: &[Self::Token]) -> TransportResult<()> { + // HTTP is fire-and-forget — commit is a no-op. + Ok(()) + } +} + +impl Drop for HttpTransport { + fn drop(&mut self) { + #[cfg(feature = "http-server")] + if let Some(tx) = self.shutdown_tx.take() { + let _ = tx.send(()); + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn http_token_display() { + let token = HttpToken::new(42); + assert_eq!(format!("{token}"), "http:42"); + } + + #[test] + fn http_token_display_with_source() { + let token = HttpToken::with_source(7, "192.168.1.1:54321".to_string()); + assert_eq!(format!("{token}"), "http:192.168.1.1:54321:7"); + } + + #[test] + fn config_defaults() { + let config = HttpTransportConfig::default(); + assert!(config.endpoint.is_none()); + assert!(config.listen.is_none()); + assert_eq!(config.recv_path, "/ingest"); + assert_eq!(config.recv_buffer_size, 10_000); + assert_eq!(config.recv_timeout_ms, 100); + } + + #[test] + fn config_sender_helper() { + let config = HttpTransportConfig::sender("http://localhost:8080/ingest"); + assert_eq!( + config.endpoint.as_deref(), + Some("http://localhost:8080/ingest") + ); + assert!(config.listen.is_none()); + } + + #[test] + fn config_receiver_helper() { + let config = HttpTransportConfig::receiver("0.0.0.0:8080"); + assert!(config.endpoint.is_none()); + assert_eq!(config.listen.as_deref(), Some("0.0.0.0:8080")); + } + + #[tokio::test] + async fn send_only_transport() { + // Send-only config (no endpoint = send disabled, but transport creates fine) + let config = HttpTransportConfig::default(); + let transport = HttpTransport::new(&config).await.unwrap(); + + assert!(transport.is_healthy()); + assert_eq!(transport.name(), "http"); + + // Send without endpoint should fail + let result = transport.send("test", b"payload").await; + assert!(result.is_fatal()); + + // Commit is always ok + transport.commit(&[]).await.unwrap(); + } + + #[tokio::test] + async fn close_prevents_send() { + let config = 
HttpTransportConfig::sender("http://localhost:19999/test"); + let transport = HttpTransport::new(&config).await.unwrap(); + + transport.close().await.unwrap(); + assert!(!transport.is_healthy()); + + let result = transport.send("test", b"data").await; + assert!(result.is_fatal()); + } + + #[tokio::test] + async fn close_prevents_recv() { + let config = HttpTransportConfig::default(); + let transport = HttpTransport::new(&config).await.unwrap(); + + transport.close().await.unwrap(); + let result = transport.recv(1).await; + assert!(result.is_err()); + } + + /// Full send + receive round-trip test. + /// Requires both `transport-http` and `http-server` features. + #[cfg(feature = "http-server")] + #[tokio::test] + async fn send_and_receive_roundtrip() { + // Start receiver on a random available port + let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap(); + let addr = listener.local_addr().unwrap(); + drop(listener); // Free the port for the transport to bind + + let recv_config = HttpTransportConfig { + listen: Some(addr.to_string()), + recv_path: "/ingest".to_string(), + recv_buffer_size: 100, + recv_timeout_ms: 1000, + ..Default::default() + }; + let receiver = HttpTransport::new(&recv_config).await.unwrap(); + + // Give the server a moment to start + tokio::time::sleep(std::time::Duration::from_millis(50)).await; + + // Send a message via a separate sender transport + let send_config = + HttpTransportConfig::sender(&format!("http://127.0.0.1:{}/ingest", addr.port())); + let sender = HttpTransport::new(&send_config).await.unwrap(); + + let result = sender.send("", b"{\"msg\":\"hello\"}").await; + assert!(result.is_ok(), "send failed: {result:?}"); + + // Receive it + let messages = receiver.recv(10).await.unwrap(); + assert_eq!(messages.len(), 1); + assert_eq!(messages[0].payload, b"{\"msg\":\"hello\"}"); + assert!(messages[0].token.source_addr.is_some()); + + // Cleanup + sender.close().await.unwrap(); + receiver.close().await.unwrap(); + } + 
+ /// Test that the receiver rejects empty bodies. + #[cfg(feature = "http-server")] + #[tokio::test] + async fn receive_rejects_empty_body() { + let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap(); + let addr = listener.local_addr().unwrap(); + drop(listener); + + let recv_config = HttpTransportConfig { + listen: Some(addr.to_string()), + recv_timeout_ms: 200, + ..Default::default() + }; + let receiver = HttpTransport::new(&recv_config).await.unwrap(); + tokio::time::sleep(std::time::Duration::from_millis(50)).await; + + // Send empty body + let client = reqwest::Client::new(); + let resp = client + .post(format!("http://127.0.0.1:{}/ingest", addr.port())) + .body(Vec::<u8>::new()) + .send() + .await + .unwrap(); + + assert_eq!(resp.status(), reqwest::StatusCode::BAD_REQUEST); + + // recv should timeout with no messages + let messages = receiver.recv(10).await.unwrap(); + assert!(messages.is_empty()); + + receiver.close().await.unwrap(); + } + + /// Test recv without listen returns config error. 
+ #[cfg(feature = "http-server")] + #[tokio::test] + async fn recv_without_listen_returns_error() { + let config = HttpTransportConfig::sender("http://localhost:9999"); + let transport = HttpTransport::new(&config).await.unwrap(); + + let result = transport.recv(10).await; + assert!(result.is_err()); + } + + #[test] + fn config_serde_roundtrip() { + let config = HttpTransportConfig { + endpoint: Some("http://example.com/ingest".into()), + listen: Some("0.0.0.0:8080".into()), + recv_path: "/custom".into(), + recv_buffer_size: 5000, + recv_timeout_ms: 250, + }; + + let json = serde_json::to_string(&config).unwrap(); + let parsed: HttpTransportConfig = serde_json::from_str(&json).unwrap(); + + assert_eq!(parsed.endpoint, config.endpoint); + assert_eq!(parsed.listen, config.listen); + assert_eq!(parsed.recv_path, config.recv_path); + assert_eq!(parsed.recv_buffer_size, config.recv_buffer_size); + assert_eq!(parsed.recv_timeout_ms, config.recv_timeout_ms); + } +} diff --git a/src/transport/kafka/mod.rs b/src/transport/kafka/mod.rs index 0d54c23..c320572 100644 --- a/src/transport/kafka/mod.rs +++ b/src/transport/kafka/mod.rs @@ -70,7 +70,7 @@ pub use producer::{KafkaProducer, ProducerMetrics, ProducerProfile}; pub use token::KafkaToken; use super::error::{TransportError, TransportResult}; -use super::traits::Transport; +use super::traits::{TransportBase, TransportReceiver, TransportSender}; use super::types::{Message, PayloadFormat, SendResult}; use rdkafka::config::ClientConfig; use rdkafka::consumer::{BaseConsumer, CommitMode, Consumer}; @@ -251,9 +251,23 @@ impl KafkaTransport { } } -impl Transport for KafkaTransport { - type Token = KafkaToken; +impl TransportBase for KafkaTransport { + async fn close(&self) -> TransportResult<()> { + self.closed.store(true, Ordering::Relaxed); + // rdkafka handles cleanup on drop + Ok(()) + } + + fn is_healthy(&self) -> bool { + !self.closed.load(Ordering::Relaxed) + } + + fn name(&self) -> &'static str { + "kafka" + } +} +impl 
TransportSender for KafkaTransport { async fn send(&self, key: &str, payload: &[u8]) -> SendResult { if self.closed.load(Ordering::Relaxed) { return SendResult::Fatal(TransportError::Closed); @@ -306,6 +320,10 @@ impl Transport for KafkaTransport { result } +} + +impl TransportReceiver for KafkaTransport { + type Token = KafkaToken; /// Receive a batch of messages. /// @@ -435,20 +453,6 @@ impl Transport for KafkaTransport { Ok(()) } - - async fn close(&self) -> TransportResult<()> { - self.closed.store(true, Ordering::Relaxed); - // rdkafka handles cleanup on drop - Ok(()) - } - - fn is_healthy(&self) -> bool { - !self.closed.load(Ordering::Relaxed) - } - - fn name(&self) -> &'static str { - "kafka" - } } /// Get or insert topic Arc into cache. diff --git a/src/transport/memory/mod.rs b/src/transport/memory/mod.rs index ae1564b..77aad36 100644 --- a/src/transport/memory/mod.rs +++ b/src/transport/memory/mod.rs @@ -32,7 +32,7 @@ mod token; pub use token::MemoryToken; use super::error::{TransportError, TransportResult}; -use super::traits::Transport; +use super::traits::{TransportBase, TransportReceiver, TransportSender}; use super::types::{Message, PayloadFormat, SendResult}; use serde::{Deserialize, Serialize}; use std::sync::Arc; @@ -174,9 +174,22 @@ impl MemorySender<'_> { } } -impl Transport for MemoryTransport { - type Token = MemoryToken; +impl TransportBase for MemoryTransport { + async fn close(&self) -> TransportResult<()> { + self.closed.store(true, Ordering::Relaxed); + Ok(()) + } + + fn is_healthy(&self) -> bool { + !self.closed.load(Ordering::Relaxed) + } + + fn name(&self) -> &'static str { + "memory" + } +} +impl TransportSender for MemoryTransport { async fn send(&self, key: &str, payload: &[u8]) -> SendResult { if self.closed.load(Ordering::Relaxed) { return SendResult::Fatal(TransportError::Closed); @@ -198,6 +211,10 @@ impl Transport for MemoryTransport { Err(mpsc::error::TrySendError::Closed(_)) => SendResult::Fatal(TransportError::Closed), } } 
+} + +impl TransportReceiver for MemoryTransport { + type Token = MemoryToken; async fn recv(&self, max: usize) -> TransportResult<Vec<Message<Self::Token>>> { if self.closed.load(Ordering::Relaxed) { @@ -207,10 +224,8 @@ impl Transport for MemoryTransport { let mut receiver = self.receiver.lock().await; let mut messages = Vec::with_capacity(max.min(100)); - // Try to receive up to max messages for _ in 0..max { let result = if self.recv_timeout_ms == 0 { - // Non-blocking: try_recv match receiver.try_recv() { Ok(msg) => Some(msg), Err(mpsc::error::TryRecvError::Empty) => break, @@ -219,7 +234,6 @@ impl Transport for MemoryTransport { } } } else if messages.is_empty() { - // First message: wait with timeout match tokio::time::timeout( std::time::Duration::from_millis(self.recv_timeout_ms), receiver.recv(), @@ -228,10 +242,9 @@ impl Transport for MemoryTransport { { Ok(Some(msg)) => Some(msg), Ok(None) => return Err(TransportError::Closed), - Err(_) => break, // Timeout + Err(_) => break, } } else { - // Subsequent messages: non-blocking match receiver.try_recv() { Ok(msg) => Some(msg), Err(_) => break, @@ -254,26 +267,11 @@ impl Transport for MemoryTransport { } async fn commit(&self, tokens: &[Self::Token]) -> TransportResult<()> { - // Find the highest sequence number if let Some(max_seq) = tokens.iter().map(|t| t.seq).max() { - // Update committed sequence (only advance, never go back) let _ = self.committed_seq.fetch_max(max_seq, Ordering::Relaxed); } Ok(()) } - - async fn close(&self) -> TransportResult<()> { - self.closed.store(true, Ordering::Relaxed); - Ok(()) - } - - fn is_healthy(&self) -> bool { - !self.closed.load(Ordering::Relaxed) - } - - fn name(&self) -> &'static str { - "memory" - } } #[cfg(test)] diff --git a/src/transport/mod.rs b/src/transport/mod.rs index 7489184..7bcfa47 100644 --- a/src/transport/mod.rs +++ b/src/transport/mod.rs @@ -8,49 +8,50 @@ //! # Transport Abstraction Layer //! -//! Pluggable message transport supporting Kafka, gRPC, and in-memory channels. 
-//! All transports deliver raw bytes (JSON or MsgPack) without any envelope format. +//! Pluggable message transport with split sender/receiver traits for +//! type-safe factory construction and runtime transport selection. //! -//! ## Transport Selection +//! ## Architecture +//! +//! ```text +//! TransportSender (object-safe) TransportReceiver (generic) +//! send(key, payload) recv(max) -> Vec<Message<Token>> +//! close() commit(tokens) +//! is_healthy() close() +//! name() is_healthy(), name() +//! | | +//! +-------- Transport (blanket) -------+ +//! ``` //! -//! | Transport | Use Case | Durability | -//! |-----------|----------|------------| -//! | **Kafka** | Production (default) | At-least-once with broker persistence | -//! | **gRPC** | DFE mesh, low-latency | In-flight only, sender-side WAL optional | -//! | **Memory** | Unit tests | None, same-process only | +//! - **Output stages** (DLQ, forwarding, archiving): use `Box<dyn TransportSender>` +//! - **Input stages** (receiver, fetcher): use concrete `impl TransportReceiver` +//! - **Factory**: `sender_from_config()` returns `Box<dyn TransportSender>` //! -//! ## Vector Wire Protocol Compatibility +//! ## Transport Selection //! -//! Enable `transport-grpc-vector-compat` to accept events from legacy Vector sinks. -//! The Vector compat layer converts `vector.Vector/PushEvents` RPCs to native DFE -//! messages, enabling component-by-component migration from Vector to DFE. +//! | Transport | Send | Recv | Use Case | +//! |-----------|------|------|----------| +//! | **Kafka** | Yes | Yes | Production default, PB/day, persistence | +//! | **gRPC** | Yes | Yes | Low-latency direct, DFE mesh | +//! | **Memory** | Yes | Yes | Unit tests, same-process | +//! | **File** | Yes | Yes | Debugging, audit trails, replay | +//! | **Pipe** | Yes | Yes | Unix pipelines, sidecar pattern | +//! | **HTTP** | Yes | Yes | Webhook delivery, REST ingest | +//! | **Redis** | Yes | Yes | Edge deployments, lightweight pub/sub | //! //! ## Example //! //! ```rust,ignore -//! 
use hyperi_rustlib::transport::{Transport, TransportConfig, TransportType}; +//! use hyperi_rustlib::transport::{TransportSender, TransportConfig}; //! -//! // Create transport from config -//! let config = TransportConfig { -//! transport_type: TransportType::Kafka, -//! kafka: Some(KafkaConfig { /* ... */ }), -//! ..Default::default() -//! }; -//! let transport = create_transport(&config).await?; -//! -//! // Receive messages -//! let messages = transport.recv(100).await?; -//! for msg in &messages { -//! println!("Received: {} bytes", msg.payload.len()); -//! } -//! -//! // Commit after processing -//! let tokens: Vec<_> = messages.iter().map(|m| m.token.clone()).collect(); -//! transport.commit(&tokens).await?; +//! // Factory creates the right backend from config +//! let sender: Box<dyn TransportSender> = transport::sender_from_config("transport.output").await?; +//! sender.send("events.land", payload).await; //! ``` mod detect; mod error; +pub mod factory; mod payload; mod traits; mod types; @@ -77,9 +78,25 @@ pub mod vector_compat; #[cfg(feature = "transport-memory")] pub mod memory; -// Re-exports +#[cfg(feature = "transport-pipe")] +pub mod pipe; + +#[cfg(feature = "transport-file")] +pub mod file; + +#[cfg(feature = "transport-http")] +pub mod http; + +#[cfg(feature = "transport-redis")] +pub mod redis_transport; + +pub mod routed; + +// Re-exports — traits and factory pub use error::{TransportError, TransportResult}; -pub use traits::{CommitToken, Transport}; +pub use factory::AnySender; +pub use routed::RoutedSender; +pub use traits::{CommitToken, Transport, TransportBase, TransportReceiver, TransportSender}; pub use types::{Message, SendResult, TransportConfig, TransportType}; #[cfg(feature = "transport-kafka")] @@ -94,7 +111,14 @@ pub use vector_compat::{VectorCompatClient, VectorCompatService}; #[cfg(feature = "transport-memory")] pub use memory::{MemoryConfig, MemoryToken, MemoryTransport}; -// Note: Transport instances are created directly via their constructors -// 
(e.g., KafkaTransport::new(), GrpcTransport::new(), MemoryTransport::new()) -// rather than through a factory function, because each transport has a -// different Token associated type that can't be erased without losing type safety. +#[cfg(feature = "transport-pipe")] +pub use pipe::{PipeToken, PipeTransport, PipeTransportConfig}; + +#[cfg(feature = "transport-file")] +pub use file::{FileToken, FileTransport, FileTransportConfig}; + +#[cfg(feature = "transport-http")] +pub use http::{HttpToken, HttpTransport, HttpTransportConfig}; + +#[cfg(feature = "transport-redis")] +pub use redis_transport::{RedisToken, RedisTransport, RedisTransportConfig}; diff --git a/src/transport/pipe.rs b/src/transport/pipe.rs new file mode 100644 index 0000000..4457d2e --- /dev/null +++ b/src/transport/pipe.rs @@ -0,0 +1,333 @@ +// Project: hyperi-rustlib +// File: src/transport/pipe.rs +// Purpose: Unix pipe transport (stdin/stdout) +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! # Pipe Transport +//! +//! Reads from stdin and writes to stdout for Unix pipeline composition. +//! Newline-delimited: each line is one message. +//! +//! ## Example +//! +//! ```rust,ignore +//! use hyperi_rustlib::transport::{PipeTransport, PipeTransportConfig}; +//! +//! let config = PipeTransportConfig::default(); +//! let transport = PipeTransport::new(&config); +//! +//! // Send writes payload + newline to stdout +//! transport.send("ignored", b"hello world").await; +//! +//! // Recv reads lines from stdin +//! let messages = transport.recv(10).await?; +//! ``` + +use super::error::{TransportError, TransportResult}; +use super::traits::{CommitToken, TransportBase, TransportReceiver, TransportSender}; +use super::types::{Message, PayloadFormat, SendResult}; +use serde::{Deserialize, Serialize}; +use std::sync::atomic::{AtomicBool, AtomicU64, Ordering}; +use tokio::io::{AsyncBufReadExt, AsyncWriteExt, BufReader}; + +/// Commit token for pipe transport. 
+/// +/// Contains a monotonic sequence number. Commit is a no-op +/// because stdin is a forward-only stream. +#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)] +pub struct PipeToken { + /// Message sequence number. + pub seq: u64, +} + +impl CommitToken for PipeToken {} + +impl std::fmt::Display for PipeToken { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!(f, "pipe:{}", self.seq) + } +} + +/// Configuration for pipe transport. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct PipeTransportConfig { + /// Receive timeout in milliseconds (0 = block until data). Default: 100. + #[serde(default = "default_recv_timeout_ms")] + pub recv_timeout_ms: u64, +} + +fn default_recv_timeout_ms() -> u64 { + 100 +} + +impl Default for PipeTransportConfig { + fn default() -> Self { + Self { + recv_timeout_ms: default_recv_timeout_ms(), + } + } +} + +/// Unix pipe transport (stdin/stdout). +/// +/// Send writes newline-delimited payloads to stdout. +/// Receive reads lines from stdin, each becoming a message. +/// Commit is a no-op (stdin cannot be rewound). +pub struct PipeTransport { + stdin: tokio::sync::Mutex<BufReader<tokio::io::Stdin>>, + stdout: tokio::sync::Mutex<tokio::io::Stdout>, + sequence: AtomicU64, + closed: AtomicBool, + recv_timeout_ms: u64, +} + +impl PipeTransport { + /// Create a new pipe transport. 
+ #[must_use] + pub fn new(config: &PipeTransportConfig) -> Self { + Self { + stdin: tokio::sync::Mutex::new(BufReader::new(tokio::io::stdin())), + stdout: tokio::sync::Mutex::new(tokio::io::stdout()), + sequence: AtomicU64::new(0), + closed: AtomicBool::new(false), + recv_timeout_ms: config.recv_timeout_ms, + } + } +} + +impl TransportBase for PipeTransport { + async fn close(&self) -> TransportResult<()> { + self.closed.store(true, Ordering::Relaxed); + + // Flush stdout before closing + let mut stdout = self.stdout.lock().await; + stdout + .flush() + .await + .map_err(|e| TransportError::Internal(format!("stdout flush failed: {e}")))?; + + Ok(()) + } + + fn is_healthy(&self) -> bool { + !self.closed.load(Ordering::Relaxed) + } + + fn name(&self) -> &'static str { + "pipe" + } +} + +impl TransportSender for PipeTransport { + async fn send(&self, _key: &str, payload: &[u8]) -> SendResult { + if self.closed.load(Ordering::Relaxed) { + return SendResult::Fatal(TransportError::Closed); + } + + let mut stdout = self.stdout.lock().await; + + // Write payload + newline + if let Err(e) = stdout.write_all(payload).await { + return SendResult::Fatal(TransportError::Send(format!("stdout write failed: {e}"))); + } + if let Err(e) = stdout.write_all(b"\n").await { + return SendResult::Fatal(TransportError::Send(format!( + "stdout newline write failed: {e}" + ))); + } + if let Err(e) = stdout.flush().await { + return SendResult::Fatal(TransportError::Send(format!("stdout flush failed: {e}"))); + } + + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_sent_total", "transport" => "pipe").increment(1); + + SendResult::Ok + } +} + +impl TransportReceiver for PipeTransport { + type Token = PipeToken; + + async fn recv(&self, max: usize) -> TransportResult<Vec<Message<Self::Token>>> { + if self.closed.load(Ordering::Relaxed) { + return Err(TransportError::Closed); + } + + let mut stdin = self.stdin.lock().await; + let mut messages = Vec::with_capacity(max.min(100)); + let mut line_buf = 
String::new(); + + for _ in 0..max { + line_buf.clear(); + + let read_result = if self.recv_timeout_ms == 0 { + // Block until data arrives + stdin.read_line(&mut line_buf).await + } else if messages.is_empty() { + // First message: wait up to timeout + match tokio::time::timeout( + std::time::Duration::from_millis(self.recv_timeout_ms), + stdin.read_line(&mut line_buf), + ) + .await + { + Ok(result) => result, + Err(_) => break, // Timeout, return what we have (empty) + } + } else { + // Subsequent messages: non-blocking attempt via short timeout + match tokio::time::timeout( + std::time::Duration::from_millis(1), + stdin.read_line(&mut line_buf), + ) + .await + { + Ok(result) => result, + Err(_) => break, // No more data ready + } + }; + + match read_result { + Ok(0) => { + // EOF on stdin + if messages.is_empty() { + return Err(TransportError::Closed); + } + break; + } + Ok(_) => { + // Strip trailing newline + let payload = line_buf.trim_end_matches('\n').trim_end_matches('\r'); + if payload.is_empty() { + continue; + } + + let payload_bytes = payload.as_bytes().to_vec(); + let seq = self.sequence.fetch_add(1, Ordering::Relaxed); + let format = PayloadFormat::detect(&payload_bytes); + let timestamp_ms = chrono::Utc::now().timestamp_millis(); + + messages.push(Message { + key: None, + payload: payload_bytes, + token: PipeToken { seq }, + timestamp_ms: Some(timestamp_ms), + format, + }); + + #[cfg(feature = "metrics")] + metrics::counter!("dfe_records_received_total", "transport" => "pipe") + .increment(1); + } + Err(e) => { + return Err(TransportError::Recv(format!("stdin read failed: {e}"))); + } + } + } + + Ok(messages) + } + + async fn commit(&self, _tokens: &[Self::Token]) -> TransportResult<()> { + // No-op: stdin is a forward-only stream, cannot rewind or acknowledge + Ok(()) + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn token_display() { + let token = PipeToken { seq: 42 }; + assert_eq!(token.to_string(), "pipe:42"); + } + + #[test] 
+ fn token_as_str() { + let token = PipeToken { seq: 7 }; + assert_eq!(token.as_str(), "pipe:7"); + } + + #[test] + fn token_clone() { + let token = PipeToken { seq: 99 }; + let cloned = token; + assert_eq!(token, cloned); + } + + #[test] + fn config_defaults() { + let config = PipeTransportConfig::default(); + assert_eq!(config.recv_timeout_ms, 100); + } + + #[test] + fn config_serde_roundtrip() { + let config = PipeTransportConfig { + recv_timeout_ms: 500, + }; + let json = serde_json::to_string(&config).unwrap(); + let parsed: PipeTransportConfig = serde_json::from_str(&json).unwrap(); + assert_eq!(parsed.recv_timeout_ms, 500); + } + + #[test] + fn config_serde_default_fields() { + // Empty JSON should use defaults + let parsed: PipeTransportConfig = serde_json::from_str("{}").unwrap(); + assert_eq!(parsed.recv_timeout_ms, 100); + } + + #[tokio::test] + async fn new_transport_is_healthy() { + let config = PipeTransportConfig::default(); + let transport = PipeTransport::new(&config); + assert!(transport.is_healthy()); + assert_eq!(transport.name(), "pipe"); + } + + #[tokio::test] + async fn close_marks_unhealthy() { + let config = PipeTransportConfig::default(); + let transport = PipeTransport::new(&config); + + transport.close().await.unwrap(); + assert!(!transport.is_healthy()); + } + + #[tokio::test] + async fn send_after_close_returns_fatal() { + let config = PipeTransportConfig::default(); + let transport = PipeTransport::new(&config); + + transport.close().await.unwrap(); + let result = transport.send("key", b"data").await; + assert!(result.is_fatal()); + } + + #[tokio::test] + async fn recv_after_close_returns_error() { + let config = PipeTransportConfig::default(); + let transport = PipeTransport::new(&config); + + transport.close().await.unwrap(); + let result = transport.recv(1).await; + assert!(result.is_err()); + } + + #[tokio::test] + async fn commit_is_noop() { + let config = PipeTransportConfig::default(); + let transport = 
PipeTransport::new(&config); + + let tokens = vec![PipeToken { seq: 0 }, PipeToken { seq: 1 }]; + let result = transport.commit(&tokens).await; + assert!(result.is_ok()); + } +} diff --git a/src/transport/redis_transport.rs b/src/transport/redis_transport.rs new file mode 100644 index 0000000..ba29c8b --- /dev/null +++ b/src/transport/redis_transport.rs @@ -0,0 +1,557 @@ +// Project: hyperi-rustlib +// File: src/transport/redis_transport.rs +// Purpose: Redis/Valkey Streams transport +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! # Redis Streams Transport +//! +//! Lightweight pub/sub transport using Redis (or Valkey) Streams. +//! Uses `XADD` for send, `XREADGROUP` for receive, and `XACK` for commit. +//! +//! ## Send +//! +//! Appends payload bytes to a named stream via `XADD`. Optionally caps +//! the stream length with `MAXLEN ~` for approximate trimming. +//! +//! ## Receive +//! +//! Reads from a consumer group via `XREADGROUP` with blocking. Creates +//! the consumer group on first use if it does not exist. +//! +//! ## Commit +//! +//! Acknowledges processed entries via `XACK` so they are not re-delivered +//! to other consumers in the same group. +//! +//! ## Example +//! +//! ```rust,ignore +//! use hyperi_rustlib::transport::redis_transport::{RedisTransport, RedisTransportConfig}; +//! +//! let config = RedisTransportConfig { +//! stream: Some("events".into()), +//! ..Default::default() +//! }; +//! let transport = RedisTransport::new(&config).await?; +//! transport.send("events", b"{\"msg\":\"hello\"}").await; +//! 
``` + +use super::error::{TransportError, TransportResult}; +use super::traits::{CommitToken, TransportBase, TransportReceiver, TransportSender}; +use super::types::{Message, PayloadFormat, SendResult}; +use redis::AsyncCommands; +use redis::streams::{StreamMaxlen, StreamReadOptions, StreamReadReply}; +use serde::{Deserialize, Serialize}; +use std::sync::Arc; +use std::sync::atomic::{AtomicBool, Ordering}; +use tokio::sync::Mutex; + +/// Commit token for Redis Streams transport. +/// +/// Contains the stream name and entry ID needed for `XACK`. +#[derive(Debug, Clone, PartialEq, Eq, Hash)] +pub struct RedisToken { + /// Stream name the entry belongs to. + pub stream: Arc<str>, + /// Redis stream entry ID (e.g. "1711432800000-0"). + pub entry_id: String, +} + +impl CommitToken for RedisToken {} + +impl std::fmt::Display for RedisToken { + fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result { + write!(f, "redis:{}:{}", self.stream, self.entry_id) + } +} + +fn default_url() -> String { + "redis://127.0.0.1:6379".into() +} + +fn default_group() -> String { + "dfe".into() +} + +fn default_consumer() -> String { + "consumer-1".into() +} + +fn default_block_ms() -> usize { + 5000 +} + +/// Configuration for Redis/Valkey Streams transport. +#[derive(Debug, Clone, Serialize, Deserialize)] +pub struct RedisTransportConfig { + /// Redis/Valkey connection URL. + /// + /// Supports `redis://`, `rediss://` (TLS), and `unix://` schemes. + /// Default: `"redis://127.0.0.1:6379"`. + #[serde(default = "default_url")] + pub url: String, + + /// Stream name for send/receive. + /// + /// Used as default when key is empty in `send()`. + #[serde(default)] + pub stream: Option<String>, + + /// Consumer group name. Default: `"dfe"`. + #[serde(default = "default_group")] + pub group: String, + + /// Consumer name within group. Default: `"consumer-1"`. + #[serde(default = "default_consumer")] + pub consumer: String, + + /// Maximum stream length (approximate via `MAXLEN ~`). 
+ /// + /// `None` means unlimited growth. + #[serde(default)] + pub max_stream_len: Option<usize>, + + /// Block timeout in milliseconds for `XREADGROUP`. Default: 5000. + #[serde(default = "default_block_ms")] + pub block_ms: usize, +} + +impl Default for RedisTransportConfig { + fn default() -> Self { + Self { + url: default_url(), + stream: None, + group: default_group(), + consumer: default_consumer(), + max_stream_len: None, + block_ms: default_block_ms(), + } + } +} + +/// Redis/Valkey Streams transport. +/// +/// Supports both send (`XADD`) and receive (`XREADGROUP`) operations. +/// Works with both Redis and Valkey (same wire protocol). +pub struct RedisTransport { + conn: Mutex<redis::aio::MultiplexedConnection>, + config: RedisTransportConfig, + closed: AtomicBool, + /// Whether the consumer group has been ensured for a given stream. + group_created: Mutex<std::collections::HashSet<String>>, +} + +impl RedisTransport { + /// Create a new Redis Streams transport. + /// + /// Connects to the Redis server and prepares for stream operations. + /// The consumer group is created lazily on first `recv()` call. + /// + /// # Errors + /// + /// Returns error if the URL is invalid or connection fails. + pub async fn new(config: &RedisTransportConfig) -> TransportResult<Self> { + let client = redis::Client::open(config.url.as_str()).map_err(|e| { + TransportError::Config(format!("invalid Redis URL '{}': {e}", config.url)) + })?; + + let conn = client + .get_multiplexed_async_connection() + .await + .map_err(|e| { + TransportError::Connection(format!( + "failed to connect to Redis at '{}': {e}", + config.url + )) + })?; + + Ok(Self { + conn: Mutex::new(conn), + config: config.clone(), + closed: AtomicBool::new(false), + group_created: Mutex::new(std::collections::HashSet::new()), + }) + } + + /// Resolve the stream name: use `key` if non-empty, else fall back to config. 
+ fn resolve_stream<'a>(&'a self, key: &'a str) -> Result<&'a str, TransportError> { + if !key.is_empty() { + return Ok(key); + } + self.config.stream.as_deref().ok_or_else(|| { + TransportError::Config( + "no stream name: key is empty and config.stream is not set".into(), + ) + }) + } + + /// Ensure the consumer group exists for the given stream. + /// + /// Uses `XGROUP CREATE ... MKSTREAM` so the stream is created if absent. + /// Idempotent: tracks which streams have been initialised and only + /// issues the command once per stream per transport instance. + async fn ensure_group(&self, stream: &str) -> TransportResult<()> { + { + let created = self.group_created.lock().await; + if created.contains(stream) { + return Ok(()); + } + } + + let mut conn = self.conn.lock().await; + let result: redis::RedisResult<()> = conn + .xgroup_create_mkstream(stream, &self.config.group, "0") + .await; + + match result { + Ok(()) => {} + Err(e) => { + // "BUSYGROUP Consumer Group name already exists" is not an error + let msg = e.to_string(); + if !msg.contains("BUSYGROUP") { + return Err(TransportError::Connection(format!( + "failed to create consumer group '{}' on stream '{stream}': {e}", + self.config.group + ))); + } + } + } + + self.group_created.lock().await.insert(stream.to_string()); + Ok(()) + } +} + +impl TransportBase for RedisTransport { + async fn close(&self) -> TransportResult<()> { + self.closed.store(true, Ordering::Relaxed); + Ok(()) + } + + fn is_healthy(&self) -> bool { + !self.closed.load(Ordering::Relaxed) + } + + fn name(&self) -> &'static str { + "redis" + } +} + +impl TransportSender for RedisTransport { + async fn send(&self, key: &str, payload: &[u8]) -> SendResult { + if self.closed.load(Ordering::Relaxed) { + return SendResult::Fatal(TransportError::Closed); + } + + let stream = match self.resolve_stream(key) { + Ok(s) => s.to_string(), + Err(e) => return SendResult::Fatal(e), + }; + + let mut conn = self.conn.lock().await; + + let result: 
redis::RedisResult<String> = if let Some(max_len) = self.config.max_stream_len { + conn.xadd_maxlen( + &stream, + StreamMaxlen::Approx(max_len), + "*", + &[("payload", payload)], + ) + .await + } else { + conn.xadd(&stream, "*", &[("payload", payload)]).await + }; + + match result { + Ok(_entry_id) => { + #[cfg(feature = "metrics")] + metrics::counter!("dfe_transport_sent_total", "transport" => "redis").increment(1); + + SendResult::Ok + } + Err(e) => SendResult::Fatal(TransportError::Send(format!( + "XADD to stream '{stream}' failed: {e}" + ))), + } + } +} + +impl TransportReceiver for RedisTransport { + type Token = RedisToken; + + async fn recv(&self, max: usize) -> TransportResult<Vec<Message<Self::Token>>> { + if self.closed.load(Ordering::Relaxed) { + return Err(TransportError::Closed); + } + + let stream_name = self + .config + .stream + .as_deref() + .ok_or_else(|| TransportError::Config("config.stream must be set for recv()".into()))? + .to_string(); + + self.ensure_group(&stream_name).await?; + + let opts = StreamReadOptions::default() + .group(&self.config.group, &self.config.consumer) + .count(max) + .block(self.config.block_ms); + + let mut conn = self.conn.lock().await; + + // ">" means only new (undelivered) messages + let reply: StreamReadReply = conn + .xread_options(&[&stream_name], &[">"], &opts) + .await + .map_err(|e| { + TransportError::Recv(format!("XREADGROUP on stream '{stream_name}' failed: {e}")) + })?; + + let stream_arc: Arc<str> = Arc::from(stream_name.as_str()); + let mut messages = Vec::new(); + + for stream_key in &reply.keys { + for stream_id in &stream_key.ids { + // Extract the "payload" field from the entry + let payload_bytes: Option<Vec<u8>> = stream_id + .map + .get("payload") + .and_then(|v| redis::from_redis_value(v.clone()).ok()); + + let payload = payload_bytes.unwrap_or_default(); + let format = PayloadFormat::detect(&payload); + let timestamp_ms = parse_entry_timestamp(&stream_id.id); + + messages.push(Message { + key: Some(Arc::clone(&stream_arc)), + payload, + 
token: RedisToken { + stream: Arc::clone(&stream_arc), + entry_id: stream_id.id.clone(), + }, + timestamp_ms, + format, + }); + } + } + + #[cfg(feature = "metrics")] + if !messages.is_empty() { + metrics::counter!("dfe_records_received_total", "transport" => "redis") + .increment(messages.len() as u64); + } + + Ok(messages) + } + + async fn commit(&self, tokens: &[Self::Token]) -> TransportResult<()> { + if tokens.is_empty() { + return Ok(()); + } + + // Group tokens by stream name for batch XACK + let mut by_stream: std::collections::HashMap<&str, Vec<&str>> = + std::collections::HashMap::new(); + for token in tokens { + by_stream + .entry(&token.stream) + .or_default() + .push(&token.entry_id); + } + + let mut conn = self.conn.lock().await; + + for (stream, ids) in &by_stream { + let id_refs: Vec<&str> = ids.clone(); + let _acked: i32 = conn + .xack(*stream, &self.config.group, &id_refs) + .await + .map_err(|e| { + TransportError::Commit(format!("XACK on stream '{stream}' failed: {e}")) + })?; + } + + Ok(()) + } +} + +/// Parse millisecond timestamp from a Redis stream entry ID. +/// +/// Entry IDs have the format `<milliseconds>-<sequence>`. 
+fn parse_entry_timestamp(entry_id: &str) -> Option<i64> { + entry_id + .split_once('-') + .and_then(|(ms_str, _)| ms_str.parse::<i64>().ok()) +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn token_display() { + let token = RedisToken { + stream: Arc::from("my_stream"), + entry_id: "1711432800000-0".into(), + }; + assert_eq!(format!("{token}"), "redis:my_stream:1711432800000-0"); + } + + #[test] + fn token_clone() { + let token = RedisToken { + stream: Arc::from("s1"), + entry_id: "100-0".into(), + }; + let cloned = token.clone(); + assert_eq!(token, cloned); + } + + #[test] + fn config_defaults() { + let config = RedisTransportConfig::default(); + assert_eq!(config.url, "redis://127.0.0.1:6379"); + assert!(config.stream.is_none()); + assert_eq!(config.group, "dfe"); + assert!(config.max_stream_len.is_none()); + assert_eq!(config.block_ms, 5000); + } + + #[test] + fn config_deserialise_minimal() { + let yaml = r" +url: redis://myhost:6380 +stream: events +"; + let config: RedisTransportConfig = serde_yaml_ng::from_str(yaml).unwrap(); + assert_eq!(config.url, "redis://myhost:6380"); + assert_eq!(config.stream.as_deref(), Some("events")); + // Defaults should be applied + assert_eq!(config.group, "dfe"); + assert_eq!(config.block_ms, 5000); + } + + #[test] + fn config_deserialise_full() { + let yaml = r" +url: rediss://secure.redis.io:6380 +stream: audit_log +group: my_group +consumer: worker-3 +max_stream_len: 100000 +block_ms: 2000 +"; + let config: RedisTransportConfig = serde_yaml_ng::from_str(yaml).unwrap(); + assert_eq!(config.url, "rediss://secure.redis.io:6380"); + assert_eq!(config.stream.as_deref(), Some("audit_log")); + assert_eq!(config.group, "my_group"); + assert_eq!(config.consumer, "worker-3"); + assert_eq!(config.max_stream_len, Some(100_000)); + assert_eq!(config.block_ms, 2000); + } + + #[test] + fn parse_entry_timestamp_valid() { + assert_eq!( + parse_entry_timestamp("1711432800000-0"), + Some(1_711_432_800_000) + ); + 
assert_eq!(parse_entry_timestamp("0-0"), Some(0)); + } + + #[test] + fn parse_entry_timestamp_invalid() { + assert_eq!(parse_entry_timestamp("not-a-number"), None); + assert_eq!(parse_entry_timestamp(""), None); + } + + #[test] + fn resolve_stream_uses_key_when_non_empty() { + let config = RedisTransportConfig { + stream: Some("default_stream".into()), + ..Default::default() + }; + // Cannot call resolve_stream without a transport instance, so test + // the logic inline: non-empty key takes precedence. + let key = "override_stream"; + let resolved = if key.is_empty() { + config.stream.as_deref().unwrap_or("") + } else { + key + }; + assert_eq!(resolved, "override_stream"); + } + + #[test] + fn resolve_stream_falls_back_to_config() { + let config = RedisTransportConfig { + stream: Some("default_stream".into()), + ..Default::default() + }; + let key = ""; + let resolved = if key.is_empty() { + config.stream.as_deref().unwrap_or("") + } else { + key + }; + assert_eq!(resolved, "default_stream"); + } + + // Integration test: requires a running Redis instance. 
+ // Run with: REDIS_URL=redis://localhost:6379 cargo nextest run redis_integration + #[tokio::test] + async fn redis_integration_xadd_xreadgroup_xack() { + let Ok(url) = std::env::var("REDIS_URL") else { + eprintln!("Skipping: REDIS_URL not set"); + return; + }; + + let stream = format!("test_stream_{}", chrono::Utc::now().timestamp_millis()); + let group = "test_group"; + let consumer = "test_consumer"; + + let config = RedisTransportConfig { + url: url.clone(), + stream: Some(stream.clone()), + group: group.into(), + consumer: consumer.into(), + max_stream_len: Some(1000), + block_ms: 1000, + }; + + let transport = RedisTransport::new(&config).await.unwrap(); + + // Send two messages + let r1 = transport.send("", b"{\"n\":1}").await; + assert!(r1.is_ok(), "first send should succeed"); + + let r2 = transport.send("", b"{\"n\":2}").await; + assert!(r2.is_ok(), "second send should succeed"); + + // Receive messages + let messages = transport.recv(10).await.unwrap(); + assert_eq!(messages.len(), 2, "should receive 2 messages"); + assert_eq!(messages[0].payload, b"{\"n\":1}"); + assert_eq!(messages[1].payload, b"{\"n\":2}"); + + // Commit (XACK) + let tokens: Vec<_> = messages.iter().map(|m| m.token.clone()).collect(); + transport.commit(&tokens).await.unwrap(); + + // After commit, no new messages should be available + let more = transport.recv(10).await.unwrap(); + assert!(more.is_empty(), "no more messages after commit"); + + // Clean up: delete the test stream + let mut conn = transport.conn.lock().await; + let _: redis::RedisResult<()> = + redis::cmd("DEL").arg(&stream).query_async(&mut *conn).await; + + transport.close().await.unwrap(); + assert!(!transport.is_healthy()); + } +} diff --git a/src/transport/routed.rs b/src/transport/routed.rs new file mode 100644 index 0000000..6b69846 --- /dev/null +++ b/src/transport/routed.rs @@ -0,0 +1,250 @@ +// Project: hyperi-rustlib +// File: src/transport/routed.rs +// Purpose: Per-key routing transport for data 
originators (receiver, fetcher) +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! Per-key routing transport for data originators. +//! +//! Routes `send(key, payload)` to different transport backends based on the +//! key. Used by dfe-receiver and dfe-fetcher where data-based routing +//! determines the destination (topic, endpoint, stream). +//! +//! All other DFE stages (transforms, loader, archiver) use simple 1:1 +//! transports and do NOT need this. +//! +//! # Config +//! +//! ```yaml +//! transport: +//! output: +//! type: routed +//! default: kafka +//! routes: +//! events.land: +//! type: grpc +//! grpc: +//! endpoint: "http://loader-land:6000" +//! events.load: +//! type: kafka +//! audit.land: +//! type: grpc +//! grpc: +//! endpoint: "http://archiver:6000" +//! kafka: +//! brokers: ["kafka:9092"] +//! ``` +//! +//! # Usage +//! +//! ```rust,ignore +//! let sender = RoutedSender::from_config("transport.output").await?; +//! // Routes to different backends based on key +//! sender.send("events.land", payload).await; // → gRPC to loader-land +//! sender.send("events.load", payload).await; // → Kafka topic +//! sender.send("audit.land", payload).await; // → gRPC to archiver +//! sender.send("unknown", payload).await; // → default (Kafka) +//! ``` + +use std::collections::HashMap; + +use super::error::{TransportError, TransportResult}; +use super::factory::AnySender; +use super::traits::{TransportBase, TransportSender}; +use super::types::SendResult; + +/// A routing transport that dispatches `send()` to different backends +/// based on the key parameter. +/// +/// Used by dfe-receiver and dfe-fetcher (data originators) where +/// data-based routing determines the destination. +pub struct RoutedSender { + /// Per-key route overrides. + routes: HashMap<String, AnySender>, + /// Default sender for keys not in the routes map. 
+ default: Option<AnySender>, + closed: std::sync::atomic::AtomicBool, +} + +impl RoutedSender { + /// Create a new routed sender with explicit routes and optional default. + pub fn new(routes: HashMap<String, AnySender>, default: Option<AnySender>) -> Self { + Self { + routes, + default, + closed: std::sync::atomic::AtomicBool::new(false), + } + } + + /// Create from a map of key → `TransportConfig` plus a default config. + /// + /// Each route gets its own `AnySender` created from the corresponding config. + pub async fn from_route_configs( + routes: HashMap<String, super::types::TransportConfig>, + default_config: Option<super::types::TransportConfig>, + ) -> TransportResult<Self> { + let mut senders = HashMap::with_capacity(routes.len()); + for (key, config) in routes { + let sender = AnySender::from_transport_config(&config).await?; + senders.insert(key, sender); + } + + let default = match default_config { + Some(cfg) => Some(AnySender::from_transport_config(&cfg).await?), + None => None, + }; + + Ok(Self::new(senders, default)) + } + + /// Get the list of configured route keys. + #[must_use] + pub fn route_keys(&self) -> Vec<&str> { + self.routes.keys().map(String::as_str).collect() + } + + /// Check if a specific route key is configured. + #[must_use] + pub fn has_route(&self, key: &str) -> bool { + self.routes.contains_key(key) + } + + /// Check if a default fallback sender is configured. + #[must_use] + pub fn has_default(&self) -> bool { + self.default.is_some() + } + + /// Resolve which sender handles a given key. 
+ fn resolve(&self, key: &str) -> Option<&AnySender> { + self.routes.get(key).or(self.default.as_ref()) + } +} + +impl TransportBase for RoutedSender { + async fn close(&self) -> TransportResult<()> { + self.closed + .store(true, std::sync::atomic::Ordering::Relaxed); + // Close all route senders + for sender in self.routes.values() { + sender.close().await?; + } + if let Some(ref default) = self.default { + default.close().await?; + } + Ok(()) + } + + fn is_healthy(&self) -> bool { + if self.closed.load(std::sync::atomic::Ordering::Relaxed) { + return false; + } + // Healthy if all configured senders are healthy + let routes_healthy = self.routes.values().all(|s| s.is_healthy()); + let default_healthy = self.default.as_ref().is_none_or(|s| s.is_healthy()); + routes_healthy && default_healthy + } + + fn name(&self) -> &'static str { + "routed" + } +} + +impl TransportSender for RoutedSender { + async fn send(&self, key: &str, payload: &[u8]) -> SendResult { + if self.closed.load(std::sync::atomic::Ordering::Relaxed) { + return SendResult::Fatal(TransportError::Closed); + } + + let Some(sender) = self.resolve(key) else { + return SendResult::Fatal(TransportError::Config(format!( + "no route configured for key '{key}' and no default sender" + ))); + }; + + #[cfg(feature = "metrics")] + metrics::counter!( + "dfe_transport_sent_total", + "transport" => "routed", + "route" => key.to_string() + ) + .increment(1); + + sender.send(key, payload).await + } +} + +#[cfg(test)] +mod tests { + use super::*; + + #[cfg(feature = "transport-memory")] + use crate::transport::memory::{MemoryConfig, MemoryTransport}; + + #[cfg(feature = "transport-memory")] + fn make_memory_sender() -> AnySender { + AnySender::Memory(MemoryTransport::new(&MemoryConfig::default())) + } + + #[tokio::test] + #[cfg(feature = "transport-memory")] + async fn routes_to_correct_sender() { + let mut route_map = HashMap::new(); + route_map.insert("events.land".into(), make_memory_sender()); + 
route_map.insert("events.load".into(), make_memory_sender()); + + let sender = RoutedSender::new(route_map, Some(make_memory_sender())); + + let result_land = sender.send("events.land", b"land-payload").await; + assert!(result_land.is_ok()); + + let result_load = sender.send("events.load", b"load-payload").await; + assert!(result_load.is_ok()); + + // Unknown key falls through to default + let result_default = sender.send("unknown.key", b"default-payload").await; + assert!(result_default.is_ok()); + + assert!(sender.is_healthy()); + assert_eq!(sender.name(), "routed"); + } + + #[tokio::test] + async fn no_route_no_default_returns_fatal() { + let sender = RoutedSender::new(HashMap::new(), None); + + let result = sender.send("unknown", b"payload").await; + assert!(result.is_fatal()); + } + + #[tokio::test] + #[cfg(feature = "transport-memory")] + async fn close_propagates_to_all_senders() { + let mut route_map = HashMap::new(); + route_map.insert("a".into(), make_memory_sender()); + let sender = RoutedSender::new(route_map, Some(make_memory_sender())); + + assert!(sender.is_healthy()); + sender.close().await.unwrap(); + assert!(!sender.is_healthy()); + } + + #[test] + fn route_keys_and_has_route() { + let sender = RoutedSender::new(HashMap::new(), None); + assert!(sender.route_keys().is_empty()); + assert!(!sender.has_route("anything")); + assert!(!sender.has_default()); + } + + #[tokio::test] + async fn send_after_close_returns_fatal() { + let sender = RoutedSender::new(HashMap::new(), None); + sender.close().await.unwrap(); + + let result = sender.send("key", b"payload").await; + assert!(result.is_fatal()); + } +} diff --git a/src/transport/traits.rs b/src/transport/traits.rs index 779b8e3..6cf0105 100644 --- a/src/transport/traits.rs +++ b/src/transport/traits.rs @@ -1,6 +1,6 @@ // Project: hyperi-rustlib // File: src/transport/traits.rs -// Purpose: Transport trait definitions +// Purpose: Transport trait definitions (sender + receiver split) // Language: Rust // 
// License: FSL-1.1-ALv2 @@ -15,8 +15,6 @@ use std::future::Future; /// /// Each transport implementation provides its own token type that /// captures the information needed to acknowledge message processing. -/// -/// Implementors must be `Clone`, `Send`, `Sync`, and `Debug`. pub trait CommitToken: Clone + Send + Sync + Debug + Display + 'static { /// Get a string representation for logging/debugging. fn as_str(&self) -> String { @@ -24,29 +22,50 @@ pub trait CommitToken: Clone + Send + Sync + Debug + Display + 'static { } } -/// Transport-agnostic message delivery. +/// Common transport operations shared by senders and receivers. /// -/// All implementations deliver raw bytes (JSON or MsgPack) without -/// any envelope or framing. Transport metadata is captured in tokens. +/// Every transport implementation provides these lifecycle and +/// introspection methods regardless of direction. +pub trait TransportBase: Send + Sync { + /// Shutdown the transport gracefully. + fn close(&self) -> impl Future<Output = TransportResult<()>> + Send; + + /// Check if the transport is healthy and connected. + fn is_healthy(&self) -> bool; + + /// Get transport name for logging/metrics. + fn name(&self) -> &'static str; +} + +/// Send-side transport. /// -/// Async methods return `impl Future + Send` to ensure compatibility with -/// `tokio::spawn` in downstream consumers. +/// Extends `TransportBase` with send capability. The factory returns +/// `AnySender` (enum dispatch) for runtime transport selection. /// -/// # Type Parameter +/// All implementations auto-emit `dfe_transport_*` Prometheus metrics +/// when a `MetricsManager` recorder is installed. +pub trait TransportSender: TransportBase { + /// Send raw bytes to a key/destination. 
+ /// + /// The `key` semantics depend on the transport: + /// - Kafka: topic name + /// - gRPC: metadata routing key + /// - HTTP: URL path suffix or ignored + /// - File: filename suffix or ignored + /// - Redis: stream name + /// - Pipe: ignored (single stdout) + fn send(&self, key: &str, payload: &[u8]) -> impl Future<Output = SendResult> + Send; +} + +/// Receive-side transport — generic over commit token type. /// -/// The `Token` associated type allows each transport to have its own -/// commit token type (e.g., `KafkaToken`, `ZenohToken`, `MemoryToken`). -pub trait Transport: Send + Sync { +/// Extends `TransportBase` with receive and commit capability. +/// Input stages (receiver, fetcher) use concrete implementations +/// directly for type-safe token handling. +pub trait TransportReceiver: TransportBase { /// The token type for this transport. type Token: CommitToken; - /// Send raw bytes to a key/topic. - /// - /// Returns `SendResult::Ok` on success, `SendResult::Backpressured` - /// if the transport cannot accept more messages, or `SendResult::Fatal` - /// on unrecoverable errors. - fn send(&self, key: &str, payload: &[u8]) -> impl Future<Output = SendResult> + Send; - /// Receive up to `max` messages. /// /// Returns immediately with available messages (may be fewer than `max`). @@ -58,19 +77,20 @@ pub trait Transport: Send + Sync { /// Commit/acknowledge processed messages. /// - /// For Kafka: commits consumer offsets. - /// For Zenoh: no-op (no persistence). - /// For Memory: advances internal sequence. + /// - Kafka: commits consumer offsets + /// - gRPC: no-op (no persistence) + /// - Redis: XACK + /// - File: advances read position + /// - Memory: advances internal sequence fn commit(&self, tokens: &[Self::Token]) -> impl Future<Output = TransportResult<()>> + Send; +} - /// Shutdown gracefully. - /// - /// Flushes pending messages and closes connections. 
- fn close(&self) -> impl Future<Output = TransportResult<()>> + Send; - - /// Check if the transport is healthy and connected. 
- fn is_healthy(&self) -> bool; +/// Combined transport — implements both send and receive. +/// +/// Convenience trait for transports that support bidirectional communication. +/// Most concrete implementations (Kafka, gRPC, Memory, Redis, File, Pipe) +/// implement this. Automatically implemented via blanket impl. +pub trait Transport: TransportSender + TransportReceiver {} - /// Get transport name for logging/metrics. - fn name(&self) -> &'static str; -} +/// Blanket impl: anything that implements both traits is a Transport. +impl Transport for T {} diff --git a/src/transport/types.rs b/src/transport/types.rs index 6d18164..0de11b2 100644 --- a/src/transport/types.rs +++ b/src/transport/types.rs @@ -22,6 +22,14 @@ pub enum TransportType { Grpc, /// In-memory tokio channels (unit tests). Memory, + /// NDJSON file (debugging, audit trails, replay). + File, + /// Unix pipe (stdin/stdout, sidecar pattern). + Pipe, + /// HTTP/HTTPS (webhook delivery, REST ingest). + Http, + /// Redis/Valkey Streams (lightweight pub/sub). + Redis, } impl std::fmt::Display for TransportType { @@ -30,6 +38,10 @@ impl std::fmt::Display for TransportType { Self::Kafka => write!(f, "kafka"), Self::Grpc => write!(f, "grpc"), Self::Memory => write!(f, "memory"), + Self::File => write!(f, "file"), + Self::Pipe => write!(f, "pipe"), + Self::Http => write!(f, "http"), + Self::Redis => write!(f, "redis"), } } } @@ -152,9 +164,13 @@ impl SendResult { } /// Top-level transport configuration. +/// +/// Used by the transport factory to create the right backend from config. +/// Each transport type has its own optional config section — only the one +/// matching `transport_type` is read. #[derive(Debug, Clone, Default, Serialize, Deserialize)] pub struct TransportConfig { - /// Transport type (kafka, grpc, memory). + /// Transport type (kafka, grpc, memory, file, pipe, http, redis). 
#[serde(rename = "type", default)] pub transport_type: TransportType, @@ -177,6 +193,21 @@ pub struct TransportConfig { #[serde(default)] pub memory: Option, + /// Pipe transport configuration (stdin/stdout). + #[cfg(feature = "transport-pipe")] + #[serde(default)] + pub pipe: Option, + + /// File transport configuration (NDJSON file I/O). + #[cfg(feature = "transport-file")] + #[serde(default)] + pub file: Option, + + /// HTTP transport configuration (webhook delivery, REST ingest). + #[cfg(feature = "transport-http")] + #[serde(default)] + pub http: Option, + // Placeholder fields when features are disabled #[cfg(not(feature = "transport-kafka"))] #[serde(default, skip)] @@ -189,6 +220,27 @@ pub struct TransportConfig { #[cfg(not(feature = "transport-memory"))] #[serde(default, skip)] pub memory: Option<()>, + + #[cfg(not(feature = "transport-pipe"))] + #[serde(default, skip)] + pub pipe: Option<()>, + + #[cfg(not(feature = "transport-file"))] + #[serde(default, skip)] + pub file: Option<()>, + + #[cfg(not(feature = "transport-http"))] + #[serde(default, skip)] + pub http: Option<()>, + + /// Redis/Valkey Streams transport configuration. + #[cfg(feature = "transport-redis")] + #[serde(default)] + pub redis: Option, + + #[cfg(not(feature = "transport-redis"))] + #[serde(default, skip)] + pub redis: Option<()>, } #[cfg(test)] diff --git a/tests/e2e/grpc_transport.rs b/tests/e2e/grpc_transport.rs index dbdfefd..a55b5c5 100644 --- a/tests/e2e/grpc_transport.rs +++ b/tests/e2e/grpc_transport.rs @@ -17,7 +17,7 @@ use std::time::Duration; use hyperi_rustlib::transport::grpc::{GrpcConfig, GrpcTransport}; -use hyperi_rustlib::transport::{SendResult, Transport}; +use hyperi_rustlib::transport::{SendResult, TransportBase, TransportReceiver, TransportSender}; /// Find an available port for testing. 
async fn find_available_port() -> u16 { diff --git a/tests/e2e/kafka.rs b/tests/e2e/kafka.rs index 6623796..6104bec 100644 --- a/tests/e2e/kafka.rs +++ b/tests/e2e/kafka.rs @@ -628,7 +628,7 @@ async fn test_kafka_admin_describe_topic() { #[tokio::test] #[ignore = "requires Kafka broker - set TEST_KAFKA_BROKERS to run"] async fn test_kafka_send_receive_batch() { - use hyperi_rustlib::transport::{Transport, kafka::KafkaTransport}; + use hyperi_rustlib::transport::{TransportReceiver, TransportSender, kafka::KafkaTransport}; let Some(mut config) = get_test_config() else { eprintln!("Skipping: TEST_KAFKA_BROKERS not set"); diff --git a/tests/e2e/vector_compat.rs b/tests/e2e/vector_compat.rs index 9bc48b5..d979b59 100644 --- a/tests/e2e/vector_compat.rs +++ b/tests/e2e/vector_compat.rs @@ -31,7 +31,7 @@ use std::time::Duration; use hyperi_rustlib::transport::VectorCompatClient; use hyperi_rustlib::transport::grpc::{GrpcConfig, GrpcTransport}; -use hyperi_rustlib::transport::{SendResult, Transport}; +use hyperi_rustlib::transport::{SendResult, TransportBase, TransportReceiver, TransportSender}; /// Resolve the path to the Vector binary (cached via fetch-vector.sh or system PATH). /// From 67ca272a1219fc9ab29f236e1afd9cc26b00bed0 Mon Sep 17 00:00:00 2001 From: semantic-release-bot Date: Thu, 26 Mar 2026 05:24:37 +0000 Subject: [PATCH 3/7] chore: version 1.17.0-dev.15 [skip ci] # [1.17.0-dev.15](https://github.com/hyperi-io/hyperi-rustlib/compare/v1.17.0-dev.14...v1.17.0-dev.15) (2026-03-26) * feat!: split Transport trait and add 4 new transports + factory ([36c383d](https://github.com/hyperi-io/hyperi-rustlib/commit/36c383d8bcdf96a120bf238cc45e629be984aa47)) ### BREAKING CHANGES * Transport trait split into TransportBase (close, is_healthy, name), TransportSender (send), and TransportReceiver (recv, commit, Token). Blanket Transport impl for types with both. 
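
A minimal sketch of the trait split described above, under simplifying assumptions: the methods here are synchronous and return plain `Result`/`Vec` values, whereas the real traits are async and use `SendResult`, `TransportResult` and commit tokens. `Loopback` and `round_trip` are hypothetical names introduced only for this illustration.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Lifecycle and introspection shared by both directions (simplified).
pub trait TransportBase: Send + Sync {
    fn is_healthy(&self) -> bool;
    fn name(&self) -> &'static str;
}

/// Send side. Simplified: synchronous, `Result` instead of `SendResult`.
pub trait TransportSender: TransportBase {
    fn send(&self, key: &str, payload: &[u8]) -> Result<(), String>;
}

/// Receive side. Simplified: no commit tokens.
pub trait TransportReceiver: TransportBase {
    fn recv(&self, max: usize) -> Vec<Vec<u8>>;
}

/// Combined trait; never implemented by hand.
pub trait Transport: TransportSender + TransportReceiver {}

/// Blanket impl: any type with both halves is a `Transport`.
impl<T: TransportSender + TransportReceiver> Transport for T {}

/// Hypothetical in-memory loopback, used only in this sketch.
#[derive(Default)]
pub struct Loopback {
    queue: Mutex<VecDeque<Vec<u8>>>,
}

impl TransportBase for Loopback {
    fn is_healthy(&self) -> bool {
        true
    }
    fn name(&self) -> &'static str {
        "loopback"
    }
}

impl TransportSender for Loopback {
    fn send(&self, _key: &str, payload: &[u8]) -> Result<(), String> {
        // The key is ignored here, as it is for single-destination transports.
        let mut queue = self.queue.lock().map_err(|e| e.to_string())?;
        queue.push_back(payload.to_vec());
        Ok(())
    }
}

impl TransportReceiver for Loopback {
    fn recv(&self, max: usize) -> Vec<Vec<u8>> {
        let mut queue = self.queue.lock().unwrap();
        let n = max.min(queue.len());
        queue.drain(..n).collect()
    }
}

/// Accepts any full `Transport`. `Loopback` qualifies only through the
/// blanket impl, since it never names `Transport` itself.
pub fn round_trip<T: Transport>(transport: &T, payload: &[u8]) -> Vec<Vec<u8>> {
    transport.send("ignored", payload).expect("send failed");
    transport.recv(10)
}
```

Code that only sends can bound on `TransportSender` alone, which is presumably what lets the factory hand out a send-only `AnySender` without a token type.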
New transport backends: - File: NDJSON with position tracking and commit persistence - Pipe: stdin/stdout for Unix pipeline composition - HTTP: POST to endpoint (send) + embedded axum server (receive) - Redis/Valkey Streams: XADD/XREADGROUP/XACK with consumer groups Transport factory: - AnySender: enum dispatch for runtime transport selection - AnySender::from_config(): create sender from config cascade - RoutedSender: per-key dispatch for data originators (receiver/fetcher) All transports auto-emit dfe_transport_* Prometheus metrics. 648 tests pass. --- CHANGELOG.md | 26 ++++++++++++++++++++++++++ Cargo.toml | 2 +- VERSION | 2 +- 3 files changed, 28 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 28ea655..5be2675 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,29 @@ +# [1.17.0-dev.15](https://github.com/hyperi-io/hyperi-rustlib/compare/v1.17.0-dev.14...v1.17.0-dev.15) (2026-03-26) + + +* feat!: split Transport trait and add 4 new transports + factory ([36c383d](https://github.com/hyperi-io/hyperi-rustlib/commit/36c383d8bcdf96a120bf238cc45e629be984aa47)) + + +### BREAKING CHANGES + +* Transport trait split into TransportBase (close, +is_healthy, name), TransportSender (send), and TransportReceiver +(recv, commit, Token). Blanket Transport impl for types with both. + +New transport backends: +- File: NDJSON with position tracking and commit persistence +- Pipe: stdin/stdout for Unix pipeline composition +- HTTP: POST to endpoint (send) + embedded axum server (receive) +- Redis/Valkey Streams: XADD/XREADGROUP/XACK with consumer groups + +Transport factory: +- AnySender: enum dispatch for runtime transport selection +- AnySender::from_config(): create sender from config cascade +- RoutedSender: per-key dispatch for data originators (receiver/fetcher) + +All transports auto-emit dfe_transport_* Prometheus metrics. +648 tests pass. 
+ # [1.17.0-dev.14](https://github.com/hyperi-io/hyperi-rustlib/compare/v1.17.0-dev.13...v1.17.0-dev.14) (2026-03-26) diff --git a/Cargo.toml b/Cargo.toml index 3e07589..8542567 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -8,7 +8,7 @@ [package] name = "hyperi-rustlib" -version = "1.17.0-dev.14" +version = "1.17.0-dev.15" edition = "2024" rust-version = "1.94" description = "Shared utility library for HyperI Rust applications" diff --git a/VERSION b/VERSION index d3794e7..6e4cd31 100644 --- a/VERSION +++ b/VERSION @@ -1 +1 @@ -1.17.0-dev.14 +1.17.0-dev.15 From 21efaa2a4d61901dfe479d655ef9af1a5c437580 Mon Sep 17 00:00:00 2001 From: Derek Date: Thu, 26 Mar 2026 16:52:19 +1100 Subject: [PATCH 4/7] fix: add health registry, shutdown manager, and wire all modules HealthRegistry: global singleton with HealthStatus (Healthy/Degraded/ Unhealthy). Modules register health check closures at construction. is_healthy(), is_ready(), to_json() for aggregated status. Shutdown: global CancellationToken via OnceLock. install_signal_handler() listens for SIGTERM/SIGINT. Modules listen on token.cancelled(). Wiring: - KafkaTransport, GrpcTransport: register with health, Arc - CircuitBreaker: register with health, maps Open->Unhealthy - HttpServer: register with health, use global shutdown token - TieredSink drainer: listen on global shutdown token - ConfigReloader: listen on global shutdown token Logging + config cascade added to all new transports (file, pipe, HTTP, Redis). 663 tests pass. 
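
A rough sketch of the aggregation rules this commit introduces, assuming a local (non-global) registry so the example stays self-contained. `LocalRegistry` and `overall` are hypothetical names; the real `HealthRegistry` is a global singleton exposing associated functions.

```rust
/// Mirrors the three states named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HealthStatus {
    Healthy,
    Degraded,
    Unhealthy,
}

/// A health check closure, invoked on every query.
type Check = Box<dyn Fn() -> HealthStatus + Send + Sync>;

/// Hypothetical local stand-in for the global registry.
#[derive(Default)]
pub struct LocalRegistry {
    checks: Vec<(String, Check)>,
}

impl LocalRegistry {
    /// Register a named check; duplicate names are not deduplicated.
    pub fn register(
        &mut self,
        name: &str,
        check: impl Fn() -> HealthStatus + Send + Sync + 'static,
    ) {
        self.checks.push((name.to_string(), Box::new(check)));
    }

    /// Healthy only if every component reports `Healthy` (true when empty).
    pub fn is_healthy(&self) -> bool {
        self.checks.iter().all(|(_, c)| c() == HealthStatus::Healthy)
    }

    /// Ready if no component reports `Unhealthy`; `Degraded` is tolerated.
    pub fn is_ready(&self) -> bool {
        self.checks.iter().all(|(_, c)| c() != HealthStatus::Unhealthy)
    }

    /// Overall label, derived the same way the JSON status field is.
    pub fn overall(&self) -> &'static str {
        if self.is_healthy() {
            "healthy"
        } else if self.is_ready() {
            "degraded"
        } else {
            "unhealthy"
        }
    }
}
```

Because the checks are closures evaluated at query time, a component can flip its status through shared state (for example an atomic flag) without re-registering.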
--- Cargo.toml | 9 +- src/config/reloader.rs | 52 ++++- src/health/mod.rs | 29 +++ src/health/registry.rs | 346 +++++++++++++++++++++++++++++++ src/http_server/server.rs | 31 ++- src/lib.rs | 9 + src/shutdown.rs | 189 +++++++++++++++++ src/tiered_sink/circuit.rs | 35 +++- src/tiered_sink/drainer.rs | 17 +- src/transport/file.rs | 39 ++++ src/transport/grpc/mod.rs | 21 +- src/transport/http.rs | 38 ++++ src/transport/kafka/mod.rs | 21 +- src/transport/pipe.rs | 36 ++++ src/transport/redis_transport.rs | 55 ++++- 15 files changed, 903 insertions(+), 24 deletions(-) create mode 100644 src/health/mod.rs create mode 100644 src/health/registry.rs create mode 100644 src/shutdown.rs diff --git a/Cargo.toml b/Cargo.toml index 8542567..d660237 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -20,9 +20,11 @@ categories = ["development-tools"] exclude = [".claude/", ".github/", "ci/", "ai/", "docs/", "examples/", "benches/", "scripts/"] [features] -default = ["config", "logger", "metrics", "runtime"] +default = ["config", "logger", "metrics", "runtime", "shutdown", "health"] # Core features +health = [] +shutdown = ["tokio", "tokio-util"] runtime = ["dirs"] config = ["figment", "dotenvy", "serde_yaml_ng", "serde_json", "toml", "dirs", "tracing"] logger = ["tracing", "tracing-subscriber", "owo-colors", "serde_json", "tracing-throttle"] @@ -104,7 +106,7 @@ secrets-aws = ["secrets", "aws-config", "aws-sdk-secretsmanager"] secrets-all = ["secrets-vault", "secrets-aws"] # Full feature set -full = ["config", "config-reload", "logger", "metrics", "metrics-dfe", "otel", "otel-metrics", "runtime", "http", "http-server", "spool", "tiered-sink", "resilience", "database", "cache", "transport-all", "transport-grpc-vector-compat", "secrets-all", "directory-config", "directory-config-git", "deployment", "version-check", "scaling", "memory", "cli", "io", "dlq", "dlq-kafka", "output-file", "expression"] +full = ["config", "config-reload", "logger", "metrics", "metrics-dfe", "otel", "otel-metrics", 
"runtime", "shutdown", "health", "http", "http-server", "spool", "tiered-sink", "resilience", "database", "cache", "transport-all", "transport-grpc-vector-compat", "secrets-all", "directory-config", "directory-config-git", "deployment", "version-check", "scaling", "memory", "cli", "io", "dlq", "dlq-kafka", "output-file", "expression"] [dependencies] # Serialisation (always needed) @@ -158,8 +160,9 @@ metrics-util = { version = ">=0.20.1, <0.21", optional = true } metrics-exporter-opentelemetry = { version = ">=0.2.1, <0.3", optional = true } sysinfo = { version = ">=0.38.0, <0.39", optional = true } -# Async runtime (for metrics server, http-server) +# Async runtime (for metrics server, http-server, shutdown) tokio = { version = ">=1.50.0, <2", features = ["rt-multi-thread", "net", "sync", "time", "macros", "signal", "fs", "io-std", "io-util"], optional = true } +tokio-util = { version = ">=0.7.14, <0.8", optional = true } # HTTP client — pinned to reqwest 0.12 until vaultrs and opentelemetry-otlp # support 0.13. reqwest-middleware 0.4 and reqwest-retry 0.7 target 0.12. diff --git a/src/config/reloader.rs b/src/config/reloader.rs index a553f77..cca4d6b 100644 --- a/src/config/reloader.rs +++ b/src/config/reloader.rs @@ -277,6 +277,9 @@ impl ConfigReloader { /// Main reload loop — waits for any trigger, then attempts reload. 
async fn run_loop(self) { + #[cfg(feature = "shutdown")] + let shutdown_token = crate::shutdown::token(); + // File polling state let mut last_modified: Option = self.config.config_path.as_ref().and_then(|p| file_mtime(p)); @@ -308,15 +311,46 @@ impl ConfigReloader { }; loop { - let trigger = self - .wait_for_trigger( - &mut poll_timer, - &mut periodic_timer, - #[cfg(unix)] - &mut sighup, - &mut last_modified, - ) - .await; + // Check for global shutdown before waiting for next trigger + #[cfg(feature = "shutdown")] + if shutdown_token.is_cancelled() { + info!("Config reloader stopping (shutdown)"); + return; + } + + let trigger_result = { + #[cfg(feature = "shutdown")] + { + tokio::select! { + trigger = self.wait_for_trigger( + &mut poll_timer, + &mut periodic_timer, + #[cfg(unix)] + &mut sighup, + &mut last_modified, + ) => Some(trigger), + () = shutdown_token.cancelled() => None, + } + } + #[cfg(not(feature = "shutdown"))] + { + Some( + self.wait_for_trigger( + &mut poll_timer, + &mut periodic_timer, + #[cfg(unix)] + &mut sighup, + &mut last_modified, + ) + .await, + ) + } + }; + + let Some(trigger) = trigger_result else { + info!("Config reloader stopping (shutdown)"); + return; + }; // Debounce check if last_reload.elapsed() < self.config.debounce { diff --git a/src/health/mod.rs b/src/health/mod.rs new file mode 100644 index 0000000..0c80228 --- /dev/null +++ b/src/health/mod.rs @@ -0,0 +1,29 @@ +// Project: hyperi-rustlib +// File: src/health/mod.rs +// Purpose: Unified health registry for service health state +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! Unified health registry for service readiness and liveness. +//! +//! Provides a global singleton [`HealthRegistry`] that modules register +//! into at construction. The `/readyz` endpoint (or any health check) +//! queries the registry to determine overall service health. +//! +//! # Usage +//! +//! ```rust +//! 
use hyperi_rustlib::health::{HealthRegistry, HealthStatus}; +//! +//! // Register a component health check at construction +//! HealthRegistry::register("kafka_consumer", || HealthStatus::Healthy); +//! +//! // Query overall health +//! assert!(HealthRegistry::is_ready()); +//! ``` + +pub mod registry; + +pub use registry::{HealthRegistry, HealthStatus}; diff --git a/src/health/registry.rs b/src/health/registry.rs new file mode 100644 index 0000000..7728361 --- /dev/null +++ b/src/health/registry.rs @@ -0,0 +1,346 @@ +// Project: hyperi-rustlib +// File: src/health/registry.rs +// Purpose: Global health registry singleton for component health tracking +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! Global health registry for unified service health state. +//! +//! Modules register health check callbacks at construction. The registry +//! aggregates component status to determine overall service health. +//! +//! # Design +//! +//! - Global singleton via `OnceLock` (consistent with config registry pattern) +//! - Components register a closure that returns their current [`HealthStatus`] +//! - [`is_healthy`](HealthRegistry::is_healthy) requires ALL components healthy +//! - [`is_ready`](HealthRegistry::is_ready) requires NO components unhealthy +//! (degraded is acceptable for readiness) +//! - Empty registry is considered healthy (vacuously true) + +use std::sync::{Arc, Mutex, OnceLock}; + +/// Health status of a registered component. +#[derive(Debug, Clone, Copy, PartialEq, Eq)] +pub enum HealthStatus { + /// Component is fully operational. + Healthy, + /// Component is operational but impaired (e.g., circuit half-open, + /// elevated latency, fallback active). + Degraded, + /// Component is not operational. Service should not receive traffic. + Unhealthy, +} + +impl HealthStatus { + /// String representation for JSON serialisation and endpoint output. 
+ #[must_use] + pub fn as_str(self) -> &'static str { + match self { + Self::Healthy => "healthy", + Self::Degraded => "degraded", + Self::Unhealthy => "unhealthy", + } + } +} + +/// Health check callback — returns current component status. +type HealthCheck = Arc<dyn Fn() -> HealthStatus + Send + Sync>; + +/// A registered health check entry. +struct HealthEntry { + name: String, + check: HealthCheck, +} + +/// Global health registry singleton. +/// +/// Modules register health check callbacks at construction. The registry +/// aggregates all component statuses to determine overall service health. +/// +/// # Thread Safety +/// +/// The registry uses `Mutex<Vec<HealthEntry>>` for registration (infrequent, at +/// init time) and read access (health checks). For the typical DFE app +/// with 3-8 registered components, lock contention is negligible. +pub struct HealthRegistry { + components: Mutex<Vec<HealthEntry>>, +} + +/// Global singleton instance. +static REGISTRY: OnceLock<HealthRegistry> = OnceLock::new(); + +impl HealthRegistry { + /// Create a new empty registry. + fn new() -> Self { + Self { + components: Mutex::new(Vec::new()), + } + } + + /// Get or initialise the global registry. + fn global() -> &'static Self { + REGISTRY.get_or_init(Self::new) + } + + /// Register a health check callback. + /// + /// Called by modules at construction time. The callback is invoked + /// each time health is queried, so it should be cheap (e.g., read + /// an `AtomicBool` or check a cached state). + /// + /// # Duplicate Names + /// + /// Multiple components may register with the same name. Each + /// registration is independent — the registry does not deduplicate. + pub fn register( + name: impl Into<String>, + check: impl Fn() -> HealthStatus + Send + Sync + 'static, + ) { + let registry = Self::global(); + if let Ok(mut components) = registry.components.lock() { + components.push(HealthEntry { + name: name.into(), + check: Arc::new(check), + }); + } + } + + /// Check if ALL components are healthy. 
+ /// + /// Returns `true` if the registry is empty (vacuously true) or + /// every registered component reports [`HealthStatus::Healthy`]. + #[must_use] + pub fn is_healthy() -> bool { + let registry = Self::global(); + let Ok(components) = registry.components.lock() else { + return false; + }; + components + .iter() + .all(|c| (c.check)() == HealthStatus::Healthy) + } + + /// Check if the service is ready to receive traffic. + /// + /// Ready means no component is [`HealthStatus::Unhealthy`]. Degraded + /// components are acceptable — the service can still serve requests, + /// just with reduced capability. + /// + /// Returns `true` if the registry is empty (vacuously true). + #[must_use] + pub fn is_ready() -> bool { + let registry = Self::global(); + let Ok(components) = registry.components.lock() else { + return false; + }; + components + .iter() + .all(|c| (c.check)() != HealthStatus::Unhealthy) + } + + /// Get per-component health status. + /// + /// Returns a snapshot of all registered components and their current + /// status. Useful for detailed health endpoints. + #[must_use] + pub fn components() -> Vec<(String, HealthStatus)> { + let registry = Self::global(); + let Ok(components) = registry.components.lock() else { + return Vec::new(); + }; + components + .iter() + .map(|c| (c.name.clone(), (c.check)())) + .collect() + } + + /// Get a JSON representation of the health state. + /// + /// Suitable for a `/health/detailed` endpoint response. + #[cfg(feature = "serde_json")] + #[must_use] + pub fn to_json() -> serde_json::Value { + let components = Self::components(); + let overall = if Self::is_healthy() { + "healthy" + } else if Self::is_ready() { + "degraded" + } else { + "unhealthy" + }; + + serde_json::json!({ + "status": overall, + "components": components.iter().map(|(name, status)| { + serde_json::json!({ + "name": name, + "status": status.as_str(), + }) + }).collect::>() + }) + } + + /// Clear all registered components (for testing only). 
+ #[cfg(test)] + pub(crate) fn reset() { + let registry = Self::global(); + if let Ok(mut components) = registry.components.lock() { + components.clear(); + } + } +} + +#[cfg(test)] +mod tests { + use std::sync::atomic::{AtomicU8, Ordering}; + + use super::*; + + /// Tests share global statics — serialise them. + static TEST_LOCK: Mutex<()> = Mutex::new(()); + + macro_rules! serial_test { + () => { + let _guard = TEST_LOCK.lock().unwrap(); + HealthRegistry::reset(); + }; + } + + #[test] + fn empty_registry_is_healthy() { + serial_test!(); + + assert!(HealthRegistry::is_healthy()); + assert!(HealthRegistry::is_ready()); + assert!(HealthRegistry::components().is_empty()); + } + + #[test] + fn register_and_check_healthy() { + serial_test!(); + + HealthRegistry::register("transport", || HealthStatus::Healthy); + HealthRegistry::register("database", || HealthStatus::Healthy); + + assert!(HealthRegistry::is_healthy()); + assert!(HealthRegistry::is_ready()); + + let components = HealthRegistry::components(); + assert_eq!(components.len(), 2); + assert_eq!(components[0].0, "transport"); + assert_eq!(components[0].1, HealthStatus::Healthy); + assert_eq!(components[1].0, "database"); + assert_eq!(components[1].1, HealthStatus::Healthy); + } + + #[test] + fn unhealthy_component_fails_check() { + serial_test!(); + + HealthRegistry::register("transport", || HealthStatus::Healthy); + HealthRegistry::register("database", || HealthStatus::Unhealthy); + + assert!(!HealthRegistry::is_healthy()); + assert!(!HealthRegistry::is_ready()); + } + + #[test] + fn degraded_is_ready_but_not_healthy() { + serial_test!(); + + HealthRegistry::register("transport", || HealthStatus::Healthy); + HealthRegistry::register("circuit_breaker", || HealthStatus::Degraded); + + assert!(!HealthRegistry::is_healthy()); + assert!(HealthRegistry::is_ready()); + } + + #[test] + fn dynamic_health_check_reflects_state_changes() { + serial_test!(); + + // Simulate a component whose health changes at runtime + let 
state = Arc::new(AtomicU8::new(0)); // 0=healthy, 1=degraded, 2=unhealthy + let state_clone = state.clone(); + + HealthRegistry::register("dynamic", move || { + match state_clone.load(Ordering::Relaxed) { + 0 => HealthStatus::Healthy, + 1 => HealthStatus::Degraded, + _ => HealthStatus::Unhealthy, + } + }); + + // Initially healthy + assert!(HealthRegistry::is_healthy()); + assert!(HealthRegistry::is_ready()); + + // Transition to degraded + state.store(1, Ordering::Relaxed); + assert!(!HealthRegistry::is_healthy()); + assert!(HealthRegistry::is_ready()); + + // Transition to unhealthy + state.store(2, Ordering::Relaxed); + assert!(!HealthRegistry::is_healthy()); + assert!(!HealthRegistry::is_ready()); + + // Recovery back to healthy + state.store(0, Ordering::Relaxed); + assert!(HealthRegistry::is_healthy()); + assert!(HealthRegistry::is_ready()); + } + + #[test] + fn health_status_as_str() { + assert_eq!(HealthStatus::Healthy.as_str(), "healthy"); + assert_eq!(HealthStatus::Degraded.as_str(), "degraded"); + assert_eq!(HealthStatus::Unhealthy.as_str(), "unhealthy"); + } + + #[test] + #[cfg(feature = "serde_json")] + fn to_json_includes_all_components() { + serial_test!(); + + HealthRegistry::register("kafka", || HealthStatus::Healthy); + HealthRegistry::register("clickhouse", || HealthStatus::Degraded); + + let json = HealthRegistry::to_json(); + + assert_eq!(json["status"], "degraded"); + + let components = json["components"].as_array().unwrap(); + assert_eq!(components.len(), 2); + + assert_eq!(components[0]["name"], "kafka"); + assert_eq!(components[0]["status"], "healthy"); + + assert_eq!(components[1]["name"], "clickhouse"); + assert_eq!(components[1]["status"], "degraded"); + } + + #[test] + #[cfg(feature = "serde_json")] + fn to_json_empty_registry() { + serial_test!(); + + let json = HealthRegistry::to_json(); + assert_eq!(json["status"], "healthy"); + assert!(json["components"].as_array().unwrap().is_empty()); + } + + #[test] + #[cfg(feature = 
"serde_json")] + fn to_json_unhealthy_status() { + serial_test!(); + + HealthRegistry::register("broken", || HealthStatus::Unhealthy); + + let json = HealthRegistry::to_json(); + assert_eq!(json["status"], "unhealthy"); + } +} diff --git a/src/http_server/server.rs b/src/http_server/server.rs index 429679b..fef7fed 100644 --- a/src/http_server/server.rs +++ b/src/http_server/server.rs @@ -14,6 +14,7 @@ use std::net::SocketAddr; use std::sync::Arc; use std::sync::atomic::{AtomicBool, Ordering}; use tokio::net::TcpListener; +#[cfg(not(feature = "shutdown"))] use tokio::signal; use tokio::sync::watch; @@ -29,10 +30,21 @@ impl HttpServer { /// Create a new HTTP server with the given configuration. #[must_use] pub fn new(config: HttpServerConfig) -> Self { - Self { - config, - ready: Arc::new(AtomicBool::new(true)), + let ready = Arc::new(AtomicBool::new(true)); + + #[cfg(feature = "health")] + { + let r = Arc::clone(&ready); + crate::health::HealthRegistry::register("http_server", move || { + if r.load(Ordering::Relaxed) { + crate::health::HealthStatus::Healthy + } else { + crate::health::HealthStatus::Unhealthy + } + }); } + + Self { config, ready } } /// Create a new HTTP server bound to the specified address. @@ -70,7 +82,15 @@ impl HttpServer { /// /// Returns an error if binding fails or the server encounters an error. pub async fn serve(self, app: Router) -> Result<()> { - self.serve_with_shutdown(app, shutdown_signal()).await + #[cfg(feature = "shutdown")] + { + let token = crate::shutdown::install_signal_handler(); + self.serve_with_shutdown(app, token.cancelled_owned()).await + } + #[cfg(not(feature = "shutdown"))] + { + self.serve_with_shutdown(app, shutdown_signal()).await + } } /// Serve with a custom shutdown signal. @@ -254,6 +274,9 @@ async fn config_dump() -> impl IntoResponse { } /// Wait for a shutdown signal (SIGTERM or SIGINT). +/// +/// Used as fallback when the `shutdown` feature is not enabled. 
+#[cfg(not(feature = "shutdown"))] async fn shutdown_signal() { let ctrl_c = async { signal::ctrl_c() diff --git a/src/lib.rs b/src/lib.rs index e8ff80b..f6aae33 100644 --- a/src/lib.rs +++ b/src/lib.rs @@ -70,6 +70,12 @@ pub mod kafka_config; #[cfg(feature = "runtime")] pub mod runtime; +#[cfg(feature = "shutdown")] +pub mod shutdown; + +#[cfg(feature = "health")] +pub mod health; + #[cfg(feature = "config")] pub mod config; @@ -146,6 +152,9 @@ pub use kafka_config::{ #[cfg(feature = "runtime")] pub use runtime::RuntimePaths; +#[cfg(feature = "health")] +pub use health::{HealthRegistry, HealthStatus}; + #[cfg(feature = "config")] pub use config::{Config, ConfigError, ConfigOptions}; diff --git a/src/shutdown.rs b/src/shutdown.rs new file mode 100644 index 0000000..713437b --- /dev/null +++ b/src/shutdown.rs @@ -0,0 +1,189 @@ +// Project: hyperi-rustlib +// File: src/shutdown.rs +// Purpose: Unified graceful shutdown with global CancellationToken +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! Unified graceful shutdown manager. +//! +//! Provides a global [`CancellationToken`] that all modules can listen on +//! for coordinated graceful shutdown. One place handles SIGTERM/SIGINT, +//! all modules drain gracefully. +//! +//! ## Usage +//! +//! ```rust,no_run +//! use hyperi_rustlib::shutdown; +//! +//! #[tokio::main] +//! async fn main() { +//! // Install the signal handler once at startup +//! let token = shutdown::install_signal_handler(); +//! +//! // Pass token to workers, or they can call shutdown::token() directly +//! tokio::spawn(async move { +//! loop { +//! tokio::select! { +//! _ = token.cancelled() => { +//! // drain and exit +//! break; +//! } +//! _ = do_work() => {} +//! } +//! } +//! }); +//! } +//! +//! async fn do_work() { +//! tokio::time::sleep(std::time::Duration::from_secs(1)).await; +//! } +//! 
``` + +use std::sync::OnceLock; +use tokio_util::sync::CancellationToken; + +static TOKEN: OnceLock<CancellationToken> = OnceLock::new(); + +/// Get the global shutdown token. +/// +/// All modules should clone this token and listen for cancellation +/// in their main loops via `token.cancelled().await`. +/// +/// The token is created lazily on first access. +pub fn token() -> CancellationToken { + TOKEN.get_or_init(CancellationToken::new).clone() +} + +/// Check if shutdown has been requested. +pub fn is_shutdown() -> bool { + TOKEN.get().is_some_and(CancellationToken::is_cancelled) +} + +/// Trigger shutdown programmatically. +/// +/// Cancels the global token. All modules listening on it will +/// begin their drain/cleanup sequence. +pub fn trigger() { + if let Some(t) = TOKEN.get() { + t.cancel(); + } +} + +/// Wait for SIGTERM or SIGINT, then trigger shutdown. +/// +/// Call this once at application startup. It spawns a background +/// task that waits for the OS signal, then cancels the global token. +/// +/// Returns the token for use in `tokio::select!` or other async +/// shutdown coordination. +#[must_use] +pub fn install_signal_handler() -> CancellationToken { + let t = token(); + let cancel = t.clone(); + + tokio::spawn(async move { + wait_for_signal().await; + cancel.cancel(); + + #[cfg(feature = "logger")] + tracing::info!("Shutdown signal received, cancelling all tasks"); + }); + + t +} + +/// Wait for SIGTERM or SIGINT. +async fn wait_for_signal() { + let ctrl_c = async { + tokio::signal::ctrl_c() + .await + .expect("failed to install Ctrl+C handler"); + }; + + #[cfg(unix)] + let terminate = async { + tokio::signal::unix::signal(tokio::signal::unix::SignalKind::terminate()) + .expect("failed to install SIGTERM handler") + .recv() + .await; + }; + + #[cfg(unix)] + tokio::select! 
{ + () = ctrl_c => {}, + () = terminate => {}, + } + + #[cfg(not(unix))] + ctrl_c.await; +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn token_is_not_cancelled_initially() { + // Use a fresh token (not the global) to avoid test pollution + let t = CancellationToken::new(); + assert!(!t.is_cancelled()); + } + + #[test] + fn trigger_cancels_token() { + let t = CancellationToken::new(); + assert!(!t.is_cancelled()); + t.cancel(); + assert!(t.is_cancelled()); + } + + #[test] + fn token_is_cloneable_and_shared() { + let t = CancellationToken::new(); + let c1 = t.clone(); + let c2 = t.clone(); + + assert!(!c1.is_cancelled()); + assert!(!c2.is_cancelled()); + + t.cancel(); + + assert!(c1.is_cancelled()); + assert!(c2.is_cancelled()); + } + + #[test] + fn multiple_triggers_are_idempotent() { + let t = CancellationToken::new(); + t.cancel(); + t.cancel(); // second cancel should not panic + assert!(t.is_cancelled()); + } + + #[tokio::test] + async fn cancelled_future_resolves_after_cancel() { + let t = CancellationToken::new(); + let c = t.clone(); + + tokio::spawn(async move { + tokio::time::sleep(std::time::Duration::from_millis(10)).await; + c.cancel(); + }); + + // This should resolve once the token is cancelled + t.cancelled().await; + assert!(t.is_cancelled()); + } + + #[tokio::test] + async fn child_token_cancelled_by_parent() { + let parent = CancellationToken::new(); + let child = parent.child_token(); + + assert!(!child.is_cancelled()); + parent.cancel(); + assert!(child.is_cancelled()); + } +} diff --git a/src/tiered_sink/circuit.rs b/src/tiered_sink/circuit.rs index fb899b6..dea66e7 100644 --- a/src/tiered_sink/circuit.rs +++ b/src/tiered_sink/circuit.rs @@ -8,7 +8,8 @@ //! Circuit breaker for sink health tracking. 
-use std::sync::atomic::{AtomicU32, AtomicU64, Ordering}; +use std::sync::Arc; +use std::sync::atomic::{AtomicU8, AtomicU32, AtomicU64, Ordering}; use std::time::Duration; use tokio::sync::RwLock; @@ -34,6 +35,9 @@ pub struct CircuitBreaker { failure_threshold: u32, reset_timeout: Duration, last_failure_time: AtomicU64, // epoch millis + /// Atomic mirror of circuit state for sync health check access. + /// 0 = Closed, 1 = Open, 2 = HalfOpen. + health_state: Arc<AtomicU8>, } impl CircuitBreaker { @@ -43,15 +47,40 @@ impl CircuitBreaker { /// - `reset_timeout`: Time to wait before allowing a probe request #[must_use] pub fn new(failure_threshold: u32, reset_timeout: Duration) -> Self { + let health_state = Arc::new(AtomicU8::new(0)); // 0 = Closed + + #[cfg(feature = "health")] + { + let hs = Arc::clone(&health_state); + crate::health::HealthRegistry::register("circuit_breaker", move || { + match hs.load(Ordering::Relaxed) { + 0 => crate::health::HealthStatus::Healthy, // Closed + 2 => crate::health::HealthStatus::Degraded, // HalfOpen + _ => crate::health::HealthStatus::Unhealthy, // Open + } + }); + } + Self { state: RwLock::new(CircuitState::Closed), consecutive_failures: AtomicU32::new(0), failure_threshold, reset_timeout, last_failure_time: AtomicU64::new(0), + health_state, } } + /// Sync the atomic health state mirror with the current circuit state. + fn sync_health_state(&self, state: CircuitState) { + let val = match state { + CircuitState::Closed => 0, + CircuitState::Open => 1, + CircuitState::HalfOpen => 2, + }; + self.health_state.store(val, Ordering::Relaxed); + } + /// Get current circuit state. 
pub async fn state(&self) -> CircuitState { let mut state = self.state.write().await; @@ -64,6 +93,7 @@ impl CircuitBreaker { if elapsed >= self.reset_timeout { *state = CircuitState::HalfOpen; + self.sync_health_state(*state); } } @@ -85,6 +115,7 @@ impl CircuitBreaker { let mut state = self.state.write().await; self.consecutive_failures.store(0, Ordering::SeqCst); *state = CircuitState::Closed; + self.sync_health_state(*state); } /// Record a failed request. @@ -96,6 +127,7 @@ impl CircuitBreaker { if failures >= self.failure_threshold { let mut state = self.state.write().await; *state = CircuitState::Open; + self.sync_health_state(*state); } } @@ -110,6 +142,7 @@ impl CircuitBreaker { self.consecutive_failures.store(0, Ordering::SeqCst); let mut state = self.state.write().await; *state = CircuitState::Closed; + self.sync_health_state(*state); } } diff --git a/src/tiered_sink/drainer.rs b/src/tiered_sink/drainer.rs index 52bb166..0938484 100644 --- a/src/tiered_sink/drainer.rs +++ b/src/tiered_sink/drainer.rs @@ -150,12 +150,25 @@ pub async fn drain_loop( ) { let mut drainer = Drainer::new(strategy); + #[cfg(feature = "shutdown")] + let global_shutdown = crate::shutdown::token(); + loop { - // Check for shutdown + // Check for shutdown (local notify or global shutdown token) tokio::select! 
{ () = shutdown.notified() => { #[cfg(feature = "logger")] - tracing::info!("Drain task shutting down"); + tracing::info!("Drain task shutting down (local notify)"); + return; + } + () = async { + #[cfg(feature = "shutdown")] + global_shutdown.cancelled().await; + #[cfg(not(feature = "shutdown"))] + std::future::pending::<()>().await; + } => { + #[cfg(feature = "logger")] + tracing::info!("Drain task shutting down (global shutdown)"); return; } () = tokio::time::sleep(interval) => {} diff --git a/src/transport/file.rs b/src/transport/file.rs index 6f02ea4..6df20af 100644 --- a/src/transport/file.rs +++ b/src/transport/file.rs @@ -80,6 +80,22 @@ impl Default for FileTransportConfig { } } +impl FileTransportConfig { + /// Load from the config cascade under the `transport.file` key. + #[must_use] + pub fn from_cascade() -> Self { + #[cfg(feature = "config")] + { + if let Some(cfg) = crate::config::try_get() + && let Ok(tc) = cfg.unmarshal_key_registered::<Self>("transport.file") + { + return tc; + } + } + Self::default() + } +} + /// Internal state for the write side. 
struct WriteState { file: tokio::fs::File, @@ -116,6 +132,9 @@ impl FileTransport { return Err(TransportError::Config("file path is empty".into())); } + #[cfg(feature = "logger")] + tracing::info!(path = %config.path, append = config.append, "File transport opened"); + Ok(Self { config: config.clone(), writer: Mutex::new(None), @@ -252,15 +271,24 @@ impl TransportSender for FileTransport { // Write payload + newline as a single operation if let Err(e) = state.file.write_all(payload).await { + #[cfg(feature = "logger")] + tracing::warn!(error = %e, "File transport: write error"); return SendResult::Fatal(TransportError::Send(format!("write failed: {e}"))); } if let Err(e) = state.file.write_all(b"\n").await { + #[cfg(feature = "logger")] + tracing::warn!(error = %e, "File transport: newline write error"); return SendResult::Fatal(TransportError::Send(format!("write newline failed: {e}"))); } if let Err(e) = state.file.flush().await { + #[cfg(feature = "logger")] + tracing::warn!(error = %e, "File transport: flush error"); return SendResult::Fatal(TransportError::Send(format!("flush failed: {e}"))); } + #[cfg(feature = "logger")] + tracing::debug!(bytes = payload.len(), "File transport: message sent"); + #[cfg(feature = "metrics")] metrics::counter!("dfe_transport_sent_total", "transport" => "file").increment(1); @@ -322,6 +350,11 @@ impl TransportReceiver for FileTransport { }); } + #[cfg(feature = "logger")] + if !messages.is_empty() { + tracing::debug!(lines = messages.len(), "File transport: batch received"); + } + #[cfg(feature = "metrics")] if !messages.is_empty() { metrics::counter!("dfe_transport_sent_total", "transport" => "file") @@ -335,6 +368,12 @@ impl TransportReceiver for FileTransport { if let Some(max_token) = tokens.iter().max_by_key(|t| t.offset) { let path = Path::new(&self.config.path); Self::save_position(path, max_token.offset).await?; + + #[cfg(feature = "logger")] + tracing::debug!( + offset = max_token.offset, + "File transport: position 
committed" + ); } Ok(()) } diff --git a/src/transport/grpc/mod.rs b/src/transport/grpc/mod.rs index 284d9b4..e23417d 100644 --- a/src/transport/grpc/mod.rs +++ b/src/transport/grpc/mod.rs @@ -67,6 +67,9 @@ pub struct GrpcTransport { /// Whether the transport is closed. closed: AtomicBool, + /// Shared healthy flag — read by health registry closure, written by close(). + healthy: Arc<AtomicBool>, + /// Receive timeout (milliseconds). recv_timeout_ms: u64, @@ -170,12 +173,27 @@ impl GrpcTransport { server_handle = Some(handle); } + let healthy = Arc::new(AtomicBool::new(true)); + + #[cfg(feature = "health")] + { + let h = Arc::clone(&healthy); + crate::health::HealthRegistry::register("transport:grpc", move || { + if h.load(Ordering::Relaxed) { + crate::health::HealthStatus::Healthy + } else { + crate::health::HealthStatus::Unhealthy + } + }); + } + Ok(Self { client, receiver, shutdown_tx, _server_handle: server_handle, closed: AtomicBool::new(false), + healthy, recv_timeout_ms: config.recv_timeout_ms, #[cfg(feature = "metrics")] inflight: AtomicU64::new(0), @@ -186,6 +204,7 @@ impl GrpcTransport { impl TransportBase for GrpcTransport { async fn close(&self) -> TransportResult<()> { self.closed.store(true, Ordering::Relaxed); + self.healthy.store(false, Ordering::Relaxed); // Signal server shutdown // Note: we can't take from Option behind &self, so we use a flag @@ -194,7 +213,7 @@ impl TransportBase for GrpcTransport { } fn is_healthy(&self) -> bool { - let healthy = !self.closed.load(Ordering::Relaxed); + let healthy = self.healthy.load(Ordering::Relaxed); #[cfg(feature = "metrics")] metrics::gauge!("dfe_transport_healthy", "transport" => "grpc").set(if healthy { 1.0 diff --git a/src/transport/http.rs b/src/transport/http.rs index 39a0b77..5f25ae9 100644 --- a/src/transport/http.rs +++ b/src/transport/http.rs @@ -141,6 +141,20 @@ impl Default for HttpTransportConfig { } impl HttpTransportConfig { + /// Load from the config cascade under the `transport.http` key. 
+ #[must_use] + pub fn from_cascade() -> Self { + #[cfg(feature = "config")] + { + if let Some(cfg) = crate::config::try_get() + && let Ok(tc) = cfg.unmarshal_key_registered::<Self>("transport.http") + { + return tc; + } + } + Self::default() + } + /// Create a send-only config pointing at the given endpoint URL. #[must_use] pub fn sender(endpoint: &str) -> Self { @@ -241,6 +255,13 @@ impl HttpTransport { (None, None, None) }; + #[cfg(feature = "logger")] + tracing::info!( + endpoint = ?config.endpoint, + listen = ?config.listen, + "HTTP transport opened" + ); + Ok(Self { client, endpoint: config.endpoint.clone(), @@ -372,6 +393,9 @@ impl TransportSender for HttpTransport { .await { Ok(resp) if resp.status().is_success() => { + #[cfg(feature = "logger")] + tracing::debug!(url = %url, bytes = payload.len(), "HTTP transport: POST sent"); + #[cfg(feature = "metrics")] metrics::counter!("dfe_transport_sent_total", "transport" => "http").increment(1); SendResult::Ok @@ -380,12 +404,18 @@ impl TransportSender for HttpTransport { if resp.status() == reqwest::StatusCode::TOO_MANY_REQUESTS || resp.status() == reqwest::StatusCode::SERVICE_UNAVAILABLE => { + #[cfg(feature = "logger")] + tracing::warn!(status = %resp.status(), url = %url, "HTTP transport: backpressure"); + #[cfg(feature = "metrics")] metrics::counter!("dfe_transport_backpressured_total", "transport" => "http") .increment(1); SendResult::Backpressured } Ok(resp) => { + #[cfg(feature = "logger")] + tracing::warn!(status = %resp.status(), url = %url, "HTTP transport: send error"); + #[cfg(feature = "metrics")] metrics::counter!("dfe_transport_send_errors_total", "transport" => "http") .increment(1); @@ -396,6 +426,9 @@ impl TransportSender for HttpTransport { ))) } Err(e) => { + #[cfg(feature = "logger")] + tracing::warn!(error = %e, url = %url, "HTTP transport: request failed"); + #[cfg(feature = "metrics")] metrics::counter!("dfe_transport_send_errors_total", "transport" => "http") .increment(1); @@ -464,6 +497,11 
@@ 
impl TransportReceiver for HttpTransport { } } + #[cfg(feature = "logger")] + if !messages.is_empty() { + tracing::debug!(messages = messages.len(), "HTTP transport: batch received"); + } + Ok(messages) } diff --git a/src/transport/kafka/mod.rs b/src/transport/kafka/mod.rs index c320572..d558d2b 100644 --- a/src/transport/kafka/mod.rs +++ b/src/transport/kafka/mod.rs @@ -115,6 +115,8 @@ pub struct KafkaTransport { /// Key optimization: no locks in the hot path. topic_cache: HashMap>, closed: AtomicBool, + /// Shared healthy flag — read by health registry closure, written by close(). + healthy: Arc<AtomicBool>, /// Topics we're subscribed to (for cache warming). subscribed_topics: Vec, } @@ -232,11 +234,26 @@ impl KafkaTransport { .create_with_context(StatsContext::new()) .map_err(|e| TransportError::Connection(format!("Failed to create producer: {e}")))?; + let healthy = Arc::new(AtomicBool::new(true)); + + #[cfg(feature = "health")] + { + let h = Arc::clone(&healthy); + crate::health::HealthRegistry::register("transport:kafka", move || { + if h.load(Ordering::Relaxed) { + crate::health::HealthStatus::Healthy + } else { + crate::health::HealthStatus::Unhealthy + } + }); + } + Ok(Self { consumer, producer, topic_cache, closed: AtomicBool::new(false), + healthy, subscribed_topics, }) } @@ -254,12 +271,13 @@ impl KafkaTransport { impl TransportBase for KafkaTransport { async fn close(&self) -> TransportResult<()> { self.closed.store(true, Ordering::Relaxed); + self.healthy.store(false, Ordering::Relaxed); // rdkafka handles cleanup on drop Ok(()) } fn is_healthy(&self) -> bool { - !self.closed.load(Ordering::Relaxed) + self.healthy.load(Ordering::Relaxed) } fn name(&self) -> &'static str { @@ -473,6 +491,7 @@ impl std::fmt::Debug for KafkaTransport { f.debug_struct("KafkaTransport") .field("subscribed_topics", &self.subscribed_topics) .field("closed", &self.closed.load(Ordering::Relaxed)) + .field("healthy", &self.healthy.load(Ordering::Relaxed)) 
.finish_non_exhaustive() } } diff 
--git a/src/transport/pipe.rs b/src/transport/pipe.rs index 4457d2e..94a8a91 100644 --- a/src/transport/pipe.rs +++ b/src/transport/pipe.rs @@ -71,6 +71,22 @@ impl Default for PipeTransportConfig { } } +impl PipeTransportConfig { + /// Load from the config cascade under the `transport.pipe` key. + #[must_use] + pub fn from_cascade() -> Self { + #[cfg(feature = "config")] + { + if let Some(cfg) = crate::config::try_get() + && let Ok(tc) = cfg.unmarshal_key_registered::<Self>("transport.pipe") + { + return tc; + } + } + Self::default() + } +} + /// Unix pipe transport (stdin/stdout). /// /// Send writes newline-delimited payloads to stdout. @@ -88,6 +104,12 @@ impl PipeTransport { /// Create a new pipe transport. #[must_use] pub fn new(config: &PipeTransportConfig) -> Self { + #[cfg(feature = "logger")] + tracing::info!( + recv_timeout_ms = config.recv_timeout_ms, + "Pipe transport opened" + ); + Self { stdin: tokio::sync::Mutex::new(BufReader::new(tokio::io::stdin())), stdout: tokio::sync::Mutex::new(tokio::io::stdout()), @@ -142,6 +164,12 @@ impl TransportSender for PipeTransport { return SendResult::Fatal(TransportError::Send(format!("stdout flush failed: {e}"))); } + #[cfg(feature = "logger")] + tracing::debug!( + bytes = payload.len(), + "Pipe transport: message sent to stdout" + ); + #[cfg(feature = "metrics")] metrics::counter!("dfe_transport_sent_total", "transport" => "pipe").increment(1); @@ -229,6 +257,14 @@ impl TransportReceiver for PipeTransport { } } + #[cfg(feature = "logger")] + if !messages.is_empty() { + tracing::debug!( + lines = messages.len(), + "Pipe transport: batch received from stdin" + ); + } + Ok(messages) } diff --git a/src/transport/redis_transport.rs b/src/transport/redis_transport.rs index ba29c8b..1191d9e 100644 --- a/src/transport/redis_transport.rs +++ b/src/transport/redis_transport.rs @@ -132,6 +132,22 @@ impl Default for RedisTransportConfig { } } +impl RedisTransportConfig { + /// Load from the config cascade under the 
`transport.redis` key. + #[must_use] + pub fn from_cascade() -> Self { + #[cfg(feature = "config")] + { + if let Some(cfg) = crate::config::try_get() + && let Ok(tc) = cfg.unmarshal_key_registered::<Self>("transport.redis") + { + return tc; + } + } + Self::default() + } +} + /// Redis/Valkey Streams transport. /// /// Supports both send (`XADD`) and receive (`XREADGROUP`) operations. @@ -168,6 +184,14 @@ impl RedisTransport { )) })?; + #[cfg(feature = "logger")] + tracing::info!( + url = %config.url, + stream = ?config.stream, + group = %config.group, + "Redis transport opened" + ); + Ok(Self { conn: Mutex::new(conn), config: config.clone(), @@ -267,14 +291,22 @@ impl TransportSender for RedisTransport { match result { Ok(_entry_id) => { + #[cfg(feature = "logger")] + tracing::debug!(stream = %stream, "Redis transport: XADD sent"); + #[cfg(feature = "metrics")] metrics::counter!("dfe_transport_sent_total", "transport" => "redis").increment(1); SendResult::Ok } - Err(e) => SendResult::Fatal(TransportError::Send(format!( - "XADD to stream '{stream}' failed: {e}" - ))), + Err(e) => { + #[cfg(feature = "logger")] + tracing::warn!(error = %e, stream = %stream, "Redis transport: XADD error"); + + SendResult::Fatal(TransportError::Send(format!( + "XADD to stream '{stream}' failed: {e}" + ))) + } } } } @@ -308,6 +340,9 @@ impl TransportReceiver for RedisTransport { .xread_options(&[&stream_name], &[">"], &opts) .await .map_err(|e| { + #[cfg(feature = "logger")] + tracing::warn!(error = %e, stream = %stream_name, "Redis transport: XREADGROUP error"); + TransportError::Recv(format!("XREADGROUP on stream '{stream_name}' failed: {e}")) })?; @@ -339,6 +374,14 @@ impl TransportReceiver for RedisTransport { } } + #[cfg(feature = "logger")] + if !messages.is_empty() { + tracing::debug!( + messages = messages.len(), + "Redis transport: XREADGROUP received" + ); + } + #[cfg(feature = "metrics")] if !messages.is_empty() { metrics::counter!("dfe_transport_sent_total", "transport" => 
"redis") @@ -371,10 +414,16 @@ impl TransportReceiver for RedisTransport { .xack(*stream, &self.config.group, &id_refs) .await .map_err(|e| { + #[cfg(feature = "logger")] + tracing::warn!(error = %e, stream = %stream, "Redis transport: XACK error"); + TransportError::Commit(format!("XACK on stream '{stream}' failed: {e}")) })?; } + #[cfg(feature = "logger")] + tracing::debug!(count = tokens.len(), "Redis transport: XACK committed"); + Ok(()) } } From 0e18ca9c085ace491197d2cf6241f031388c0580 Mon Sep 17 00:00:00 2001 From: semantic-release-bot Date: Thu, 26 Mar 2026 05:57:29 +0000 Subject: [PATCH 5/7] chore: version 1.17.0-dev.16 [skip ci] # [1.17.0-dev.16](https://github.com/hyperi-io/hyperi-rustlib/compare/v1.17.0-dev.15...v1.17.0-dev.16) (2026-03-26) ### Bug Fixes * add health registry, shutdown manager, and wire all modules ([21efaa2](https://github.com/hyperi-io/hyperi-rustlib/commit/21efaa2a4d61901dfe479d655ef9af1a5c437580)) --- CHANGELOG.md | 7 +++++++ Cargo.toml | 2 +- VERSION | 2 +- 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index 5be2675..27231a8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,10 @@ +# [1.17.0-dev.16](https://github.com/hyperi-io/hyperi-rustlib/compare/v1.17.0-dev.15...v1.17.0-dev.16) (2026-03-26) + + +### Bug Fixes + +* add health registry, shutdown manager, and wire all modules ([21efaa2](https://github.com/hyperi-io/hyperi-rustlib/commit/21efaa2a4d61901dfe479d655ef9af1a5c437580)) + # [1.17.0-dev.15](https://github.com/hyperi-io/hyperi-rustlib/compare/v1.17.0-dev.14...v1.17.0-dev.15) (2026-03-26) diff --git a/Cargo.toml b/Cargo.toml index d660237..82efc37 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -8,7 +8,7 @@ [package] name = "hyperi-rustlib" -version = "1.17.0-dev.15" +version = "1.17.0-dev.16" edition = "2024" rust-version = "1.94" description = "Shared utility library for HyperI Rust applications" diff --git a/VERSION b/VERSION index 6e4cd31..e11aeb1 100644 --- a/VERSION +++ 
b/VERSION @@ -1 +1 @@ -1.17.0-dev.15 +1.17.0-dev.16 From 9376329923c236c52f8cbbbae2aa9de5b04c3e9c Mon Sep 17 00:00:00 2001 From: Derek Date: Thu, 26 Mar 2026 17:19:36 +1100 Subject: [PATCH 6/7] fix: OTel trace propagation, /readyz health wiring, description update MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit OTel trace context (W3C traceparent) auto-propagated through gRPC (PushRequest metadata), Kafka (message headers), and HTTP (request headers). All gated behind #[cfg(feature = "otel")] — zero overhead when disabled. Shared propagation helpers in transport/propagation.rs. /readyz endpoint now aggregates from HealthRegistry when health feature is enabled. /health/detailed returns per-component JSON status. MetricsManager readiness also checks HealthRegistry. Fix test ordering issue with HealthRegistry global state. Update crate description. 671 tests pass. --- Cargo.toml | 2 +- src/http_server/server.rs | 31 ++++++- src/metrics/mod.rs | 22 ++++- src/transport/grpc/mod.rs | 14 +++ src/transport/http.rs | 33 +++++-- src/transport/kafka/mod.rs | 32 +++++++ src/transport/mod.rs | 1 + src/transport/propagation.rs | 165 +++++++++++++++++++++++++++++++++++ 8 files changed, 290 insertions(+), 10 deletions(-) create mode 100644 src/transport/propagation.rs diff --git a/Cargo.toml b/Cargo.toml index 82efc37..296d9c2 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -11,7 +11,7 @@ name = "hyperi-rustlib" version = "1.17.0-dev.16" edition = "2024" rust-version = "1.94" -description = "Shared utility library for HyperI Rust applications" +description = "Opinionated Rust framework for high-throughput data pipelines at PB scale. Auto-wiring config, logging, metrics, tracing, health, and graceful shutdown — built from many years of production infrastructure experience." 
license = "FSL-1.1-ALv2" repository = "https://github.com/hyperi-io/hyperi-rustlib" publish = true diff --git a/src/http_server/server.rs b/src/http_server/server.rs index fef7fed..1f9f924 100644 --- a/src/http_server/server.rs +++ b/src/http_server/server.rs @@ -196,6 +196,11 @@ impl HttpServer { ); } + #[cfg(all(feature = "health", feature = "serde_json"))] + if self.config.enable_health_endpoints { + router = router.route("/health/detailed", get(health_detailed)); + } + #[cfg(feature = "config")] if self.config.enable_config_endpoint { router = router.route("/config", get(config_dump)); @@ -240,14 +245,35 @@ async fn health_live() -> impl IntoResponse { } /// Readiness endpoint handler. +/// +/// Checks the local ready flag AND (when the `health` feature is enabled) +/// the global [`HealthRegistry`](crate::health::HealthRegistry). Both must +/// be true for a 200 response; otherwise 503. async fn health_ready(ready: Arc<AtomicBool>) -> impl IntoResponse { - if ready.load(Ordering::SeqCst) { + let locally_ready = ready.load(Ordering::SeqCst); + + #[cfg(feature = "health")] + let registry_ready = crate::health::HealthRegistry::is_ready(); + #[cfg(not(feature = "health"))] + let registry_ready = true; + + if locally_ready && registry_ready { (StatusCode::OK, "OK") } else { (StatusCode::SERVICE_UNAVAILABLE, "NOT READY") } } +/// Detailed health endpoint returning per-component status as JSON. +/// +/// Returns the output of [`HealthRegistry::to_json()`](crate::health::HealthRegistry::to_json), +/// which includes overall status and each registered component's state. +#[cfg(all(feature = "health", feature = "serde_json"))] +async fn health_detailed() -> impl IntoResponse { + let json = crate::health::HealthRegistry::to_json(); + axum::Json(json) +} + /// Config registry dump endpoint handler (redacted). 
#[cfg(feature = "config")] async fn config_dump() -> impl IntoResponse { @@ -332,6 +358,9 @@ mod tests { #[tokio::test] async fn test_health_ready_when_ready() { + #[cfg(feature = "health")] + crate::health::HealthRegistry::reset(); + let config = HttpServerConfig::default(); let server = HttpServer::new(config); server.set_ready(true); diff --git a/src/metrics/mod.rs b/src/metrics/mod.rs index b449810..bb76bd5 100644 --- a/src/metrics/mod.rs +++ b/src/metrics/mod.rs @@ -759,7 +759,14 @@ async fn handle_connection( } else if request_line.starts_with("GET /readyz") || request_line.starts_with("GET /health/ready") { - let ready = readiness_fn.as_ref().is_none_or(|f| f()); + let callback_ready = readiness_fn.as_ref().is_none_or(|f| f()); + + #[cfg(feature = "health")] + let registry_ready = crate::health::HealthRegistry::is_ready(); + #[cfg(not(feature = "health"))] + let registry_ready = true; + + let ready = callback_ready && registry_ready; if ready { ("200 OK", r#"{"status":"ready"}"#.to_string()) } else { @@ -787,11 +794,22 @@ async fn handle_connection( } /// Readiness response helper for axum endpoints. +/// +/// Checks the caller-supplied readiness callback AND (when the `health` +/// feature is enabled) the global [`HealthRegistry`](crate::health::HealthRegistry). +/// Both must be true for a 200 response. 
#[cfg(all(feature = "metrics", feature = "http-server"))] fn readiness_response(rf: Option) -> axum::response::Response { use axum::response::IntoResponse; - let ready = rf.as_ref().is_none_or(|f| f()); + let callback_ready = rf.as_ref().is_none_or(|f| f()); + + #[cfg(feature = "health")] + let registry_ready = crate::health::HealthRegistry::is_ready(); + #[cfg(not(feature = "health"))] + let registry_ready = true; + + let ready = callback_ready && registry_ready; if ready { ( [(axum::http::header::CONTENT_TYPE, "application/json")], diff --git a/src/transport/grpc/mod.rs b/src/transport/grpc/mod.rs index e23417d..6c91f43 100644 --- a/src/transport/grpc/mod.rs +++ b/src/transport/grpc/mod.rs @@ -245,6 +245,12 @@ impl TransportSender for GrpcTransport { metadata.insert("topic".to_string(), key.to_string()); } + // Inject W3C traceparent into gRPC metadata for distributed tracing + #[cfg(feature = "otel")] + if let Some(tp) = super::propagation::current_traceparent() { + metadata.insert(super::propagation::TRACEPARENT_HEADER.to_string(), tp); + } + let request = proto::PushRequest { payload: payload.to_vec(), format: proto::Format::Auto.into(), @@ -392,6 +398,14 @@ impl proto::dfe_transport_server::DfeTransport for DfeTransportServiceImpl { let req = request.into_inner(); let seq = self.sequence.fetch_add(1, Ordering::Relaxed); + // Extract W3C traceparent from incoming gRPC metadata for distributed tracing + #[cfg(feature = "otel")] + if let Some(tp) = req.metadata.get(super::propagation::TRACEPARENT_HEADER) + && super::propagation::is_valid_traceparent(tp) + { + tracing::Span::current().record("traceparent", tp.as_str()); + } + let format = PayloadFormat::detect(&req.payload); let key = req.metadata.get("topic").map(|s| Arc::from(s.as_str())); diff --git a/src/transport/http.rs b/src/transport/http.rs index 5f25ae9..173ff19 100644 --- a/src/transport/http.rs +++ b/src/transport/http.rs @@ -307,12 +307,27 @@ struct ReceiverState { async fn ingest_handler( 
axum::extract::State(state): axum::extract::State, axum::extract::ConnectInfo(addr): axum::extract::ConnectInfo, + headers: axum::http::HeaderMap, body: axum::body::Bytes, ) -> axum::http::StatusCode { if body.is_empty() { return axum::http::StatusCode::BAD_REQUEST; } + // Extract W3C traceparent from incoming HTTP headers for distributed tracing + #[cfg(feature = "otel")] + if let Some(tp) = headers + .get(super::propagation::TRACEPARENT_HEADER) + .and_then(|v| v.to_str().ok()) + && super::propagation::is_valid_traceparent(tp) + { + tracing::Span::current().record("traceparent", tp); + } + + // Suppress unused variable warning when otel feature is disabled + #[cfg(not(feature = "otel"))] + let _ = &headers; + let seq = state.sequence.fetch_add(1, Ordering::Relaxed); let format = PayloadFormat::detect(&body); let timestamp_ms = chrono::Utc::now().timestamp_millis(); @@ -384,14 +399,20 @@ impl TransportSender for HttpTransport { #[cfg(feature = "metrics")] let start = std::time::Instant::now(); - let result = match self + // Build request with optional W3C traceparent header for distributed tracing + let request_builder = self .client .post(&url) - .header("content-type", "application/octet-stream") - .body(payload.to_vec()) - .send() - .await - { + .header("content-type", "application/octet-stream"); + + #[cfg(feature = "otel")] + let request_builder = if let Some(tp) = super::propagation::current_traceparent() { + request_builder.header(super::propagation::TRACEPARENT_HEADER, tp) + } else { + request_builder + }; + + let result = match request_builder.body(payload.to_vec()).send().await { Ok(resp) if resp.status().is_success() => { #[cfg(feature = "logger")] tracing::debug!(url = %url, bytes = payload.len(), "HTTP transport: POST sent"); diff --git a/src/transport/kafka/mod.rs b/src/transport/kafka/mod.rs index d558d2b..54b5002 100644 --- a/src/transport/kafka/mod.rs +++ b/src/transport/kafka/mod.rs @@ -293,6 +293,18 @@ impl TransportSender for KafkaTransport { 
let record: FutureRecord<'_, str, [u8]> = FutureRecord::to(key).payload(payload); + // Inject W3C traceparent into Kafka message headers for distributed tracing + #[cfg(feature = "otel")] + let record = if let Some(tp) = super::propagation::current_traceparent() { + let headers = rdkafka::message::OwnedHeaders::new().insert(rdkafka::message::Header { + key: super::propagation::TRACEPARENT_HEADER, + value: Some(tp.as_str()), + }); + record.headers(headers) + } else { + record + }; + #[cfg(feature = "metrics")] let start = std::time::Instant::now(); @@ -371,6 +383,26 @@ impl TransportReceiver for KafkaTransport { if let Some(result) = self.consumer.poll(timeout) { match result { Ok(msg) => { + // Extract W3C traceparent from Kafka headers (first message only, + // to associate the batch span with the upstream trace) + #[cfg(feature = "otel")] + if let Some(headers) = msg.headers() { + use rdkafka::message::Headers; + for idx in 0..headers.count() { + if let Some(Ok(header)) = headers.try_get_as::<[u8]>(idx) + && header.key == super::propagation::TRACEPARENT_HEADER + { + if let Some(value) = header.value + && let Ok(tp) = std::str::from_utf8(value) + && super::propagation::is_valid_traceparent(tp) + { + tracing::Span::current().record("traceparent", tp); + } + break; + } + } + } + let topic_str = msg.topic(); let topic: Arc<str> = get_or_insert_topic(&mut local_cache, topic_str); let payload = msg.payload().map_or_else(Vec::new, |p| p.to_vec()); diff --git a/src/transport/mod.rs b/src/transport/mod.rs index 7bcfa47..9d3dcb0 100644 --- a/src/transport/mod.rs +++ b/src/transport/mod.rs @@ -53,6 +53,7 @@ mod detect; mod error; pub mod factory; mod payload; +pub mod propagation; mod traits; mod types; diff --git a/src/transport/propagation.rs b/src/transport/propagation.rs new file mode 100644 index 0000000..34b9599 --- /dev/null +++ b/src/transport/propagation.rs @@ -0,0 +1,165 @@ +// Project: hyperi-rustlib +// File: src/transport/propagation.rs +// Purpose: W3C Trace
Context propagation helpers for transport layer +// Language: Rust +// +// License: FSL-1.1-ALv2 +// Copyright: (c) 2026 HYPERI PTY LIMITED + +//! # Trace Context Propagation +//! +//! W3C Trace Context (traceparent) helpers for automatic context propagation +//! across transport boundaries. When the `otel` feature is enabled, transports +//! inject/extract `traceparent` headers transparently. +//! +//! Format: `00-{trace_id}-{span_id}-{flags}` +//! Example: `00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01` + +/// W3C traceparent header name. +pub const TRACEPARENT_HEADER: &str = "traceparent"; + +/// Format a W3C traceparent header value from the current OTel span context. +/// +/// Returns `Some("00-{trace_id}-{span_id}-{flags}")` if there is a valid +/// span context active, `None` otherwise. +#[cfg(feature = "otel")] +#[must_use] +pub fn current_traceparent() -> Option<String> { + use opentelemetry::trace::TraceContextExt; + + let cx = opentelemetry::Context::current(); + let span = cx.span(); + let sc = span.span_context(); + + if sc.is_valid() { + Some(format_traceparent(sc)) + } else { + None + } +} + +/// Format a `SpanContext` into a W3C traceparent string. +/// +/// `TraceId` and `SpanId` implement `Display` as lowercase hex. +/// `TraceFlags::to_u8()` returns the raw flags byte. +#[cfg(feature = "otel")] +fn format_traceparent(sc: &opentelemetry::trace::SpanContext) -> String { + format!( + "00-{}-{}-{:02x}", + sc.trace_id(), + sc.span_id(), + sc.trace_flags().to_u8() + ) +} + +/// Format a traceparent string from raw components (for testing without OTel). +#[must_use] +pub fn format_traceparent_raw(trace_id: u128, span_id: u64, flags: u8) -> String { + format!("00-{trace_id:032x}-{span_id:016x}-{flags:02x}") +} + +/// Validate that a string looks like a well-formed traceparent header. +/// +/// Does basic structural validation (length, separators, hex chars). +/// Does NOT validate that trace_id/span_id are non-zero.
+#[must_use] +pub fn is_valid_traceparent(value: &str) -> bool { + // Expected: "00-<32hex>-<16hex>-<2hex>" = 55 chars + if value.len() != 55 { + return false; + } + + let bytes = value.as_bytes(); + + // Version: "00" + if bytes[0] != b'0' || bytes[1] != b'0' { + return false; + } + + // Separators at positions 2, 35, 52 + if bytes[2] != b'-' || bytes[35] != b'-' || bytes[52] != b'-' { + return false; + } + + // All other positions must be hex digits + let hex_ranges = [3..35, 36..52, 53..55]; + for range in &hex_ranges { + for &b in &bytes[range.clone()] { + if !b.is_ascii_hexdigit() { + return false; + } + } + } + + true +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn traceparent_format_raw() { + let tp = format_traceparent_raw( + 0x4bf9_2f35_77b3_4da6_a3ce_929d_0e0e_4736, + 0x00f0_67aa_0ba9_02b7, + 0x01, + ); + assert_eq!( + tp, + "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" + ); + assert_eq!(tp.len(), 55); + } + + #[test] + fn traceparent_format_zero_padded() { + // Low values should be zero-padded to full width + let tp = format_traceparent_raw(123, 456, 1); + assert!(tp.starts_with("00-")); + assert_eq!(tp.len(), 55); + assert_eq!( + tp, + "00-0000000000000000000000000000007b-00000000000001c8-01" + ); + } + + #[test] + fn traceparent_format_flags_zero() { + let tp = format_traceparent_raw(1, 1, 0); + assert!(tp.ends_with("-00")); + } + + #[test] + fn valid_traceparent() { + assert!(is_valid_traceparent( + "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" + )); + } + + #[test] + fn invalid_traceparent_too_short() { + assert!(!is_valid_traceparent("00-abc-def-01")); + } + + #[test] + fn invalid_traceparent_bad_version() { + assert!(!is_valid_traceparent( + "ff-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01" + )); + } + + #[test] + fn invalid_traceparent_non_hex() { + assert!(!is_valid_traceparent( + "00-4bf92f3577b34da6a3ce929d0e0eXXXX-00f067aa0ba902b7-01" + )); + } + + #[test] + fn 
invalid_traceparent_wrong_separators() { + assert!(!is_valid_traceparent( + "00_4bf92f3577b34da6a3ce929d0e0e4736_00f067aa0ba902b7_01" + )); + } +} From d6cdd2c026e6136d330e2abd5e9d0a9090525191 Mon Sep 17 00:00:00 2001 From: Derek Date: Thu, 26 Mar 2026 17:22:40 +1100 Subject: [PATCH 7/7] chore: save session state [skip ci] --- TODO.md | 109 ++++++++++++++++++++++++++++---------------------------- 1 file changed, 54 insertions(+), 55 deletions(-) diff --git a/TODO.md b/TODO.md index df51114..384a450 100644 --- a/TODO.md +++ b/TODO.md @@ -1,68 +1,65 @@ # TODO - hyperi-rustlib -**Project Goal:** Rust shared library equivalent to hyperi-pylib (Python) and hyperi-golib (Go) +**Project Goal:** Opinionated Rust framework for high-throughput data pipelines at PB scale -**Target:** Production-ready library for HyperI Rust applications +**Target:** Production-ready library with auto-wiring config, logging, metrics, tracing, health, and graceful shutdown --- ## Current Tasks -### Core Pillars Implementation `[NEXT]` - -Full plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` - -**Phase 1: OTel Tracing Auto-Propagation** -- [ ] Auto-initialise OTel layer in logger when `otel` feature + `OTEL_EXPORTER_OTLP_ENDPOINT` set -- [ ] gRPC trace context propagation (tonic interceptors, `traceparent` header) -- [ ] Kafka trace context propagation (message headers) -- [ ] HTTP client trace context injection -- [ ] HTTP server trace context extraction - -**Phase 2: Unified HealthState** -- [ ] `src/health/` module with global `HealthRegistry` singleton -- [ ] `HealthComponent` trait — modules register at construction -- [ ] Wire transport, circuit breaker, config reloader into registry -- [ ] `/readyz` aggregates from `HealthRegistry::is_healthy()` -- [ ] `/health/detailed` JSON endpoint with per-component status - -**Phase 3: Unified Graceful Shutdown** -- [ ] `src/shutdown/` module with global `CancellationToken` -- [ ] SIGTERM/SIGINT → `token.cancel()` → all modules drain -- [ ] 
Wire http-server, tiered-sink, config-reloader, gRPC transport - -**Phase 4: New Transports** -- [ ] File transport (NDJSON, wraps existing `NdjsonWriter`) -- [ ] Pipe transport (stdin/stdout, newline-delimited) -- [ ] HTTP transport (POST to endpoint, uses `HttpClient`) -- [ ] Redis/Valkey Streams transport (`XADD`/`XREADGROUP`/`XACK`) - -**Phase 5: DLQ Transport Integration** -- [ ] DLQ Kafka backend uses `Box` instead of raw producer +### v2.0.0 Release `[NEXT]` + +All core pillar work is done. Need to: +- [ ] Release-merge to release branch (feat!: breaking change → v2.0.0) +- [ ] Verify crates.io publication +- [ ] Docs consolidation (TRANSPORT.md, CORE-PILLARS.md, per-feature docs) +- [ ] Add Redis vs Kafka comparison table to transport docs + +### DLQ Transport Integration + +- [ ] DLQ Kafka backend uses `Box` / `AnySender` instead of raw producer - [ ] DLQ can write to any transport (file, HTTP, Redis, Kafka) -**Phase 6: Always-On Defaults** -- [ ] Make config, logger, metrics, health, shutdown default features -- [ ] Downstream dfe-* app remediation (remove boilerplate) +### Identity / Auth Module (Discussion) + +- [ ] Token validation middleware (JWT/OIDC) for gRPC interceptor + axum middleware +- [ ] Service identity (service name + instance ID for mTLS, audit logs) +- [ ] Break-glass: static bearer token from secrets module +- [ ] Design decision: dfe-engine as SSoT, rustlib validates tokens only + +### Downstream Remediation + +- [ ] Migrate dfe-loader to v2.0.0 (transport factory, remove boilerplate) +- [ ] Migrate dfe-receiver to v2.0.0 (RoutedSender, transport factory) +- [ ] Migrate dfe-archiver to v2.0.0 +- [ ] Migrate dfe-fetcher to v2.0.0 +- [ ] Migrate dfe-transform-wasm to v2.0.0 +- [ ] Migrate dfe-transform-vrl to v2.0.0 - [ ] Audit hyperi-pylib and write alignment plan --- -### Completed Recent - -- [x] **Universal metrics instrumentation** (v1.19.8) — tiered-sink, spool, dlq, cache, http-client, secrets all auto-emit Prometheus metrics via 
global singleton. Core pillar design decision documented in CLAUDE.md. -- [x] **Kafka transport metrics + StatsContext** (v1.19.8) — `KafkaTransport` always uses `StatsContext` for consumer and producer. `dfe_transport_*` metrics on `send()`. `rdkafka_*` metrics auto-emitted. Zero downstream code changes. -- [x] **gRPC transport metrics** (v1.19.7) — `dfe_transport_*` metrics on send/recv. Server push handler uses `try_send` with backpressure status codes. -- [x] **HTTP client module** (v1.19.6) — reqwest + reqwest-middleware + reqwest-retry, exponential backoff, config cascade -- [x] **Database URL builders** (v1.19.6) — PostgreSQL, ClickHouse, Redis/Valkey, MongoDB. Display trait redacts passwords. -- [x] **Cache module** (v1.19.6) — moka-backed concurrent in-memory cache, per-source TTL, source isolation -- [x] **Dependency update** (v1.19.6) — all deps to latest, cargo-audit ignores for transitive advisories -- [x] **Config registry** (v1.19.3-v1.19.5) — auto-registering reflectable config, `/config` admin endpoint, `SensitiveString`, heuristic redaction, change notification -- [x] **CEL expression profile** (v1.19.2) — `matches()` blocked by default, `ProfileConfig` with per-category overrides -- [x] **Config cascade wiring** (v1.19.2) — expression, memory, version_check, scaling, grpc, secrets auto-read from cascade -- [x] **MemoryGuard underflow fix** (v1.19.1) — `fetch_sub` replaced with `fetch_update` + `saturating_sub` -- [x] **Test restructure** (v1.19.1) — `tests/integration/`, `tests/e2e/`, `tests/common/` -- [x] **hyperi-ci release-merge** — CLI command replaces per-project workflow files +### Completed This Session + +- [x] **Transport trait split** — `Transport` split into `TransportBase` + `TransportSender` + `TransportReceiver` with blanket `Transport` impl +- [x] **Transport factory** — `AnySender` enum dispatch from config, `RoutedSender` for per-key dispatch (receiver/fetcher only) +- [x] **File transport** — NDJSON with position tracking, 
commit persistence, rotation +- [x] **Pipe transport** — stdin/stdout for Unix pipeline composition +- [x] **HTTP transport** — POST send + embedded axum receive (bidirectional) +- [x] **Redis/Valkey Streams transport** — XADD/XREADGROUP/XACK with consumer groups +- [x] **HealthRegistry** — global singleton, modules auto-register health check closures, `/readyz` aggregates, `/health/detailed` JSON +- [x] **Shutdown manager** — global CancellationToken, SIGTERM/SIGINT handler, modules listen on token +- [x] **OTel trace propagation** — W3C traceparent auto-injected/extracted in gRPC, Kafka, HTTP transports +- [x] **Universal metrics** — all modules auto-emit Prometheus metrics via global recorder +- [x] **Logging + config cascade** — added to all new transports +- [x] **Health wiring** — Kafka, gRPC, CircuitBreaker, HttpServer auto-register +- [x] **Shutdown wiring** — HttpServer, TieredSink drainer, ConfigReloader listen on global token +- [x] **KafkaTransport StatsContext** — always-on, rdkafka_* metrics auto-emitted +- [x] **gRPC transport metrics** — dfe_transport_* parity with Kafka +- [x] **HTTP client, database URL builders, cache modules** (v1.19.6) +- [x] **Config registry, SensitiveString, /config endpoint** (v1.19.3-v1.19.5) +- [x] **Dependency updates, cargo-audit ignores** (v1.19.6) --- @@ -75,8 +72,8 @@ Full plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` ### Kafka — Opinionated SASL-SCRAM Named Constructors -- [ ] `KafkaConfig::external_sasl_scram(brokers, username, password)` — SASL_SSL + SCRAM-SHA-512 -- [ ] `KafkaConfig::internal_sasl_scram(brokers, username, password)` — SASL_PLAINTEXT + SCRAM-SHA-512 +- [ ] `KafkaConfig::external_sasl_scram(brokers, username, password)` +- [ ] `KafkaConfig::internal_sasl_scram(brokers, username, password)` ### Other @@ -88,6 +85,8 @@ Full plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` ## Notes - Use `CARGO_BUILD_JOBS=2` for all cargo commands -- Transport backends: Kafka, gRPC (native + 
Vector compat), Memory +- Transport backends: Kafka, gRPC, Memory, File, Pipe, HTTP, Redis/Valkey - Core pillars plan: `docs/superpowers/plans/2026-03-26-core-pillars.md` -- See docs/GAP_ANALYSIS.md for detailed comparison with hyperi-pylib +- Two deployment modes: Kafka-mediated (persistence) vs direct gRPC (low latency) +- Routed transport is receiver/fetcher only — all other stages are 1:1 +- Breaking change: `feat!:` commit triggers v2.0.0 via semantic-release