This file is the operating guide for coding agents working in siphon.
- Build and maintain a production-safe ingestion service that:
  - receives provider webhooks,
  - normalizes them to CloudEvents,
  - publishes to NATS JetStream,
  - optionally sinks to ClickHouse, and
  - exposes replay/admin operations and metrics.
- Keep behavior deterministic, observable, and safe under retries, duplicates, and transient outages.
Repository layout:
- `cmd/tap` - process entrypoint and runtime wiring (`main.go`, `run.go`); admin endpoints and runtime registries
- `config` - config structs, defaults, validation, env overrides (`TAP_` prefix)
- `internal/ingress` - HTTP webhook ingress and provider verification
- `internal/normalize` - provider payload to normalized event + CloudEvent conversion
- `internal/publish` - NATS publisher and ClickHouse sink
- `internal/poller` - poll-mode providers and scheduling/failure-budget logic
- `internal/dlq` - dead-letter queue publication/replay support
- `internal/health` - liveness/readiness handlers and Prometheus metrics
- `charts/siphon` - Helm chart values, schema, templates
- `docs/admin-openapi.yaml` - admin API contract expected to match runtime behavior
Request flow:
- Ingress receives a webhook at `/webhooks/{provider}` or `/webhooks/{provider}/{tenant}`.
- Provider signature/auth verification executes.
- Payload is normalized into internal event shape and CloudEvent.
- CloudEvent is published to NATS JetStream with dedupe headers.
- Optional ClickHouse sink consumes from JetStream and batch-inserts.
- Failures in verify/normalize/publish can be recorded to DLQ.
- Admin APIs can inspect/replay DLQ and inspect poller runtime status.
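The dedupe step in this flow relies on a stable per-event key. A minimal sketch, assuming the key is derived from the CloudEvents `source` and `id` attributes (which the CloudEvents spec requires to be unique together) and carried in JetStream's `Nats-Msg-Id` header, which JetStream uses to drop duplicates within a stream's duplicate window. The `cloudEvent` type and `dedupeHeaders` function are hypothetical; siphon's actual header set and key scheme may differ. Only the standard library is used, so no NATS client is involved.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cloudEvent holds the CloudEvents attributes that identify an event.
// Hypothetical shape for illustration; siphon's internal type may differ.
type cloudEvent struct {
	SpecVersion string
	ID          string
	Source      string
	Type        string
}

// dedupeHeaders returns headers for a JetStream publish (assumption: the
// dedupe key is sha256(source + NUL + id)). The NUL separator keeps
// ("a","bc") and ("ab","c") from colliding, and hashing keeps the ID short
// and identical across retries of the same provider delivery.
func dedupeHeaders(ev cloudEvent) map[string][]string {
	sum := sha256.Sum256([]byte(ev.Source + "\x00" + ev.ID))
	return map[string][]string{
		"Nats-Msg-Id": {hex.EncodeToString(sum[:])},
	}
}

func main() {
	ev := cloudEvent{SpecVersion: "1.0", ID: "evt_123", Source: "/webhooks/github", Type: "com.github.push"}
	h1 := dedupeHeaders(ev)
	h2 := dedupeHeaders(ev) // a retried delivery of the same event
	fmt.Println(h1["Nats-Msg-Id"][0] == h2["Nats-Msg-Id"][0]) // true
}
```

Because the ID depends only on event identity, a provider retry or a DLQ replay of the same event maps to the same `Nats-Msg-Id` and is suppressed while it is still inside the dedupe window.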
- Go toolchain: as defined in CI (`go1.26.2` via `GOTOOLCHAIN`).
- Core local commands:
  - `go test ./...`
  - `go test -race ./...`
  - `go vet ./...`
  - `make ci-local`
- CI parity command: `make ci-local`
  - includes: vet, tests, race, staticcheck, coverage gate, OpenAPI contract test, Helm lint/template.
`.github/workflows/ci.yml` enforces these gates:
- `test`: `go vet` + coverage threshold (minimum 75%)
- `staticcheck`
- `openapi-contract`: `TestAdminOpenAPIContractMatchesRuntime`
- `config-lint`: runtime/chart config lint + Helm render hardening assertions
- `docker-build`
- `helm-lint` + `helm template`
- `integration`: real NATS + ClickHouse integration test
- `perf-smoke`: k6 smoke against `/livez` and `/readyz`
- `security`: `gosec`, `govulncheck`, Trivy CRITICAL scan, SBOM generation
- `flake-tracker`: per-run `integration`/`perf-smoke` status + duration artifact trend
- failed `main` runs auto-open a triage issue with failed job log snippets
If your change affects behavior, assume at least one of these can fail and run the relevant subset locally before committing.
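The `test` job's 75% coverage gate can be approximated locally. A minimal sketch, assuming the gate reads the `total:` line that `go tool cover -func` prints last; the `coverageTotal` helper and the sample output (including the `github.com/example/siphon` module path) are hypothetical, and CI's actual enforcement may differ.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// coverageTotal extracts the percentage from the "total:" line of
// `go tool cover -func` output, e.g. "total:\t(statements)\t78.3%".
func coverageTotal(output string) (float64, error) {
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 || fields[0] != "total:" {
			continue // per-function lines start with "file.go:line:"
		}
		pct := strings.TrimSuffix(fields[len(fields)-1], "%")
		return strconv.ParseFloat(pct, 64)
	}
	if err := sc.Err(); err != nil {
		return 0, err
	}
	return 0, fmt.Errorf("no total: line in cover output")
}

const minCoverage = 75.0 // threshold enforced by the CI test job

func main() {
	// Hypothetical sample of `go tool cover -func` output.
	out := "github.com/example/siphon/config/config.go:10:\tApplyDefaults\t\t100.0%\n" +
		"total:\t\t\t\t(statements)\t\t78.3%\n"
	total, err := coverageTotal(out)
	if err != nil {
		panic(err)
	}
	fmt.Printf("coverage %.1f%% (gate %.0f%%): pass=%v\n", total, minCoverage, total >= minCoverage)
	// coverage 78.3% (gate 75%): pass=true
}
```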
A change is not done until all of the following apply:
- Code compiles and targeted tests pass.
- `go test ./...` passes.
- `make ci-local` passes.
- Documentation/config surfaces are updated when needed.
- No regressions in API contract, observability, or security posture.
When adding or changing a config field, always update all relevant surfaces:
- `config/config.go`
  - struct field
  - defaults in `ApplyDefaults()`
  - validation in `Validate()` and helpers
- `config/config_test.go`
  - defaults
  - valid and invalid cases
  - env override behavior if applicable
- `config.example.yaml`
- `charts/siphon/values.yaml`
- `charts/siphon/values.schema.json`
- docs:
  - root `README.md`
  - chart `charts/siphon/README.md` when Helm-visible
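A minimal sketch of the struct/defaults/validation/env-override pattern for a new field. Only the `ApplyDefaults()`/`Validate()` method names and the `TAP_` prefix come from this guide; the `SinkConfig` type, `BatchSize` field, default value, and `TAP_CLICKHOUSE_BATCH_SIZE` variable are hypothetical, and so is the ordering (env overrides, then defaults, then validation).

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"strconv"
)

// SinkConfig is a hypothetical config section used for illustration.
type SinkConfig struct {
	BatchSize int `yaml:"batch_size"`
}

const defaultBatchSize = 1000 // illustrative default, not siphon's

// ApplyDefaults fills zero-valued fields with defaults.
func (c *SinkConfig) ApplyDefaults() {
	if c.BatchSize == 0 {
		c.BatchSize = defaultBatchSize
	}
}

// ApplyEnv overrides fields from TAP_-prefixed variables. Taking a lookup
// function instead of calling os.LookupEnv directly keeps tests deterministic.
func (c *SinkConfig) ApplyEnv(lookup func(string) (string, bool)) error {
	if v, ok := lookup("TAP_CLICKHOUSE_BATCH_SIZE"); ok { // hypothetical name
		n, err := strconv.Atoi(v)
		if err != nil {
			return fmt.Errorf("TAP_CLICKHOUSE_BATCH_SIZE: %w", err)
		}
		c.BatchSize = n
	}
	return nil
}

// Validate rejects values that defaults cannot fix.
func (c *SinkConfig) Validate() error {
	if c.BatchSize <= 0 {
		return errors.New("batch_size must be positive")
	}
	return nil
}

func main() {
	var cfg SinkConfig // as if decoded from YAML with batch_size omitted
	if err := cfg.ApplyEnv(os.LookupEnv); err != nil {
		panic(err)
	}
	cfg.ApplyDefaults()
	if err := cfg.Validate(); err != nil {
		panic(err)
	}
	fmt.Println("batch_size =", cfg.BatchSize)
}
```

In `config/config_test.go`, each of these three behaviors (default applied, env override applied, invalid value rejected) maps naturally onto one row of a table-driven test.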
When changing admin endpoints:
- Update runtime handlers in `cmd/tap/*`.
- Keep `docs/admin-openapi.yaml` in sync.
- Run:
  - `go test ./cmd/tap -run TestAdminOpenAPIContractMatchesRuntime -count=1`
When changing NATS publishing or the ClickHouse sink:
- Update runtime code in `internal/publish`.
- Add/adjust tests in:
  - `internal/publish/nats_integration_test.go`
  - `internal/publish/clickhouse_test.go`
  - `internal/publish/real_integration_test.go` if behavior spans dependencies
- Validate observability impact in `internal/health/metrics.go` and tests/docs.
When changing the Helm chart:
- Validate locally:
  - `helm lint charts/siphon`
  - `helm template siphon charts/siphon >/dev/null`
NATS config invariants:
- `subject_prefix` must not contain wildcards or whitespace.
- Dedupe window must be positive and `<= max_age`.
- Auth modes are mutually exclusive:
  - username/password
  - token
  - creds file
- TLS flags/files must obey validation constraints.
- Stream constraints must remain valid:
  - storage in `file|memory`
  - discard in `old|new`
  - compression in `none|s2`
  - max-size and count fields non-negative and bounded where required
ClickHouse sink config invariants:
- `addr` entries must be valid `host:port`.
- TLS settings require `secure=true` where applicable.
- Consumer timings and pool settings must stay positive and consistent.
- `insert_timeout < consumer_ack_wait`.
- `consumer_backoff` values must be positive and non-decreasing.
- If both are configured: `consumer_max_deliver == len(consumer_backoff)`.
Admin API invariants:
- Role-scoped tokens and rotation behavior must remain coherent.
- Rate limiting and optional CIDR/mTLS guards must continue to apply.
- Replay queue/job limits must remain bounded and validated.
Do not add major behavior without metrics and tests.
Current high-value metrics include:
- ingress: received, verification failures, processing duration
- publish: published, failures, dedupe hits, retry counters and delay histogram
- JetStream advisories by kind
- ClickHouse dedupe skipped counter
- admin request/replay lifecycle metrics
- poller health and fetch budget metrics
When adding metrics:
- register in `internal/health/metrics.go`
- add/extend tests in `internal/health/metrics_test.go`
- document in README if user-facing/operationally important
- Prefer table-driven tests for validation logic.
- For pure logic, write narrow unit tests.
- For transport behavior (NATS/ClickHouse), prefer integration-style tests with controlled mocks or local servers.
- Keep tests deterministic and bounded with explicit timeouts.
- If coverage drops near the 75% threshold, add tests in the changed package first.
Helpful coverage drill-down:
- `go test ./... -coverprofile=/tmp/siphon.coverage.out`
- `go tool cover -func=/tmp/siphon.coverage.out`

Reliability and security rules:
- Preserve idempotency and dedupe behavior end-to-end.
- Avoid unbounded retries, queues, or backoff growth.
- Keep publish/sink timeouts explicit and context-driven.
- Treat reconnects and transient network failures as normal; instrument retry paths.
- Maintain current CI security posture (gosec/govulncheck/trivy/SBOM).
- Avoid weakening TLS defaults without explicit config gate.
- Keep secret handling via env references and mounted files; avoid logging secrets.
- Maintain auth checks and scope checks on admin endpoints.
Bring up the local stack:
- `docker compose up --build`

NATS and ClickHouse images are pinned in repo-managed configs and CI; keep version updates deliberate and consistent across:
- `docker-compose.yml`
- `.github/workflows/ci.yml`
- `go.mod` (client/server libraries, where applicable)
The tag-based release workflow (`v*` tags) builds and pushes a multi-arch image, generates SBOMs, signs artifacts with cosign, and packages the Helm chart. Ensure release-impacting changes (image behavior, chart values, runtime flags) are documented.
- Make minimal, targeted diffs.
- Keep changes consistent with existing code style and naming.
- Do not silently change API/contract behavior without tests and docs.
- Prefer fixing root cause over introducing bypass flags.
- If a change touches multiple layers (runtime/config/chart/docs), complete all layers in one pass.
Run before pushing:
- `go test ./...`
- `make ci-local`

Then verify a clean state:
- `git status --short`

Only commit when `git status` shows nothing except your intended changes.