diff --git a/coverage-thresholds.json b/coverage-thresholds.json index b7de1992..4fd482a8 100644 --- a/coverage-thresholds.json +++ b/coverage-thresholds.json @@ -5,5 +5,5 @@ "sdk/configclient": 87.1, "sdk/configwatcher": 90.9, "sdk/grpctransport": 90.0, - "sdk/tools": 96.1 + "sdk/tools": 96.0 } diff --git a/docs/adr/ADR-001-grpc-over-rest.md b/docs/adr/ADR-001-grpc-over-rest.md new file mode 100644 index 00000000..e82f3323 --- /dev/null +++ b/docs/adr/ADR-001-grpc-over-rest.md @@ -0,0 +1,39 @@ +# ADR-001: gRPC over REST + +**Date:** 2026-06-05 +**Status:** Accepted +**Deciders:** OpenDecree maintainers + +## Context + +OpenDecree is a multi-tenant configuration service that needs to support client SDKs in multiple languages (Go, Python, TypeScript). The API must support: + +- Strongly typed contracts shared across the server and all SDK clients +- Efficient binary serialization for high-throughput config reads +- Streaming for real-time config change notifications (config watcher pattern) +- Automatic client code generation to keep SDKs in sync with the server + +REST/JSON APIs require manual schema maintenance (OpenAPI), are weakly typed at the transport layer, and lack native streaming. HTTP/2 multiplexing and binary framing make gRPC a better fit for an SDK-centric, performance-sensitive service. + +## Decision + +Use Protocol Buffers (proto3) as the API schema language and gRPC as the primary transport. Proto files in `proto/centralconfig/v1/` are the single source of truth for the API contract. Generated code is committed to `api/centralconfig/v1/`. + +`buf` is used for linting, breaking-change detection, and code generation (running plugins locally via Docker, not remote registries). 
+ +## Consequences + +**Positive:** + +- Strong typing enforced at compile time across all languages +- Automatic client stub generation via `buf generate` — SDKs are always in sync +- Native bidirectional streaming for config watch notifications +- Compact binary encoding (Protocol Buffers) reduces latency and bandwidth vs. JSON +- `buf lint` and `buf breaking` catch API contract violations before they ship + +**Negative:** + +- Browser clients cannot call gRPC directly; grpc-web or a transcoding gateway is required +- Not curl-friendly — human debugging requires tooling such as grpcurl or a gRPC reflection client +- Proto schema changes must follow strict backward-compatibility rules (or be coordinated with a breaking migration during alpha) +- Adds `buf` and proto toolchain to the developer setup (mitigated by Docker-based generation) diff --git a/docs/adr/ADR-002-specs-first-workflow.md b/docs/adr/ADR-002-specs-first-workflow.md new file mode 100644 index 00000000..08748869 --- /dev/null +++ b/docs/adr/ADR-002-specs-first-workflow.md @@ -0,0 +1,45 @@ +# ADR-002: Specs-First Workflow + +**Date:** 2026-06-05 +**Status:** Accepted +**Deciders:** OpenDecree maintainers + +## Context + +OpenDecree has two distinct schema layers that must remain consistent: + +1. **API contracts** — defined in `.proto` files, which drive gRPC server interfaces and all SDK clients +2. **Database contracts** — defined in `.sql` query files, which drive the Go database access layer via `sqlc` + +Without a clear source-of-truth policy, there is a risk that hand-written Go code diverges from the API or DB schema, especially as the project evolves across multiple contributors and multiple language SDKs. A "code-first" approach would mean maintaining three or more representations of the same concept in sync by hand. + +## Decision + +Proto files (`proto/centralconfig/v1/`) and SQL query files (`db/queries/`) are the canonical sources of truth. 
Go implementation code is written after — and in response to — changes in those files. + +The standard workflow is: + +1. Edit `.proto` or `.sql` files +2. Run `make generate` (runs `buf generate` and `sqlc generate` in Docker) +3. Implement or update Go business logic in `internal/` +4. Run `make test` and `make lint` +5. Commit — generated files (`*.pb.go`, `*.gen.go`) are checked into git + +Generated files are marked as `linguist-generated` in `.gitattributes` so they are excluded from language statistics and diff noise on GitHub. + +## Consequences + +**Positive:** + +- Single source of truth for both the API surface and the DB access layer +- Consistency is mechanically enforced — divergence between spec and implementation is a compile error +- Multi-language SDK clients are always in sync with the server API without manual effort +- `buf breaking` catches accidental API regressions at lint time +- `sqlc` type-checks SQL at generation time, catching query errors before runtime + +**Negative:** + +- Extra codegen step is required before implementing any change — developers must run `make generate` before editing business logic +- Generated files are committed to git, which increases diff size and requires discipline to avoid manual edits to generated files +- Mistakes in a proto or SQL file can break generation for all downstream modules until fixed +- Local development requires Docker to run `buf` and `sqlc` generators diff --git a/docs/adr/ADR-003-postgresql-redis-architecture.md b/docs/adr/ADR-003-postgresql-redis-architecture.md new file mode 100644 index 00000000..43b1465d --- /dev/null +++ b/docs/adr/ADR-003-postgresql-redis-architecture.md @@ -0,0 +1,42 @@ +# ADR-003: PostgreSQL + Redis Architecture + +**Date:** 2026-06-05 +**Status:** Accepted +**Deciders:** OpenDecree maintainers + +## Context + +OpenDecree needs to: + +1. Durably store configuration values, schema definitions, and an immutable audit log across multiple tenants +2. 
Serve config reads with low latency — config lookups are on the hot path for every service that depends on this system +3. Propagate configuration changes in real time to connected clients (config watcher subscriptions) + +A single data store would force tradeoffs: a relational database handles durable storage and ACID guarantees well but is not the right tool for low-latency cache reads or fan-out pub/sub; an in-memory store handles cache and messaging well but lacks the durability and query capabilities needed for the config history and audit log. + +## Decision + +Use **PostgreSQL 17** as the primary durable store and **Redis 7** as the cache and pub/sub layer. + +- PostgreSQL stores all config values (versioned), schema definitions, tenant records, and the audit log. Row-Level Security enforces tenant isolation at the database layer (see ADR-005). +- Redis caches resolved config values for fast reads. On a cache miss, the server fetches from PostgreSQL and repopulates the cache. +- Redis pub/sub propagates change notifications to all server instances when a config value is written, so connected config-watcher clients receive updates promptly across a horizontally scaled deployment. + +Both dependencies are placed behind Go interfaces (`cache.Cache`, `pubsub.PubSub`) so they can be swapped or mocked in tests without touching business logic. 
+ +## Consequences + +**Positive:** + +- Proven, widely-adopted technologies with strong operational tooling and hosting options +- Full ACID guarantees from PostgreSQL for config mutations and audit entries +- Optimistic read path: most config reads are served from Redis without hitting the database +- Real-time change propagation across server replicas with minimal coupling +- Interfaces-behind-abstraction pattern keeps the business logic testable and the dependencies replaceable + +**Negative:** + +- Two external infrastructure dependencies increase operational complexity compared to a single-store solution +- Redis is a single point of failure for pub/sub — if Redis is unavailable, change notifications are not delivered (reads can still be served from PostgreSQL, but watchers will not see updates until Redis recovers) +- Cache invalidation logic must be kept in sync with write paths; a missed invalidation leads to stale reads until TTL expiry +- Local development and CI require both PostgreSQL and Redis to be running (addressed via Docker Compose) diff --git a/docs/adr/ADR-004-metadata-headers-first-auth.md b/docs/adr/ADR-004-metadata-headers-first-auth.md new file mode 100644 index 00000000..3d70d3ff --- /dev/null +++ b/docs/adr/ADR-004-metadata-headers-first-auth.md @@ -0,0 +1,40 @@ +# ADR-004: Metadata-Headers-First Authentication + +**Date:** 2026-06-05 +**Status:** Accepted +**Deciders:** OpenDecree maintainers + +## Context + +OpenDecree serves two very different deployment contexts: + +1. **Internal / developer environments** — services on a private network or inside a Kubernetes cluster where callers are trusted and setting up a full JWT/JWKS infrastructure is unnecessary overhead +2. **Production / multi-tenant environments** — where callers must be cryptographically authenticated and tenant isolation must be enforced + +A design that mandates JWT from day one creates friction for adopters who just want to integrate quickly in a trusted network. 
A design that defaults to no authentication at all provides no path toward production hardening. The auth mechanism must also work cleanly over gRPC, where the standard carrier is request metadata (headers). + +## Decision + +The default authentication mode uses gRPC metadata headers: + +- `x-tenant-id` — identifies the calling tenant +- `x-role` — declares the caller's role (`superadmin`, `admin`, `viewer`) + +In default mode these values are accepted as-is without cryptographic validation. The server applies RBAC via the Guard chain (`TenantScopeGuard`, `RolePolicyGuard`, `FieldLockGuard`) based on the declared values. + +JWT/JWKS authentication is opt-in: when configured, an interceptor validates the bearer token against the configured JWKS endpoint and extracts tenant ID and role from token claims, overriding the metadata headers. + +## Consequences + +**Positive:** + +- Zero-config for internal services and development environments — no key management, no JWKS endpoint required +- Progressive security model: operators can start with header-based auth on a trusted network and migrate to JWT for external-facing deployments without changing client code +- Clean integration with gRPC metadata conventions +- Auth logic is isolated in interceptors, keeping service business logic auth-agnostic + +**Negative:** + +- Default mode is **not safe** without a network trust boundary — any caller that can reach the gRPC port can claim any tenant ID or role; operators must be explicit about this in their deployment security model +- The distinction between "trusted network mode" and "validated JWT mode" must be clearly communicated in documentation to avoid misconfiguration in production +- Mixing metadata-header auth with JWT requires careful interceptor ordering to avoid header spoofing when JWT is enabled diff --git a/docs/adr/ADR-005-multi-tenant-rls.md b/docs/adr/ADR-005-multi-tenant-rls.md new file mode 100644 index 00000000..36c90ea0 --- /dev/null +++ 
b/docs/adr/ADR-005-multi-tenant-rls.md @@ -0,0 +1,41 @@ +# ADR-005: Multi-Tenant Isolation via Row-Level Security + +**Date:** 2026-06-05 +**Status:** Accepted +**Deciders:** OpenDecree maintainers + +## Context + +OpenDecree is a multi-tenant service where all tenants share a single PostgreSQL database. Tenant data (config values, schema definitions, audit entries) must be strictly isolated — one tenant must never be able to read or modify another tenant's data. + +Isolation can be enforced at multiple layers: + +- **Application layer** — every query includes a `WHERE tenant_id = $1` predicate +- **Database layer** — PostgreSQL Row-Level Security (RLS) policies prevent rows from being accessed by the wrong session role +- **Schema-per-tenant / database-per-tenant** — stronger isolation but significant operational overhead + +Application-layer-only enforcement is fragile: a missing `WHERE` clause or a future query that skips the filter silently leaks cross-tenant data. Relying solely on application logic also means that direct DB access (e.g., during incident response or by a future admin tool) bypasses the isolation guarantee. + +## Decision + +All tables that store tenant-scoped data include a `tenant_id` column. PostgreSQL Row-Level Security policies are enabled on those tables and restrict each session to rows matching the session's `app.tenant_id` setting. + +The Go application sets `SET LOCAL app.tenant_id = $1` at the start of each request transaction (extracted from the authenticated gRPC metadata). RLS policies then enforce isolation at the database engine level, independent of the application query logic. + +Application queries still include `tenant_id` filters where appropriate for index efficiency, but RLS provides a defense-in-depth backstop. 
+ +## Consequences + +**Positive:** + +- Defense in depth: tenant isolation is enforced at the database engine level, not only in application code +- A missing application-level `WHERE tenant_id = ...` clause results in an empty result set (RLS filters it out) rather than a data leak +- Direct database access by operators or admin tooling respects the same isolation boundary when session variables are set correctly +- No per-tenant schema or database management overhead — all tenants share the same schema + +**Negative:** + +- RLS policies add a small per-query overhead as PostgreSQL evaluates the policy predicate on every row access +- Query design must account for RLS: some bulk admin operations that span tenants require a dedicated privileged role that bypasses RLS (e.g., a `decree_admin` role used only for migrations and superadmin queries) +- Developers unfamiliar with RLS may find debugging unexpected empty results non-obvious if the session variable is not set +- `sqlc`-generated queries must be reviewed to ensure they are compatible with RLS and do not inadvertently rely on bypassing it diff --git a/docs/adr/ADR-006-go-module-split.md b/docs/adr/ADR-006-go-module-split.md new file mode 100644 index 00000000..9babbb6d --- /dev/null +++ b/docs/adr/ADR-006-go-module-split.md @@ -0,0 +1,53 @@ +# ADR-006: Go Module Split (8 Modules) + +**Date:** 2026-06-05 +**Status:** Accepted +**Deciders:** OpenDecree maintainers + +## Context + +OpenDecree is a monorepo that contains both a server (with heavy infrastructure dependencies: gRPC server framework, PostgreSQL driver, Redis client, OTel SDK) and multiple client SDKs intended to be imported by end-user Go services. + +If all code lived in a single Go module, SDK consumers would transitively pull in all server-side dependencies (database drivers, migration tools, server-only OTel exporters, etc.). This violates the principle of minimal consumer dependency footprint. 
+ +Additionally, different consumers have different Go version requirements: + +- The server runs in a controlled build environment and can use the latest Go version +- CLI users installing via `go install` are likely on a recent but not necessarily bleeding-edge version +- SDK consumers may be on an older stable Go version (Go 1.22 was chosen as the lowest stable common ground) + +A single module cannot simultaneously satisfy Go 1.22 compatibility for SDK consumers while using Go 1.25 language features in the server. + +## Decision + +The repository is split into 8 Go modules with a tiered Go version policy: + +| Module | Path | Go version | Notes | +|--------|------|------------|-------| +| Server (root) | `.` | 1.25 | Full server binary, all infrastructure deps | +| API | `api/` | 1.24 | Generated proto stubs, consumed by transport + tools | +| CLI | `cmd/decree/` | 1.24 | User-facing CLI, matches transport floor | +| gRPC transport | `sdk/grpctransport/` | 1.24 | gRPC dial + interceptors for SDK clients | +| SDK tools | `sdk/tools/` | 1.24 | Shared SDK utilities, depends on api | +| Config client | `sdk/configclient/` | 1.22 | Lightweight SDK: read config values | +| Admin client | `sdk/adminclient/` | 1.22 | Lightweight SDK: manage schemas and tenants | +| Config watcher | `sdk/configwatcher/` | 1.22 | Lightweight SDK: stream config changes | + +A `go.work` workspace file ties all modules together for local development and is **not committed to git** (gitignored), so consumers never see it. + +During development, inter-module dependencies use `replace` directives in each module's `go.mod` to point at the local worktree. Published versions use normal module paths and semantic version tags. 
+ +## Consequences + +**Positive:** + +- SDK consumers (configclient, adminclient, configwatcher) import only lightweight modules with minimal transitive dependencies — no server-side infrastructure pulled in +- Each tier can declare its own minimum Go version, letting SDK consumers stay on Go 1.22 while the server uses Go 1.25 features +- Module boundaries make it structurally impossible for SDK code to accidentally import server internals + +**Negative:** + +- `replace` directives in `go.mod` files must be kept in sync during local development; a missing replace causes confusing "module not found" errors +- `go.work` is required for a working local development environment but is gitignored — new contributors must generate it (via `go work init && go work use -r .` or a `make` target) +- Releasing requires tagging 8 modules in the correct dependency order (leaves first) +- `go get` upgrades must be applied per module rather than globally diff --git a/docs/adr/README.md b/docs/adr/README.md new file mode 100644 index 00000000..5829e374 --- /dev/null +++ b/docs/adr/README.md @@ -0,0 +1,48 @@ +# Architecture Decision Records + +This directory contains Architecture Decision Records (ADRs) for OpenDecree. ADRs document significant architectural choices, the context that motivated them, and their consequences. 
+ +## Format + +Each ADR follows this structure: + +- **Context** — the situation or problem that prompted the decision +- **Decision** — what was decided +- **Consequences** — the results of the decision, both positive and negative + +## Index + +| ADR | Title | Status | +|-----|-------|--------| +| [ADR-001](ADR-001-grpc-over-rest.md) | gRPC over REST | Accepted | +| [ADR-002](ADR-002-specs-first-workflow.md) | Specs-First Workflow | Accepted | +| [ADR-003](ADR-003-postgresql-redis-architecture.md) | PostgreSQL + Redis Architecture | Accepted | +| [ADR-004](ADR-004-metadata-headers-first-auth.md) | Metadata-Headers-First Authentication | Accepted | +| [ADR-005](ADR-005-multi-tenant-rls.md) | Multi-Tenant Isolation via Row-Level Security | Accepted | +| [ADR-006](ADR-006-go-module-split.md) | Go Module Split (8 Modules) | Accepted | + +## Adding a new ADR + +1. Create a file named `ADR-NNN-short-title.md` (increment the number) +2. Use the template below +3. Add a row to the index above + +```markdown +# ADR-NNN: Title + +**Date:** YYYY-MM-DD +**Status:** Accepted +**Deciders:** OpenDecree maintainers + +## Context + +[What situation or problem prompted this decision] + +## Decision + +[What was decided] + +## Consequences + +[What are the results of this decision — good and bad] +```