
Commit 528eb3f

amostt and Aygentic committed
docs(gateway): add gateway-ready conventions, service communication docs, and reference compose [AYG-74]
Add Gateway-Ready Conventions and Service-to-Service Communication sections to README covering service discoverability endpoints, routing conventions, cross-cutting concerns table, managed platform guidance, {SERVICE_NAME}_URL env var pattern, shared HTTP client usage with correlation ID propagation, and ServiceError pattern for unconfigured services.

Create reference compose.gateway.yml with Traefik 3.6, TLS/ACME, HTTP-to-HTTPS redirect, rate limiting middleware (commented), and example service labels, clearly marked as reference-only for self-hosted deployments.

Fixes AYG-74
Related to AYG-64

🤖 Generated by Aygentic

Co-Authored-By: Aygentic <noreply@aygentic.com>
1 parent 31e2f6c commit 528eb3f

2 files changed

Lines changed: 252 additions & 0 deletions


README.md

Lines changed: 113 additions & 0 deletions
@@ -330,6 +330,119 @@ Run `alembic upgrade head` for Alembic migrations and `supabase db push` for Sup

---

## Gateway-Ready Conventions

This template is **gateway-ready, not gateway-inclusive**: every service follows conventions that make it routable through any API gateway (Traefik, Kong, AWS ALB, Alibaba Cloud API Gateway) without shipping one.

### Service Discoverability

Every service exposes standard endpoints that gateways use for health checking and service registration:

| Endpoint | Purpose | Auth Required |
|----------|---------|---------------|
| `GET /version` | Returns `service_name`, `version`, `commit`, `build_time`, `environment` | No |
| `GET /healthz` | Liveness probe: is the process running? | No |
| `GET /readyz` | Readiness probe: can the service handle traffic? | No |

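A gateway, orchestrator, or deploy script might consume the two probe endpoints roughly as follows. This is a minimal standard-library sketch, not code from the template: `http_status` and `probe_service` are hypothetical names, and it assumes each probe answers `200` when healthy and some other status when not.

```python
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str, timeout: float = 2.0) -> int:
    """Return the HTTP status of a GET request, or 0 if the host is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # the server answered, but with a non-2xx status
    except (urllib.error.URLError, OSError):
        return 0  # connection refused, DNS failure, timeout, ...


def probe_service(
    base_url: str, fetch: Callable[[str], int] = http_status
) -> dict[str, bool]:
    """Check liveness and readiness (assumption: 200 means healthy)."""
    base = base_url.rstrip("/")
    return {
        "live": fetch(f"{base}/healthz") == 200,
        "ready": fetch(f"{base}/readyz") == 200,
    }
```

The `fetch` parameter is injectable so the check can be exercised without a running service.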
### Routing Conventions

All API routes use the `/api/v1` prefix (configured via `API_V1_STR` in `backend/app/core/config.py`). The service name belongs in the deployment URL, not the API path:

```
✓ https://user-service.example.com/api/v1/users
✗ https://gateway.example.com/user-service/api/v1/users
```

Path-based gateway routing is possible but not the default convention.

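The two routing styles can be illustrated with a small sketch. The helper names `endpoint` and `strip_service_prefix` are hypothetical, and the constant simply mirrors the `/api/v1` prefix described above; a real path-based gateway would do the prefix stripping in its own configuration rather than in service code.

```python
API_V1_STR = "/api/v1"  # mirrors the prefix described above


def endpoint(base_url: str, resource: str) -> str:
    """Build a URL under the default convention: service name in the host."""
    return f"{base_url.rstrip('/')}{API_V1_STR}/{resource.lstrip('/')}"


def strip_service_prefix(path: str, service_name: str) -> str:
    """What a path-based gateway must do before forwarding to the service."""
    prefix = f"/{service_name}"
    if path == prefix or path.startswith(prefix + "/"):
        return path[len(prefix):] or "/"
    return path  # no service prefix present, forward unchanged
```

With domain-based routing the service sees `/api/v1/...` directly; with path-based routing the gateway has to remove the extra `/user-service` segment first, which is why it is not the default.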
### Cross-Cutting Concerns

| Concern | Template Responsibility | Gateway Responsibility |
|---------|------------------------|------------------------|
| Authentication | Clerk JWT verification per-service | Optional JWT pre-validation |
| Rate limiting | None (defer to gateway) | Per-client rate limits |
| API key management | None (Clerk handles user auth) | Machine-to-machine API keys |
| CORS | Per-service `BACKEND_CORS_ORIGINS` | Aggregate CORS if fronting multiple services |
| TLS termination | None (platform provides) | Certificate management |
| Request routing | Responds to all requests on its port | Routes by domain/path to services |
| Load balancing | None (platform provides) | Distributes across service instances |

### Managed Platforms

Teams deploying to managed platforms (Railway, Cloud Run, Fly.io, Alibaba Cloud) use the platform's built-in routing and load balancing, so Traefik is not needed. The gateway-ready conventions (health endpoints, `/api/v1` prefix, structured error responses) still apply, as platforms rely on these for health checking and service discovery.

For self-hosted deployments, see [`compose.gateway.yml`](compose.gateway.yml) for a reference Traefik 3.6 configuration.

---

## Service-to-Service Communication

Service communication uses a deliberately simple pattern: environment variables pointing to URLs. No service registry, no DNS-based discovery, no service mesh. This works because container platforms (Railway, Cloud Run, Fly.io) provide stable internal URLs for services within the same project.

### Service Discovery

Each service dependency is configured via an environment variable following the `{SERVICE_NAME}_URL` pattern:

```env
# .env
USER_SERVICE_URL=https://user-service.railway.internal
BILLING_SERVICE_URL=https://billing-service.railway.internal
```

Add one `{SERVICE_NAME}_URL` variable per dependency. When a URL is not configured, the calling code should fail fast with a clear error (see [Error Handling](#error-handling-for-unconfigured-services) below).

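The naming pattern can be sketched as a small lookup helper. This is an illustration only: the template reads these values through `settings` in `app.core.config`, whereas the hypothetical `service_url_var` and `lookup_service_url` below read a plain mapping (defaulting to `os.environ`) to stay self-contained.

```python
import os
import re
from typing import Mapping, Optional


def service_url_var(service_name: str) -> str:
    """Turn a service name such as 'user-service' into 'USER_SERVICE_URL'."""
    normalized = re.sub(r"[^A-Za-z0-9]+", "_", service_name).strip("_")
    return f"{normalized.upper()}_URL"


def lookup_service_url(
    service_name: str, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Return the configured base URL, or None when it is missing or blank."""
    value = env.get(service_url_var(service_name), "").strip()
    return value.rstrip("/") or None
```

Returning `None` for a missing or empty variable gives callers a single condition to check before failing fast.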
### Shared HTTP Client

All inter-service calls use the shared `HttpClient` (`backend/app/core/http_client.py`), which automatically:

- Propagates `X-Correlation-ID` and `X-Request-ID` headers
- Applies configurable timeout, retry (with exponential backoff), and circuit breaker policies
- Logs retries and exhausted retries with target URL and attempt count

```python
from app.api.deps import HttpClientDep
from app.core.config import settings


async def get_user(http: HttpClientDep, user_id: str):
    # Header propagation, timeouts, and retries are handled by the shared client
    response = await http.get(f"{settings.USER_SERVICE_URL}/api/v1/users/{user_id}")
    response.raise_for_status()
    return response.json()
```

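For readers unfamiliar with retry with exponential backoff, the idea can be sketched with the standard library alone. This is not the `HttpClient` implementation: `backoff_delays`, `call_with_retry`, the base delay of 0.5 s, the 8 s cap, and the choice to retry only on `ConnectionError` are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 8.0) -> list[float]:
    """Delay before each retry: base, 2*base, 4*base, ... limited to cap."""
    return [min(cap, base * 2**attempt) for attempt in range(retries)]


def call_with_retry(
    operation: Callable[[], T],
    retries: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on ConnectionError up to `retries` times."""
    if retries < 0:
        raise ValueError("retries must be >= 0")
    delays = backoff_delays(retries)
    for attempt in range(retries + 1):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries:
                raise  # retries exhausted, surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

Doubling the wait between attempts gives a struggling downstream service progressively more room to recover, while the cap keeps the worst-case latency bounded.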
### Correlation ID Propagation

Every incoming request is assigned a `request_id` (UUID v4) by the request middleware (`backend/app/core/middleware.py`). The middleware also reads the `X-Correlation-ID` header from the incoming request:

- **If `X-Correlation-ID` is present** and matches the expected format (alphanumeric, hyphens, underscores, dots; max 128 characters): it is used as the `correlation_id`
- **If `X-Correlation-ID` is absent or invalid**: the `request_id` is used as the `correlation_id`

Both values are bound to structlog context variables and automatically propagated to outgoing HTTP calls via the shared `HttpClient`. This creates a trace that spans multiple services: all logs for a single user action share the same `correlation_id`.

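The selection rule above can be sketched in a few lines. The regular expression is an assumption derived from the format described (alphanumeric, hyphens, underscores, dots; 1 to 128 characters), and `resolve_ids` is a hypothetical name; the actual middleware may differ in detail.

```python
import re
import uuid
from typing import Optional

# Assumed pattern: letters, digits, '.', '_', '-'; between 1 and 128 characters
CORRELATION_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]{1,128}")


def resolve_ids(header_value: Optional[str]) -> tuple[str, str]:
    """Return (request_id, correlation_id) for one incoming request."""
    request_id = str(uuid.uuid4())
    if header_value and CORRELATION_ID_PATTERN.fullmatch(header_value):
        return request_id, header_value  # trust the caller's valid ID
    return request_id, request_id  # absent or invalid: fall back to request_id
```

Validating the header before reusing it keeps arbitrary client-supplied strings out of logs and outgoing headers.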
### Error Handling for Unconfigured Services

When a service URL is not configured, fail immediately with a descriptive `ServiceError` rather than making a request to an empty URL:

```python
from app.api.deps import HttpClientDep
from app.core.config import settings
from app.core.errors import ServiceError


async def get_user(http: HttpClientDep, user_id: str):
    # Fail fast before any network call when the dependency is not configured
    if not settings.USER_SERVICE_URL:
        raise ServiceError(
            status_code=503,
            message="User service not configured",
            code="SERVICE_NOT_CONFIGURED",
        )
    response = await http.get(f"{settings.USER_SERVICE_URL}/api/v1/users/{user_id}")
    response.raise_for_status()
    return response.json()
```

This returns a structured `503 SERVICE_UNAVAILABLE` response with the `SERVICE_NOT_CONFIGURED` error code, making it clear that the issue is a missing configuration, not a downstream service failure.

---

## Environment Variables

Configuration is loaded from `.env` (development) or passed as container environment variables (staging/production). Copy `.env.example` to `.env` to get started.

compose.gateway.yml

Lines changed: 139 additions & 0 deletions
@@ -0,0 +1,139 @@
# =============================================================================
# REFERENCE ONLY: NOT PART OF THE TEMPLATE
# =============================================================================
#
# This file documents a self-hosted Traefik 3.6 gateway configuration for teams
# that manage their own reverse proxy instead of using a managed gateway.
#
# Teams deploying to managed platforms (Railway, Cloud Run, Fly.io) should use
# the platform's built-in routing and load balancing instead of running a
# self-hosted Traefik instance.
#
# This is for teams wanting a self-hosted Traefik gateway on a VPS or
# bare-metal server with automatic TLS via Let's Encrypt.
#
# HOW TO USE:
#   1. Create the external network: docker network create traefik-public
#   2. Set ACME_EMAIL and DOMAIN in your .env (or export them)
#   3. Start the gateway: docker compose -f compose.gateway.yml up -d
#   4. Add the example service labels (see bottom of this file) to your own
#      compose.yml services, then bring them up on the same traefik-public network
#
# THIS FILE IS NEVER USED BY THE TEMPLATE DIRECTLY.
# It is documentation: a reference for self-hosted gateway deployments.
# =============================================================================

services:

  traefik:
    image: traefik:3.6
    restart: always
    ports:
      # HTTP: redirected to HTTPS; also serves Let's Encrypt HTTP-01 challenges
      # if you switch to that challenge type in the command section
      - "80:80"
      # HTTPS: TLS-terminated traffic and the default TLS-ALPN-01 challenge
      - "443:443"
    volumes:
      # Docker socket (read-only): Traefik reads labels from running containers
      - /var/run/docker.sock:/var/run/docker.sock:ro
      # Persistent volume for Let's Encrypt certificates
      - traefik-certificates:/certificates
    command:
      # --- Docker provider ---
      # Enable Docker label-based service discovery
      - --providers.docker
      # Only route containers that explicitly set traefik.enable=true
      - --providers.docker.exposedbydefault=false
      # Scope discovery to services tagged with traefik.constraint-label=traefik-public
      - --providers.docker.constraints=Label(`traefik.constraint-label`, `traefik-public`)

      # --- Entrypoints ---
      - --entrypoints.http.address=:80
      - --entrypoints.https.address=:443

      # --- HTTP → HTTPS redirect (global, via entrypoint configuration) ---
      - --entrypoints.http.http.redirections.entryPoint.to=https
      - --entrypoints.http.http.redirections.entryPoint.scheme=https
      - --entrypoints.http.http.redirections.entryPoint.permanent=true

      # --- TLS / Let's Encrypt (ACME) ---
      # TLS-ALPN-01 challenge: requires port 443 to be publicly reachable.
      # If port 443 is firewalled or shared, switch to the HTTP-01 challenge:
      #   - --certificatesresolvers.le.acme.httpchallenge=true
      #   - --certificatesresolvers.le.acme.httpchallenge.entrypoint=http
      - --certificatesresolvers.le.acme.tlschallenge=true
      # Your email for Let's Encrypt renewal notifications
      - --certificatesresolvers.le.acme.email=${ACME_EMAIL?Variable ACME_EMAIL not set}
      # Store certificates in the mounted volume
      - --certificatesresolvers.le.acme.storage=/certificates/acme.json

      # --- Observability ---
      - --accesslog
      - --log
    labels:
      - traefik.enable=true
      - traefik.docker.network=traefik-public
      # Needed so this container passes the providers.docker.constraints filter;
      # otherwise the middleware labels on this container are ignored
      - traefik.constraint-label=traefik-public

      # https-redirect middleware: named alias so service routers can reference it.
      # The global redirect is enforced by the entrypoint command args;
      # this label makes the middleware available by name for per-router use.
      - traefik.http.middlewares.https-redirect.redirectscheme.scheme=https
      - traefik.http.middlewares.https-redirect.redirectscheme.permanent=true

      # --- Rate limiting middleware (reference, not active by default) ---
      # To enable rate limiting on a service, add these labels to that service
      # and reference the middleware name in its router labels:
      #   traefik.http.middlewares.rate-limit.ratelimit.average=100
      #   traefik.http.middlewares.rate-limit.ratelimit.burst=50
      #   traefik.http.middlewares.rate-limit.ratelimit.period=1m
      # Then apply it to a router:
      #   traefik.http.routers.<name>-https.middlewares=rate-limit
    networks:
      - traefik-public

# =============================================================================
# EXAMPLE BACKEND SERVICE (commented out, for reference only)
# =============================================================================
# Copy these labels into your compose.yml backend service.
# Replace STACK_NAME, DOMAIN, the image reference, and the container port
# to match your workload. The network and constraint-label entries are
# required for Traefik to discover and route to your service.
#
#   backend:
#     image: your-org/your-backend:latest
#     restart: always
#     networks:
#       - traefik-public
#       - default
#     labels:
#       - traefik.enable=true
#       - traefik.docker.network=traefik-public
#       - traefik.constraint-label=traefik-public
#
#       # Container port the backend listens on
#       - traefik.http.services.${STACK_NAME:-app}-backend.loadbalancer.server.port=8000
#
#       # HTTP router: redirects to HTTPS via the https-redirect middleware
#       - traefik.http.routers.${STACK_NAME:-app}-backend-http.rule=Host(`api.${DOMAIN:-localhost}`)
#       - traefik.http.routers.${STACK_NAME:-app}-backend-http.entrypoints=http
#       - traefik.http.routers.${STACK_NAME:-app}-backend-http.middlewares=https-redirect
#
#       # HTTPS router: serves TLS-terminated traffic with Let's Encrypt cert
#       - traefik.http.routers.${STACK_NAME:-app}-backend-https.rule=Host(`api.${DOMAIN:-localhost}`)
#       - traefik.http.routers.${STACK_NAME:-app}-backend-https.entrypoints=https
#       - traefik.http.routers.${STACK_NAME:-app}-backend-https.tls=true
#       - traefik.http.routers.${STACK_NAME:-app}-backend-https.tls.certresolver=le
#
#       # Optionally apply rate limiting (requires the rate-limit middleware labels):
#       # - traefik.http.routers.${STACK_NAME:-app}-backend-https.middlewares=rate-limit
# =============================================================================

networks:
  # Shared external network. It must be created before running this file:
  #   docker network create traefik-public
  traefik-public:
    external: true

volumes:
  # Stores Let's Encrypt certificates across container restarts
  traefik-certificates:
