Commit 72c339d

fix: harden runtime and deployment defaults
1 parent 686d9b9 commit 72c339d

12 files changed

Lines changed: 220 additions & 14 deletions

.env.example

Lines changed: 1 addition & 0 deletions
@@ -13,4 +13,5 @@ LOG_LEVEL="info"
 TRUST_PROXY=0
 DOCS_ENABLED=true
 METRICS_ENABLED=false
+METRICS_AUTH_TOKEN=""
 BCRYPT_ROUNDS=10

.env.production.example

Lines changed: 2 additions & 1 deletion
@@ -17,5 +17,6 @@ RATE_LIMIT_MAX_REQUESTS=100
 LOG_LEVEL="info"
 TRUST_PROXY=1
 DOCS_ENABLED=true
-METRICS_ENABLED=true
+METRICS_ENABLED=false
+METRICS_AUTH_TOKEN=""
 BCRYPT_ROUNDS=10

.github/workflows/deploy.yml

Lines changed: 4 additions & 1 deletion
@@ -29,6 +29,9 @@ concurrency:
 permissions:
   contents: read

+env:
+  RAILWAY_CLI_VERSION: 4.31.0
+
 jobs:
   resolve-context:
     name: resolve deployment context
@@ -237,7 +240,7 @@ jobs:
           node-version: 20

       - name: Install Railway CLI
-        run: npm install --global @railway/cli@latest
+        run: npm install --global @railway/cli@${RAILWAY_CLI_VERSION}

       - name: Deploy to Railway
         env:

README.md

Lines changed: 4 additions & 0 deletions
@@ -174,6 +174,7 @@ LOG_LEVEL="info"
 TRUST_PROXY=0
 DOCS_ENABLED=true
 METRICS_ENABLED=false
+METRICS_AUTH_TOKEN=""
 BCRYPT_ROUNDS=10
 ```

@@ -206,6 +207,7 @@ Deployment automation is implemented through [`.github/workflows/deploy.yml`](./
 - manual dispatch for intentional non-production deployments
 - exact-ref verification before any deployment
 - smoke validation for `/health`, `/ready`, and `/docs.json`
+- a pinned Railway CLI version for deterministic release promotion

 Deployment setup material:

@@ -229,6 +231,8 @@ Enable metrics locally with `METRICS_ENABLED=true` and expose:

 - `GET /metrics`

+If metrics are enabled on any non-local environment, prefer setting `METRICS_AUTH_TOKEN` or keeping the route private at the network layer instead of exposing raw Prometheus output publicly.
+
 Local observability assets:

 - [`docs/observability.md`](./docs/observability.md)
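
Review note: the README change asks scrapers to authenticate when `METRICS_AUTH_TOKEN` is set. Below is a minimal sketch of what a client-side scrape could look like, assuming Node 18+ with the global `fetch`. The helper names `buildMetricsRequest` and `scrapeMetrics` and the `Accept` header value are hypothetical and not part of this commit.

```typescript
// Hypothetical client-side helper; not part of the repository.
type MetricsRequest = { url: string; headers: Record<string, string> };

function buildMetricsRequest(baseUrl: string, token?: string): MetricsRequest {
  // Resolve /metrics against the service base URL.
  const url = new URL("/metrics", baseUrl).toString();
  const headers: Record<string, string> = { Accept: "text/plain" };

  // Only send the Authorization header when a token is configured.
  if (token) {
    headers.Authorization = `Bearer ${token}`;
  }

  return { url, headers };
}

async function scrapeMetrics(baseUrl: string, token?: string): Promise<string> {
  const { url, headers } = buildMetricsRequest(baseUrl, token);
  const response = await fetch(url, { headers });

  // The controller in this commit answers 401 when the token is missing or wrong.
  if (!response.ok) {
    throw new Error(`metrics scrape failed with status ${response.status}`);
  }

  return response.text();
}
```

A Prometheus deployment would normally carry the same token in its scrape configuration rather than in application code; the sketch only illustrates the header shape the route expects.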

docs/deployment/railway.md

Lines changed: 5 additions & 2 deletions
@@ -52,6 +52,7 @@ At minimum, configure:
 On Railway, define `DATABASE_URL` and `REDIS_URL` on the `auth-api` service itself by referencing the backing services, rather than assuming those values are shared automatically across services.

 `TRUST_PROXY=1` is recommended for Railway because the service sits behind a proxy.
+`METRICS_ENABLED=false` is the safer production default unless the metrics route stays private or is protected with `METRICS_AUTH_TOKEN`.

 ## GitHub Environments and secrets

@@ -115,6 +116,8 @@ The workflow now has four explicit phases:

 The workflow clears the default GitHub Actions `CI=true` value for the deploy step so Railway waits for the deployment result instead of switching to build-only CI mode.

+The Railway CLI version is pinned in the workflow on purpose. Update that version deliberately, in reviewable code, rather than pulling `latest` during a production promotion.
+
 Concurrency is grouped by environment, not by a single hardcoded production bucket, so staging and production deploy queues remain isolated.

 ## Railway config as code
@@ -157,6 +160,6 @@ Recommended manual configuration in GitHub:

 Branch protection should continue to require the `quality` and `integration` jobs from `.github/workflows/ci.yml` for `main`.

-## Current limitation
+## Current production demo

-The repository automation is ready for deployment, but an actual public demo URL still depends on the Railway project existing and the required GitHub Environment secrets being configured correctly.
+The public demo is live at `https://auth-api-production-a97b.up.railway.app`.

docs/observability.md

Lines changed: 11 additions & 0 deletions
@@ -12,6 +12,11 @@ This repository exposes a minimal Prometheus-compatible metrics surface focused

 The metrics endpoint is disabled by default and must be enabled explicitly with `METRICS_ENABLED=true`.

+For non-local environments, prefer one of these patterns:
+
+- keep `/metrics` on a private network only
+- set `METRICS_AUTH_TOKEN` and require a bearer token from the scraper
+
 ## Run the service with metrics enabled

 Start the application locally with metrics exposed on `/metrics`:
@@ -20,6 +25,12 @@ Start the application locally with metrics exposed on `/metrics`:
 METRICS_ENABLED=true npm run dev
 ```

+If you want to exercise the authenticated path locally, provide a token:
+
+```bash
+METRICS_ENABLED=true METRICS_AUTH_TOKEN=local-observability-token npm run dev
+```
+
 If you are using the full local stack, ensure PostgreSQL and Redis are running first:

 ```bash

src/config/env.ts

Lines changed: 26 additions & 2 deletions
@@ -1,5 +1,6 @@
 import { config as loadDotenv } from "dotenv";
 import { z } from "zod";
+import { durationToMs } from "../utils/duration";

 loadDotenv({ path: ".env", override: false, quiet: true });

@@ -33,6 +34,28 @@ const booleanFromEnv = z.preprocess((value) => {
   return value;
 }, z.boolean());

+const durationString = z
+  .string()
+  .trim()
+  .min(1)
+  .refine((value) => {
+    try {
+      durationToMs(value);
+      return true;
+    } catch {
+      return false;
+    }
+  }, "must use a supported duration format such as 15m, 7d, 60s, or 1000");
+
+const optionalMetricsAuthToken = z.preprocess((value) => {
+  if (typeof value !== "string") {
+    return value;
+  }
+
+  const trimmed = value.trim();
+  return trimmed.length === 0 ? undefined : trimmed;
+}, z.string().min(16, "METRICS_AUTH_TOKEN must have at least 16 characters").optional());
+
 const envSchema = z.object({
   NODE_ENV: z.enum(["development", "test", "production"]).default("development"),
   PORT: z.coerce.number().int().positive().default(3000),
@@ -45,8 +68,8 @@ const envSchema = z.object({
     .string()
     .trim()
     .min(32, "REFRESH_TOKEN_SECRET must have at least 32 characters"),
-  ACCESS_TOKEN_EXPIRES_IN: z.string().trim().min(1).default("15m"),
-  REFRESH_TOKEN_EXPIRES_IN: z.string().trim().min(1).default("7d"),
+  ACCESS_TOKEN_EXPIRES_IN: durationString.default("15m"),
+  REFRESH_TOKEN_EXPIRES_IN: durationString.default("7d"),
   JWT_ISSUER: z.string().trim().min(1).default("auth-api"),
   JWT_AUDIENCE: z.string().trim().min(1).default("auth-api-clients"),
   REDIS_URL: optionalString,
@@ -58,6 +81,7 @@ const envSchema = z.object({
   TRUST_PROXY: z.coerce.number().int().min(0).default(0),
   DOCS_ENABLED: booleanFromEnv.default(true),
   METRICS_ENABLED: booleanFromEnv.default(false),
+  METRICS_AUTH_TOKEN: optionalMetricsAuthToken,
   BCRYPT_ROUNDS: z.coerce.number().int().min(8).max(15).default(10),
 });
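
Review note: `optionalMetricsAuthToken` treats a blank `METRICS_AUTH_TOKEN=""` (as shipped in the example env files) as unset, and rejects non-blank values shorter than 16 characters. The sketch below restates that rule as a plain function without zod so the three outcomes are easy to see. The name `normalizeMetricsAuthToken` and the result type are hypothetical, used only for illustration.

```typescript
// Hypothetical, dependency-free restatement of the schema rule in this commit.
type TokenResult =
  | { ok: true; token: string | undefined }
  | { ok: false; error: string };

function normalizeMetricsAuthToken(raw: string | undefined): TokenResult {
  const trimmed = raw?.trim() ?? "";

  // Blank or missing means "no token configured": the route stays unauthenticated.
  if (trimmed.length === 0) {
    return { ok: true, token: undefined };
  }

  // Non-blank values must meet the minimum length used by the schema.
  if (trimmed.length < 16) {
    return { ok: false, error: "METRICS_AUTH_TOKEN must have at least 16 characters" };
  }

  return { ok: true, token: trimmed };
}
```

One consequence worth keeping in mind: with `METRICS_ENABLED=true` and a blank token, the controller change in this commit serves `/metrics` without any authorization check.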

src/controllers/metricsController.ts

Lines changed: 19 additions & 1 deletion
@@ -4,12 +4,30 @@ import {
   metricsEnabled,
   renderMetrics,
 } from "../metrics/authMetrics";
+import { env } from "../config/env";

-export async function metrics(_req: Request, res: Response) {
+const getBearerToken = (request: Request): string | null => {
+  const authorization = request.header("authorization");
+  const match = authorization?.match(/^Bearer\s+(.+)$/i);
+
+  return match?.[1]?.trim() || null;
+};
+
+export async function metrics(req: Request, res: Response) {
   if (!metricsEnabled) {
     return res.status(404).json({ message: "metrics disabled" });
   }

+  if (env.METRICS_AUTH_TOKEN && getBearerToken(req) !== env.METRICS_AUTH_TOKEN) {
+    return res.status(401).json({
+      error: {
+        code: "METRICS_AUTHORIZATION_REQUIRED",
+        message: "metrics authorization is required",
+        correlationId: req.correlationId,
+      },
+    });
+  }
+
   res.setHeader("Content-Type", metricsContentType);
   res.setHeader("Cache-Control", "no-store");
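
Review note: the controller compares the presented token with `!==`. That is functionally correct; a possible follow-up hardening is a constant-time comparison. The sketch below is a suggestion, not what the commit does. It reuses the commit's header regex and adds a hypothetical `tokensMatch` helper built on `timingSafeEqual` from `node:crypto`, hashing both sides first so the buffers always have equal length.

```typescript
import { createHash, timingSafeEqual } from "node:crypto";

// Same parsing rule as getBearerToken in the commit, but taking the raw header value.
function extractBearerToken(authorization: string | undefined): string | null {
  const match = authorization?.match(/^Bearer\s+(.+)$/i);
  return match?.[1]?.trim() || null;
}

// Hypothetical helper: compares tokens without leaking where they first differ.
function tokensMatch(presented: string | null, expected: string): boolean {
  if (presented === null) {
    return false;
  }

  // timingSafeEqual requires equal-length inputs, so compare SHA-256 digests.
  const presentedDigest = createHash("sha256").update(presented).digest();
  const expectedDigest = createHash("sha256").update(expected).digest();

  return timingSafeEqual(presentedDigest, expectedDigest);
}
```

In the controller, the guard would then read roughly `env.METRICS_AUTH_TOKEN && !tokensMatch(getBearerToken(req), env.METRICS_AUTH_TOKEN)`.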

src/middlewares/rateLimiter.ts

Lines changed: 56 additions & 3 deletions
@@ -9,19 +9,60 @@ type RateLimiterOptions = {
   maxRequests?: number;
   windowMs?: number;
   resolveKey: (req: Request) => string;
+  memoryMaxKeys?: number;
 };

 type MemoryBucket = {
   count: number;
   expiresAt: number;
+  updatedAt: number;
 };

 const memoryBuckets = new Map<string, MemoryBucket>();
+const DEFAULT_MEMORY_MAX_KEYS = 10_000;
+
+const pruneExpiredMemoryBuckets = (now: number): void => {
+  for (const [key, bucket] of memoryBuckets.entries()) {
+    if (now > bucket.expiresAt) {
+      memoryBuckets.delete(key);
+    }
+  }
+};
+
+const evictOldestMemoryBucket = (): void => {
+  let oldestKey: string | null = null;
+  let oldestUpdatedAt = Number.POSITIVE_INFINITY;
+
+  for (const [key, bucket] of memoryBuckets.entries()) {
+    if (bucket.updatedAt < oldestUpdatedAt) {
+      oldestUpdatedAt = bucket.updatedAt;
+      oldestKey = key;
+    }
+  }
+
+  if (oldestKey) {
+    memoryBuckets.delete(oldestKey);
+  }
+};
+
+const consumeMemory = (
+  key: string,
+  windowMs: number,
+  now: number,
+  memoryMaxKeys: number,
+): number => {
+  if (!memoryBuckets.has(key) && memoryBuckets.size >= memoryMaxKeys) {
+    pruneExpiredMemoryBuckets(now);
+
+    if (!memoryBuckets.has(key) && memoryBuckets.size >= memoryMaxKeys) {
+      evictOldestMemoryBucket();
+    }
+  }

-const consumeMemory = (key: string, windowMs: number, now: number): number => {
   const current = memoryBuckets.get(key) ?? {
     count: 0,
     expiresAt: now + windowMs,
+    updatedAt: now,
   };

   if (now > current.expiresAt) {
@@ -30,6 +71,7 @@ const consumeMemory = (key: string, windowMs: number, now: number): number => {
   }

   current.count += 1;
+  current.updatedAt = now;
   memoryBuckets.set(key, current);

   return current.count;
@@ -58,6 +100,7 @@ export function createRateLimiter({
   maxRequests = env.RATE_LIMIT_MAX_REQUESTS,
   windowMs = env.RATE_LIMIT_WINDOW_MS,
   resolveKey,
+  memoryMaxKeys = DEFAULT_MEMORY_MAX_KEYS,
 }: RateLimiterOptions) {
   return async function rateLimiter(
     req: Request,
@@ -71,7 +114,7 @@ export function createRateLimiter({
     try {
       const count = redisEnabled
         ? await consumeRedis(key, windowMs)
-        : consumeMemory(key, windowMs, now);
+        : consumeMemory(key, windowMs, now, memoryMaxKeys);

       if (count > maxRequests) {
         authMetrics.recordRateLimitHit(bucket, redisEnabled ? "redis" : "memory");
@@ -90,7 +133,7 @@ export function createRateLimiter({

       req.log.warn({ key, error }, "rate_limiter_fallback");

-      const count = consumeMemory(key, windowMs, now);
+      const count = consumeMemory(key, windowMs, now, memoryMaxKeys);
       if (count > maxRequests) {
         authMetrics.recordRateLimitHit(bucket, "memory");
         return next(
@@ -111,3 +154,13 @@ export const authMutationRateLimiter = createRateLimiter({
   bucket: "auth",
   resolveKey: (req) => req.ip || "global",
 });
+
+// Test hooks for the in-memory fail-soft store.
+export const __rateLimiterInternals = {
+  clearMemoryBuckets(): void {
+    memoryBuckets.clear();
+  },
+  getMemoryBucketCount(): number {
+    return memoryBuckets.size;
+  },
+};
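
Review note: `evictOldestMemoryBucket` scans every bucket to find the smallest `updatedAt`, which is linear in the map size each time the cap is hit. One alternative, sketched below under the assumption that eviction order only needs to follow "least recently updated", relies on `Map` iterating in insertion order: re-inserting a key on every update keeps the oldest key at the front. The class name `BoundedBucketStore` is hypothetical, and unlike the commit this sketch does not prune expired buckets before evicting.

```typescript
type Bucket = { count: number; expiresAt: number };

// Hypothetical alternative to the module-level Map plus updatedAt scan.
class BoundedBucketStore {
  private readonly buckets = new Map<string, Bucket>();
  private readonly maxKeys: number;

  constructor(maxKeys: number) {
    this.maxKeys = maxKeys;
  }

  consume(key: string, windowMs: number, now: number): number {
    let bucket = this.buckets.get(key);

    // Start a fresh window when the key is new or its window has elapsed.
    if (!bucket || now > bucket.expiresAt) {
      bucket = { count: 0, expiresAt: now + windowMs };
    }
    bucket.count += 1;

    // Delete then set moves the key to the end of the Map's insertion order.
    this.buckets.delete(key);
    this.buckets.set(key, bucket);

    // The first key in iteration order is now the least recently updated one.
    while (this.buckets.size > this.maxKeys) {
      const oldestKey = this.buckets.keys().next().value;
      if (oldestKey === undefined) {
        break;
      }
      this.buckets.delete(oldestKey);
    }

    return bucket.count;
  }

  has(key: string): boolean {
    return this.buckets.has(key);
  }

  get size(): number {
    return this.buckets.size;
  }
}
```

The trade-off is that a still-live bucket can be evicted while an expired one remains, which the commit's prune-then-evict order avoids. For a fail-soft fallback store either behavior may be acceptable.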

tests/config/env.test.ts

Lines changed: 14 additions & 0 deletions
@@ -27,6 +27,7 @@ describe("env config", () => {
     expect(env.LOG_LEVEL).toBe("silent");
     expect(env.DOCS_ENABLED).toBe(true);
     expect(env.METRICS_ENABLED).toBe(false);
+    expect(env.METRICS_AUTH_TOKEN).toBeUndefined();
   });

   it("fails fast when secrets are invalid", async () => {
@@ -39,4 +40,17 @@ describe("env config", () => {
       /ACCESS_TOKEN_SECRET|REFRESH_TOKEN_SECRET/,
     );
   });
+
+  it("fails fast when token durations use an unsupported format", async () => {
+    process.env.NODE_ENV = "test";
+    process.env.DATABASE_URL = "postgresql://auth_user:auth_password@localhost:5432/auth_api";
+    process.env.ACCESS_TOKEN_SECRET = "test-access-secret-with-at-least-thirty-two-characters";
+    process.env.REFRESH_TOKEN_SECRET =
+      "test-refresh-secret-with-at-least-thirty-two-characters";
+    process.env.ACCESS_TOKEN_EXPIRES_IN = "15 minutes";
+
+    await expect(import("../../src/config/env")).rejects.toThrow(
+      /ACCESS_TOKEN_EXPIRES_IN/,
+    );
+  });
 });
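
Review note: the new test expects `"15 minutes"` to be rejected, while the schema's error message lists `15m`, `7d`, `60s`, and `1000` as supported. The real `durationToMs` in `src/utils/duration` is not part of this diff, so the following is only a sketch of a parser with that accept/reject behavior. The function name `durationToMsSketch`, the `ms` and `h` units, and the choice to read a bare number as milliseconds are all assumptions.

```typescript
// Assumed unit table; only s, m, and d are confirmed by the schema's error message.
const UNIT_TO_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
};

// Hypothetical stand-in for durationToMs: throws on anything it cannot parse.
function durationToMsSketch(input: string): number {
  const match = /^(\d+)(ms|s|m|h|d)?$/.exec(input.trim());

  if (!match) {
    throw new Error(`unsupported duration format: ${input}`);
  }

  const amount = Number(match[1]);
  // Assumption: a bare number such as "1000" is interpreted as milliseconds.
  const unit = match[2] ?? "ms";

  return amount * UNIT_TO_MS[unit];
}
```

Because the schema's `refine` callback only checks whether the call throws, any parser with this throw-on-invalid contract would make the new test pass.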
