
Commit f749e3f

georgeglarson and claude committed
Add Venice AI resilience, debug CLI, and HP clamping fixes
Venice client rewrite: circuit breaker (5-failure threshold, 30s recovery), retry with backoff for transient errors, error classification (timeout/auth/rate_limit/server_error/network), latency histogram, and full metrics snapshot for diagnostics.

Debug CLI (tools/debug-cli.js): non-interactive probe for AI-assisted diagnostics covering players, mobs, aggro, stats, health checks (10 anomaly detectors), and Venice API metrics with live connectivity tests.

Bug fixes: HP now clamps at 0 on overkill damage across Mob, Player, and AIPlayer (it was going negative). AIPlayer level property derived from equipment.

Debug server: AI/human player distinction, Venice metrics, async commands.

Tests: 86 new tests (47 venice-client, 39 aiplayer). Suite: 45 files, 2,342 tests, 0 failures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
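The HP fix clamps health at 0 when a hit deals more damage than the target has left. The Mob, Player, and AIPlayer sources are not part of this page, so this is a minimal sketch under assumptions: the `Damageable` interface, its `hp` field, and the `applyDamage` helper are hypothetical names used only to show the clamp, not the project's code.

```typescript
// Hypothetical shape: the commit names Mob, Player, and AIPlayer but their
// HP fields are not shown here, so `hp` is an assumed property name.
interface Damageable {
  hp: number;
}

// Subtract damage, clamping at 0 so overkill can no longer push HP negative.
// Returns the damage actually absorbed, which is useful for combat logs.
function applyDamage(target: Damageable, amount: number): number {
  const before = target.hp;
  target.hp = Math.max(0, target.hp - Math.max(0, amount)); // negative damage is ignored
  return before - target.hp;
}

// Overkill: 25 damage against 10 HP leaves 0, not -15.
const mob: Damageable = { hp: 10 };
const absorbed = applyDamage(mob, 25);
console.log(mob.hp, absorbed); // 0 10
```

Putting the clamp in one shared helper, rather than repeating it in each entity class, is one way to keep the three classes from drifting apart again.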
1 parent 165f8ae commit f749e3f

12 files changed

Lines changed: 2124 additions & 64 deletions


README.md

Lines changed: 20 additions & 6 deletions
@@ -12,7 +12,7 @@ Most of my career has been taking old systems and making them maintainable. Frac
 
 The starting point was [BrowserQuest](https://github.com/mozilla/BrowserQuest), Mozilla's 2012 HTML5 demo. A JavaScript prototype with no types, no tests, God-object classes, and everything coupled to everything. I picked it because it's a good stand-in for what legacy modernization actually looks like: code that works but can't scale, can't be safely changed, and has no safety net.
 
-What you're looking at now is **250 TypeScript files**, **2,255 passing tests**, a real-time multiplayer game with AI-driven NPCs, zone-based combat, persistent player progression, and a production deployment behind nginx with SSL. The original codebase is still in there (every entity, every sprite, every tile) but the architecture around it is unrecognizable.
+What you're looking at now is **250 TypeScript files**, **2,342 passing tests**, a real-time multiplayer game with AI-driven NPCs, zone-based combat, persistent player progression, and a production deployment behind nginx with SSL. The original codebase is still in there (every entity, every sprite, every tile) but the architecture around it is unrecognizable.
 
 ## The legacy modernization story
 
@@ -91,7 +91,11 @@ Production-grade monitoring stack using the same tools and patterns as commercia
 
 **Log-trace correlation.** `pino-opentelemetry-transport` injects `trace_id` and `span_id` into every log line and ships logs via OTLP to the same collector that receives traces. Click a trace in SigNoz, see every log that happened during that request.
 
-**Self-hosted dashboards.** SigNoz (ClickHouse-backed) with dashboards for server operations and AI/persistence monitoring. Four alert rules: aggro tick P99 latency, Venice AI response time, error rate spike, and save frequency drop. All running on the same VPS with ClickHouse capped at 2GB.
+**Self-hosted dashboards.** SigNoz (ClickHouse-backed) with dashboards for server operations and AI/persistence monitoring. Public Grafana dashboards for portfolio demos. All running on the same VPS with ClickHouse capped at 2GB.
+
+**Venice AI resilience.** Circuit breaker (opens after 5 failures, 30s recovery), retry with backoff for transient errors, error classification (timeout, auth, rate_limit, server_error, network), latency histogram, and per-call metrics. Survives API outages without impacting gameplay.
+
+**Debug CLI.** A non-interactive diagnostic probe (`tools/debug-cli.js`) that connects to the game server's debug WebSocket and reports: player/mob state, aggro links, server stats, structured logs, automated health checks (10 anomaly detectors), and Venice AI metrics with live connectivity tests. Designed to be invoked by AI development tools during troubleshooting sessions.
 
 ```
 Game Server ──OTLP HTTP──→ OTel Collector ──→ ClickHouse
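The Venice AI resilience paragraph in this hunk combines error classification, retry with backoff, and a circuit breaker. Here is a sketch of one way those pieces can fit together. It is not the project's venice-client code, which this page does not show. Only the five error categories, the 5-failure threshold, and the 30 s recovery window come from the commit. The status-to-category mapping, the 250 ms base delay and 8 s cap, the retry count, and every name (`classifyError`, `isTransient`, `backoffDelay`, `CircuitBreaker`, `callWithResilience`) are assumptions.

```typescript
// Error categories taken from the commit message; the mapping below is assumed.
type VeniceErrorKind = "timeout" | "auth" | "rate_limit" | "server_error" | "network";
type BreakerState = "closed" | "open" | "half_open";

// Hypothetical classifier. `status` is an HTTP status if a response arrived;
// `code` is a low-level error code (e.g. Node's "ETIMEDOUT") if it did not.
function classifyError(status?: number, code?: string): VeniceErrorKind {
  if (status === undefined) return code === "ETIMEDOUT" ? "timeout" : "network";
  if (status === 408) return "timeout";
  if (status === 401 || status === 403) return "auth";
  if (status === 429) return "rate_limit";
  // Sketch simplification: every other failing status counts as server_error.
  return "server_error";
}

// Auth failures will not fix themselves on retry; the other kinds might.
function isTransient(kind: VeniceErrorKind): boolean {
  return kind !== "auth";
}

// Exponential backoff: 250 ms, 500 ms, 1 s, ... capped at 8 s (assumed values, no jitter).
function backoffDelay(attempt: number, baseMs = 250, maxMs = 8_000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Opens after `threshold` consecutive failures; once `recoveryMs` has passed it
// lets calls through again (half-open). This sketch does not cap trial calls.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly threshold = 5,
    private readonly recoveryMs = 30_000,
    private readonly now: () => number = Date.now,
  ) {}

  getState(): BreakerState {
    if (this.openedAt === null) return "closed";
    return this.now() - this.openedAt >= this.recoveryMs ? "half_open" : "open";
  }

  canRequest(): boolean {
    return this.getState() !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(): void {
    this.failures += 1;
    // A failed trial call while half-open re-opens at once; otherwise open at the threshold.
    if (this.getState() === "half_open" || this.failures >= this.threshold) {
      this.openedAt = this.now();
    }
  }
}

// Run one request through the breaker, retrying transient failures with backoff.
async function callWithResilience<T>(
  breaker: CircuitBreaker,
  request: () => Promise<T>,
  maxRetries = 2,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    if (!breaker.canRequest()) throw new Error("Venice circuit open; skipping call");
    try {
      const result = await request();
      breaker.recordSuccess();
      return result;
    } catch (err) {
      breaker.recordFailure();
      const { status, code } = (err ?? {}) as { status?: number; code?: string };
      if (!isTransient(classifyError(status, code)) || attempt >= maxRetries) throw err;
      await new Promise((resolve) => setTimeout(resolve, backoffDelay(attempt)));
    }
  }
}
```

A caller would wrap each Venice request, for example `callWithResilience(breaker, () => requestNpcDialogue(npc))`, where `requestNpcDialogue` is hypothetical. While the breaker is open the call fails fast instead of waiting on an unreachable API, which is what keeps an outage from stalling gameplay. What the game shows in place of the AI response is not described in this commit.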
@@ -103,7 +107,7 @@ Game Server ──OTLP HTTP──→ OTel Collector ──→ ClickHouse
 ## Test suite
 
 ```
-43 test files | 2,255 tests | 0 failures
+45 test files | 2,342 tests | 0 failures
 ```
 
 | Module | Coverage |
@@ -152,6 +156,12 @@ pnpm run dev
 # Tests
 pnpm test
 pnpm test:coverage
+
+# Debug CLI (requires running server)
+pnpm run debug health # Anomaly detection
+pnpm run debug players # Connected players
+pnpm run debug venice health # Venice AI connectivity test
+pnpm run debug watch 10 # Stream state for 10s
 ```
 
 Client connects to `localhost:8000` by default. For production, configure `client/config/config.prod.json`.
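`pnpm run debug health` is described as running 10 anomaly detectors against live server state. This commit does not list the detectors or show what the debug WebSocket returns. The sketch therefore assumes a hypothetical `EntitySnapshot` array and implements three illustrative detectors as pure functions over it. A negative-HP check is included because negative HP is exactly the bug this commit fixes. All names and fields here are invented for illustration.

```typescript
// Hypothetical snapshot shape; the real debug WebSocket payload is not shown.
interface EntitySnapshot {
  id: number;
  hp: number;
  maxHp: number;
  x: number;
  y: number;
}

interface Anomaly {
  detector: string;
  entityId: number;
  detail: string;
}

// A detector returns a description of the problem, or null if the entity looks sane.
type Detector = (e: EntitySnapshot) => string | null;

// Three illustrative detectors; the CLI's actual ten are not documented here.
const detectors: Record<string, Detector> = {
  negative_hp: (e) => (e.hp < 0 ? `hp=${e.hp}` : null),
  hp_above_max: (e) => (e.hp > e.maxHp ? `hp=${e.hp} > maxHp=${e.maxHp}` : null),
  non_finite_position: (e) =>
    Number.isFinite(e.x) && Number.isFinite(e.y) ? null : `pos=(${e.x}, ${e.y})`,
};

// Run every detector against every entity and collect the hits.
function runHealthChecks(entities: EntitySnapshot[]): Anomaly[] {
  const found: Anomaly[] = [];
  for (const e of entities) {
    for (const [name, check] of Object.entries(detectors)) {
      const detail = check(e);
      if (detail !== null) found.push({ detector: name, entityId: e.id, detail });
    }
  }
  return found;
}

const report = runHealthChecks([
  { id: 1, hp: 40, maxHp: 40, x: 10, y: 12 },
  { id: 7, hp: -15, maxHp: 30, x: 55, y: 8 },
]);
console.log(report);
```

Keeping each detector a pure function of a snapshot makes it straightforward to unit-test the checks without a running server and to add new ones without touching the runner.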
@@ -189,16 +199,20 @@ Fracture/
 │ ├── storage/ # SQLite persistence (instrumented with spans)
 │ ├── utils/logger.ts # Pino structured logging + OTel transport
 │ ├── world/ # Spatial manager, spawn manager, game loop
-│ └── __tests__/ # Test suite (43 files)
+│ └── __tests__/ # Test suite (45+ files)
 ├── shared/ts/ # Shared types (27 files)
 │ ├── zones/ # Zone boundaries and bonuses
 │ ├── skills/ # Skill definitions
 │ ├── items/ # Item types, legendaries, rarity
 │ └── events/ # Typed event bus
+├── tools/ # Development utilities
+│ ├── debug-cli.js # Non-interactive debug probe (AI-assisted diagnostics)
+│ └── tui.js # Nethack-style live terminal dashboard
 ├── deploy/ # Deployment configs
 │ ├── signoz/ # OTel Collector config
+│ ├── grafana/ # Public dashboard provisioning
 │ └── common/ # ClickHouse configs
-├── docker-compose.signoz.yml # SigNoz observability stack
+├── docker-compose.signoz.yml # SigNoz + Grafana observability stack
 └── specs/ # Feature specifications
 ```
 
@@ -208,7 +222,7 @@ Fracture/
 - **Systems design.** Combat, inventory, progression, zones, AI, persistence, real-time networking, all integrated and tested.
 - **Observability engineering.** Structured logging, distributed tracing, and self-hosted monitoring wired end-to-end. The same OTel + Pino + SigNoz stack used in production microservices, applied to a game server.
 - **AI-augmented development.** Built with Claude as a development partner, showing what one engineer can ship with AI tooling.
-- **Testing discipline.** 2,255 tests, coverage thresholds enforced, tests written before refactors.
+- **Testing discipline.** 2,300+ tests, coverage thresholds enforced, tests written before refactors.
 - **Production operations.** SSL, reverse proxy, rate limiting, anti-exploit guards, Docker Compose infrastructure, deployed and running.
 
 ## Credits

package.json

Lines changed: 2 additions & 1 deletion
@@ -16,7 +16,8 @@
 "test:watch": "vitest",
 "test:coverage": "vitest run --coverage",
 "test:ui": "vitest --ui",
-"tui": "node tools/tui.js"
+"tui": "node tools/tui.js",
+"debug": "node tools/debug-cli.js"
 },
 "dependencies": {
 "@opentelemetry/api": "^1.9.0",
