README.md: 52 additions & 4 deletions
@@ -26,7 +26,9 @@ This is the same approach that works on enterprise software. It works on games t
**4. Make the architecture earn its keep.** Every abstraction exists because it solved a real problem. The SpatialManager exists because the aggro tick was O(n\*m) and needed spatial partitioning. The AggroPolicy exists because safe zones, density caps, and level scaling were scattered across three files. The EventBus exists because mob death needed to notify five decoupled systems.

**5. Add observability.** You can't maintain what you can't see. Structured logging (Pino), distributed tracing (OpenTelemetry), and a self-hosted monitoring dashboard (SigNoz) make every request, every save, and every AI call traceable end-to-end. It's the same stack you'd wire up for a production microservice, applied to a game server.

**6. Ship it.** The game is live, behind nginx with SSL. It handles concurrent players. It's been through a full security audit. It's deployed and maintained.
## What I built on top of the legacy code
@@ -54,6 +56,10 @@ Decomposed an 1,100-line God-object Player class into focused modules using SRP.
│ │ Venice AI SDK (llama-3.3-70b) + Fish Audio TTS │ │
@@ -75,6 +81,25 @@ Seven progression zones from village (safe) to boss arena. An AggroPolicy engine
Socket.IO WebSocket transport with 105 message types. Spatial partitioning for zone-based broadcasting so mobs only scan players in adjacent groups, not all players globally. Party system with proximity-based XP sharing. Per-message-type rate limiting. Spawn protection. Anti-exploit validation. SQLite persistence for all player state.
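As a rough illustration of the spatial partitioning described above, here is a minimal grid sketch. The class name `SpatialGrid`, the group size of 10 tiles, and the 3x3 neighbourhood are assumptions made for this example; the project's actual SpatialManager may be structured differently.

```typescript
// Hypothetical sketch: SpatialGrid and its method names are illustrative,
// not the project's real SpatialManager API.
type EntityId = number;

class SpatialGrid {
  private readonly groupSize: number;
  private readonly groups = new Map<string, Set<EntityId>>();
  private readonly groupOfEntity = new Map<EntityId, string>();

  constructor(groupSize: number) {
    this.groupSize = groupSize;
  }

  private key(x: number, y: number): string {
    const gx = Math.floor(x / this.groupSize);
    const gy = Math.floor(y / this.groupSize);
    return gx + "," + gy;
  }

  /** Insert or move an entity; only its old and new group are touched. */
  update(id: EntityId, x: number, y: number): void {
    const next = this.key(x, y);
    const prev = this.groupOfEntity.get(id);
    if (prev === next) return;
    if (prev !== undefined) {
      const oldGroup = this.groups.get(prev);
      if (oldGroup) oldGroup.delete(id);
    }
    let group = this.groups.get(next);
    if (!group) {
      group = new Set<EntityId>();
      this.groups.set(next, group);
    }
    group.add(id);
    this.groupOfEntity.set(id, next);
  }

  /** Entities in the 3x3 block of groups centred on the group containing (x, y). */
  nearby(x: number, y: number): EntityId[] {
    const found: EntityId[] = [];
    for (let dx = -1; dx <= 1; dx++) {
      for (let dy = -1; dy <= 1; dy++) {
        const k = this.key(x + dx * this.groupSize, y + dy * this.groupSize);
        const group = this.groups.get(k);
        if (group) group.forEach((id) => found.push(id));
      }
    }
    return found;
  }
}

const players = new SpatialGrid(10);
players.update(1, 5, 5);   // same group as the mob below
players.update(2, 14, 3);  // adjacent group
players.update(3, 95, 95); // far away, never scanned
const candidates = players.nearby(6, 6).sort((a, b) => a - b);
console.log(candidates);
```

A mob at (6, 6) only inspects the nine groups around it, so the cost of an aggro scan depends on local density rather than on the total player count.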
### Observability
Production-grade monitoring stack using the same tools and patterns as commercial microservices:
**Structured logging.** Every `console.*` call (316 across 48 files) replaced with Pino structured logging. 50 modules emit JSON logs with typed context: player IDs, item kinds, damage values, zone names. Player-scoped child loggers automatically attach identity to every log line. Hot-path logging (aggro ticks, movement) uses `trace` level that Pino skips entirely unless explicitly enabled.
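To make the child-logger and level-gating ideas concrete, here is a dependency-free sketch of the pattern. It is not Pino's implementation: `SketchLogger` is a hypothetical stand-in, and the field names (`playerId`, `zone`, `itemKind`, `damage`) are illustrative. In Pino itself the corresponding calls are `logger.child({ ... })` and the `level` option.

```typescript
// Hypothetical stand-in for a structured logger; only the pattern matters here.
// Numeric values follow Pino's default level numbering.
const LEVELS = { trace: 10, debug: 20, info: 30, warn: 40, error: 50 };
type Level = keyof typeof LEVELS;
type Fields = Record<string, unknown>;

class SketchLogger {
  private readonly minLevel: Level;
  private readonly sink: (line: string) => void;
  private readonly bindings: Fields;

  constructor(minLevel: Level, sink: (line: string) => void, bindings: Fields = {}) {
    this.minLevel = minLevel;
    this.sink = sink;
    this.bindings = bindings;
  }

  /** A child inherits the parent's bindings and adds its own. */
  child(extra: Fields): SketchLogger {
    return new SketchLogger(this.minLevel, this.sink, { ...this.bindings, ...extra });
  }

  log(level: Level, fields: Fields, msg: string): void {
    // Below-threshold calls return before any serialization happens.
    if (LEVELS[level] < LEVELS[this.minLevel]) return;
    this.sink(JSON.stringify({ level, ...this.bindings, ...fields, msg }));
  }
}

const lines: string[] = [];
const rootLogger = new SketchLogger("info", (line) => { lines.push(line); });
const playerLog = rootLogger.child({ playerId: 42, zone: "village" });

playerLog.log("info", { itemKind: "sword", damage: 7 }, "attack"); // emitted
playerLog.log("trace", { mobCount: 12 }, "aggro tick");            // skipped at "info"
console.log(lines);
```

The early return in `log` is why hot-path `trace` calls stay cheap while the threshold is `info`: no object is merged and nothing is serialized.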
**Distributed tracing.** OpenTelemetry SDK with manual span instrumentation on the paths that matter: message routing (`player.message.{type}`), persistence operations (`storage.saveCharacter`, `storage.loadPlayerState`), aggro ticks (`game.aggro_tick` with mob count attributes), and external AI calls (`ai.venice`, `ai.tts` with latency tracking). HTTP auto-instrumentation covers Socket.IO transport. 10% sampling in production to control volume.
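One way to picture the 10% sampling is as a deterministic decision derived from the trace ID, so every span in a trace gets the same answer. The function below is a simplified sketch of that idea; `shouldSample` is a hypothetical helper, and the OpenTelemetry SDK ships its own ratio-based sampler whose details may differ.

```typescript
// Simplified sketch of trace-ID ratio sampling (not the SDK's sampler).
function shouldSample(traceId: string, ratio: number): boolean {
  if (ratio >= 1) return true;
  if (ratio <= 0) return false;
  // Treat the first 8 hex characters (32 bits) of the trace ID as a uniform number.
  const bucket = parseInt(traceId.slice(0, 8), 16);
  return bucket < Math.floor(ratio * 0x100000000);
}

// Build 1,000 synthetic trace IDs whose leading 32 bits are evenly spaced.
function syntheticTraceId(i: number): string {
  const head = Math.floor((i / 1000) * 0x100000000).toString(16);
  return ("00000000" + head).slice(-8) + "000000000000000000000000";
}

let sampledCount = 0;
for (let i = 0; i < 1000; i++) {
  if (shouldSample(syntheticTraceId(i), 0.1)) sampledCount++;
}
console.log(sampledCount + " of 1000 synthetic traces kept");
```

Because the decision depends only on the trace ID, a trace is either kept whole or dropped whole, which avoids partial traces in the dashboard.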
**Log-trace correlation.** `pino-opentelemetry-transport` injects `trace_id` and `span_id` into every log line and ships logs via OTLP to the same collector that receives traces. Click a trace in SigNoz, see every log that happened during that request.
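The correlation itself boils down to merging the active span's identifiers into each log record. The sketch below shows that merge with a hand-rolled, synchronous "active context" variable; `withSpan` and `logLine` are hypothetical helpers, and real OpenTelemetry context propagation also follows asynchronous calls, which this example does not attempt.

```typescript
// Hypothetical, synchronous stand-in for the tracer's active context.
interface TraceContext {
  traceId: string;
  spanId: string;
}

let activeContext: TraceContext | undefined;

/** Run fn with ctx as the active span context, restoring the previous one afterwards. */
function withSpan<T>(ctx: TraceContext, fn: () => T): T {
  const previous = activeContext;
  activeContext = ctx;
  try {
    return fn();
  } finally {
    activeContext = previous;
  }
}

/** Serialize a log record, attaching trace_id and span_id when a span is active. */
function logLine(msg: string, fields: Record<string, unknown> = {}): string {
  const ctx = activeContext;
  const correlation = ctx ? { trace_id: ctx.traceId, span_id: ctx.spanId } : {};
  return JSON.stringify({ ...fields, ...correlation, msg });
}

const insideSpan = withSpan({ traceId: "abc123", spanId: "def456" }, () =>
  logLine("character saved", { playerId: 42 })
);
const outsideSpan = logLine("server idle");
console.log(insideSpan);
console.log(outsideSpan);
```

With both identifiers on the log line, a backend can join logs to the trace they belong to, which is what makes the click-through from a trace to its logs possible.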
**Self-hosted dashboards.** SigNoz (ClickHouse-backed) with dashboards for server operations and AI/persistence monitoring. Four alert rules: aggro tick P99 latency, Venice AI response time, error rate spike, and save frequency drop. All running on the same VPS with ClickHouse capped at 2GB.
```
Game Server ──OTLP HTTP──→ OTel Collector ──→ ClickHouse
```
- **Legacy modernization.** Taking a real codebase from 2012 and systematically improving it without rewriting from scratch.
- **Systems design.** Combat, inventory, progression, zones, AI, persistence, real-time networking, all integrated and tested.
- **Observability engineering.** Structured logging, distributed tracing, and self-hosted monitoring wired end-to-end. The same OTel + Pino + SigNoz stack used in production microservices, applied to a game server.
- **AI-augmented development.** Built with Claude as a development partner, showing what one engineer can ship with AI tooling.
- **Testing discipline.** 2,255 tests, coverage thresholds enforced, tests written before refactors.