docs: add enterprise phase 1 roadmap

MackDing · MackDing · commit b423d903c814 · 2026-03-14T03:07:23.000+08:00
diff --git a/README.md b/README.md
@@ -119,6 +119,7 @@ Core modules:
 - `src/cron/scheduler.js`: proactive scheduled push
 
 Enterprise target architecture: [docs/enterprise-architecture.md](/Users/ding/Documents/Code/Github/codex-telegram-claws/docs/enterprise-architecture.md)
+Enterprise Phase 1 roadmap: [docs/phase-1-roadmap.md](/Users/ding/Documents/Code/Github/codex-telegram-claws/docs/phase-1-roadmap.md)
 
 ## Routing and MCP Boundary
 
diff --git a/docs/enterprise-architecture.md b/docs/enterprise-architecture.md
@@ -82,6 +82,8 @@ Subagents should be triggered only after:
 
 ## Recommended Deployment Phases
 
+Implementation roadmap: [phase-1-roadmap.md](/Users/ding/Documents/Code/Github/codex-telegram-claws/docs/phase-1-roadmap.md)
+
 ### Phase 1: Harden Current Single-Host Beta
 
 - Migrate core runtime modules to TypeScript.
diff --git a/docs/phase-1-roadmap.md b/docs/phase-1-roadmap.md
@@ -0,0 +1,152 @@
+# Phase 1 Roadmap
+
+## Scope
+
+Phase 1 hardens the current single-host beta so it can be distributed to subsidiary CTO teams as a controlled enterprise beta. It does not yet introduce the full control-plane and worker split; instead, it makes the existing runtime governable, testable, and easier to operate.
+
+## Success Criteria
+
+- Core runtime contracts are type-safe and documented.
+- Dangerous actions require explicit approval and produce audit events.
+- Health output, logs, and smoke-test results are machine-readable and usable in day-to-day operations.
+- A new subsidiary team can install and validate the bot with a repeatable checklist.
+
+## Workstreams
+
+### 1. TypeScript Migration For Core Runtime
+
+Scope:
+
+- Migrate `src/config.js`
+- Migrate `src/orchestrator/router.js`
+- Migrate `src/orchestrator/skillRegistry.js`
+- Migrate `src/orchestrator/mcpClient.js`
+- Migrate `src/runner/ptyManager.js`
+- Migrate `src/runner/shellManager.js`
+
+Deliverables:
+
+- `tsconfig.json`
+- build and typecheck commands
+- stable interfaces for config, skill contracts, runner sessions, and runtime state
+
+Acceptance:
+
+- `npm run typecheck` passes
+- existing tests still pass
+- no runtime behavior regression in `/status`, `/repo`, `/mcp`, `/gh`, `/sh`
+
+### 2. Audit Event Model
+
+Scope:
+
+- Define a structured event schema for operator actions and bot decisions.
+- Capture user identity, chat id, project, command, worker host, result, and timestamp.
+
+Deliverables:
+
+- audit event schema document
+- append-only local event sink for beta
+- hooks in Telegram handlers, skill execution, shell execution, and restart flow
+
+Acceptance:
+
+- every privileged action emits an event
+- audit records can be exported as JSON lines
+
+### 3. Approval Gates For Dangerous Actions
+
+Scope:
+
+- Add an approval state machine for write-capable shell actions
+- Add approval for `git push`, repo creation, and other GitHub write actions
+
+Deliverables:
+
+- approval command flow
+- pending approval state storage
+- localized operator-facing prompts
+
+Acceptance:
+
+- dangerous actions cannot execute without explicit approval
+- approval and denial both create audit events
+
+### 4. Structured Logging And Health Output
+
+Scope:
+
+- Replace ad hoc console output with structured logs
+- Add machine-readable health output for automation
+
+Deliverables:
+
+- JSON log mode
+- healthcheck `--json` output
+- operator-visible startup summary
+
+Acceptance:
+
+- logs can be ingested by PM2 or external log collectors
+- healthcheck can be parsed by CI and supervisor tooling
+
+### 5. Telegram Regression Coverage
+
+Scope:
+
+- Extend beyond `getMe` smoke checks
+- Validate critical Telegram command paths against a real bot
+
+Deliverables:
+
+- scripted regression checks for `/status`, `/repo`, `/language`, `/verbose`, `/mcp list`
+- operator runbook for live regression
+
+Acceptance:
+
+- regression script can run in a controlled staging bot environment
+- failures are visible in CI or release gating
+
+### 6. Subsidiary Deployment Pack
+
+Scope:
+
+- Make first-time installation repeatable for subsidiary CTO teams
+
+Deliverables:
+
+- environment checklist
+- service account guidance
+- directory isolation requirements
+- token and secret handling guide
+- PM2 deployment example per host
+
+Acceptance:
+
+- a new team can deploy from docs without direct maintainer intervention
+
+## Recommended Execution Order
+
+1. TypeScript migration for `config`, `router`, and `skillRegistry`
+2. Audit event schema and append-only sink
+3. Approval flow for dangerous actions
+4. Structured logging and machine-readable health output
+5. Telegram regression automation
+6. Subsidiary deployment pack finalization
+
+## Risks
+
+- A TypeScript migration that changes runtime behavior, even unintentionally, will cause operational regressions.
+- An approval flow that lands too late leaves write actions under-governed in the meantime.
+- Regression checks that depend on a personal bot token are not suitable for shared enterprise CI.
+- Audit logs without a stable schema will become unusable once the control-plane split starts.
+
+## Out Of Scope
+
+- Multi-worker control plane
+- SSO/OIDC and enterprise RBAC
+- Centralized database-backed audit store
+- Full tenant isolation
+- Webhook-based high-availability Telegram ingress
+
+These begin in Phase 2.
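
Workstreams 2 and 3 in the roadmap above are closely coupled: approval and denial must both create audit events, and audit records must be exportable as JSON lines. The following is a minimal TypeScript sketch of how the two could fit together. It is not code from this repository. Every name in it (`ActionRequest`, `AuditEvent`, `AuditLog`, `ApprovalGate`, and the field names) is hypothetical, derived only from the field list in the roadmap, and the in-memory sink stands in for the append-only local sink the roadmap calls for.

```typescript
// Hypothetical types: field names are illustrative, based on the roadmap's list
// (user identity, chat id, project, command, worker host, result, timestamp).
type Decision = "approved" | "denied";
type ApprovalState = "pending" | Decision;

interface ActionRequest {
  userId: string;
  chatId: number;
  project: string;
  command: string;
  workerHost: string;
}

interface AuditEvent extends ActionRequest {
  result: "approval_requested" | Decision;
  timestamp: string; // ISO 8601
}

// In-memory, append-only sink. Assumption: a real beta sink would append to a local file.
class AuditLog {
  private readonly events: AuditEvent[] = [];

  append(event: AuditEvent): void {
    this.events.push({ ...event }); // copy so later mutation cannot alter the record
  }

  // Export as JSON Lines: one JSON object per line.
  toJsonLines(): string {
    return this.events.map((e) => JSON.stringify(e)).join("\n");
  }
}

// Approval state machine: pending -> approved | denied, with no other transitions.
class ApprovalGate {
  private readonly entries = new Map<number, { state: ApprovalState; request: ActionRequest }>();
  private nextId = 1;
  private readonly log: AuditLog;
  private readonly now: () => Date;

  constructor(log: AuditLog, now: () => Date = () => new Date()) {
    this.log = log;
    this.now = now; // injectable clock keeps tests deterministic
  }

  // Register a dangerous action and return its approval id.
  request(request: ActionRequest): number {
    const id = this.nextId++;
    this.entries.set(id, { state: "pending", request });
    this.record(request, "approval_requested");
    return id;
  }

  // Approve or deny a pending request; both outcomes emit an audit event.
  resolve(id: number, decision: Decision): void {
    const entry = this.entries.get(id);
    if (!entry || entry.state !== "pending") {
      throw new Error(`approval ${id} is not pending`);
    }
    entry.state = decision;
    this.record(entry.request, decision);
  }

  // Callers check this before running the action; only approved requests pass.
  canExecute(id: number): boolean {
    return this.entries.get(id)?.state === "approved";
  }

  private record(request: ActionRequest, result: AuditEvent["result"]): void {
    this.log.append({ ...request, result, timestamp: this.now().toISOString() });
  }
}
```

In this sketch a handler would call `request(...)` when it sees a write-capable command such as `git push`, prompt the operator, call `resolve(id, "approved")` or `resolve(id, "denied")`, and run the command only if `canExecute(id)` returns true. Keeping the event shape in one interface is one way to address the risk noted above that audit logs without a stable schema become unusable later.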