Commit 9d2db92

docs: add enterprise architecture guide
1 parent fc82b78 commit 9d2db92

2 files changed

Lines changed: 153 additions & 0 deletions

README.md

Lines changed: 2 additions & 0 deletions
@@ -118,6 +118,8 @@ Core modules:
 - `src/runner/ptyManager.js`: Codex PTY process + streaming
 - `src/cron/scheduler.js`: proactive scheduled push
 
+Enterprise target architecture: [docs/enterprise-architecture.md](docs/enterprise-architecture.md)
+
 ## Routing and MCP Boundary
 
 To avoid duplicated context fetch:

docs/enterprise-architecture.md

Lines changed: 151 additions & 0 deletions
@@ -0,0 +1,151 @@
# Enterprise Architecture

## Purpose

This document defines the target architecture for deploying `codex-telegram-claws` as an engineering assistant in a financial enterprise, serving multiple subsidiary CTO teams. The current repository is a strong single-host beta. The enterprise target is a controlled multi-host platform.

## Target Operating Model

- One central Telegram control plane, owned by the platform team.
- One worker agent per company, business unit, or regulated environment.
- Each worker runs close to its own repositories, Codex CLI, MCP servers, and secrets.
- The control plane never executes local shell or git actions directly against remote business units.

## Logical Architecture

```text
Telegram User
  -> Control Plane API / Bot Gateway
      -> Identity + RBAC + Policy Engine
      -> Audit Log + Event Bus
      -> Worker Registry
          -> Subsidiary Worker A
              -> Codex CLI
              -> MCP Servers
              -> Git / CI / Safe Shell
          -> Subsidiary Worker B
          -> Subsidiary Worker C
```

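The routing step in the diagram (control plane to worker registry to subsidiary worker) can be sketched as a tenant lookup. This is a minimal illustration only: `WorkerRecord`, `WorkerRegistry`, and the example endpoint are hypothetical names that do not exist in the current repository.

```typescript
// Hypothetical shapes for illustration; the repository does not define these yet.
interface WorkerRecord {
  workerId: string;
  tenant: string;   // company, business unit, or regulated environment
  endpoint: string; // where the control plane would send task requests
  healthy: boolean;
}

class WorkerRegistry {
  private readonly byTenant = new Map<string, WorkerRecord>();

  register(worker: WorkerRecord): void {
    this.byTenant.set(worker.tenant, worker);
  }

  // Returns the tenant's worker, or undefined when none is registered or it is unhealthy.
  resolve(tenant: string): WorkerRecord | undefined {
    const worker = this.byTenant.get(tenant);
    return worker !== undefined && worker.healthy ? worker : undefined;
  }
}

const registry = new WorkerRegistry();
registry.register({
  workerId: "worker-a",
  tenant: "subsidiary-a",
  endpoint: "https://worker-a.example.internal", // placeholder URL
  healthy: true,
});
```

Keying the registry by tenant keeps routing tenant-scoped: a request for an unknown tenant resolves to nothing instead of falling through to a default worker.
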
## Core Components

### Control Plane

- Terminates Telegram traffic and normalizes commands.
- Resolves tenant, user role, target worker, and policy set.
- Stores chat state, project selection, model override, and approval state.
- Emits immutable audit events for every privileged operation.

### Worker Agent

- Runs on the subsidiary-owned host or VPC.
- Owns local `node-pty`, Codex CLI, repo checkout, shell allowlist, MCP clients, and GitHub access.
- Accepts signed task requests from the control plane.
- Returns structured task events, output chunks, status, and final result.

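The document does not specify how task requests are signed. As one possible approach, the sketch below assumes an HMAC-SHA256 signature over the serialized request with a per-worker shared secret. The `TaskRequest` shape and both function names are hypothetical; only `createHmac` and `timingSafeEqual` are real Node.js APIs.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Hypothetical payload shape; the RPC schema is not defined in the source document.
interface TaskRequest {
  taskId: string;
  tenant: string;
  subagent: "codex" | "github" | "mcp";
  prompt: string;
  issuedAt: number; // Unix epoch milliseconds
}

// Control plane side: sign the JSON serialization with a shared secret.
function signTask(request: TaskRequest, secret: string): string {
  return createHmac("sha256", secret).update(JSON.stringify(request)).digest("hex");
}

// Worker side: recompute the signature and compare in constant time.
function verifyTask(request: TaskRequest, signature: string, secret: string): boolean {
  const expected = Buffer.from(signTask(request, secret), "hex");
  const received = Buffer.from(signature, "hex");
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return expected.length === received.length && timingSafeEqual(expected, received);
}
```

A real worker would more likely verify the raw bytes it received rather than re-serializing the parsed object, and would also reject stale `issuedAt` values to limit replay.
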
### Policy Engine

- Controls who can use which model, worker, repo, MCP server, shell command, and GitHub operation.
- Enforces read-only vs write permissions.
- Requires approval for dangerous actions such as `git push`, repo creation, or production releases.

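The three rules above reduce to a decision with three outcomes: allow, deny, or hold for approval. The sketch below is illustrative only. The role names come from the Security Baseline section, while the operation identifiers (`git.push`, `repo.create`, and so on) and the role-to-operation mapping are assumptions.

```typescript
type Role = "platform_admin" | "subsidiary_cto" | "reviewer" | "auditor";
type Decision = "allow" | "deny" | "needs_approval";

interface PolicyRule {
  role: Role;
  operations: ReadonlySet<string>; // operations this role may request
}

// Operations the document calls dangerous; the identifiers themselves are illustrative.
const APPROVAL_REQUIRED: ReadonlySet<string> = new Set([
  "git.push",
  "repo.create",
  "release.production",
]);

function evaluate(rules: readonly PolicyRule[], role: Role, operation: string): Decision {
  const rule = rules.find((r) => r.role === role);
  // Default deny: no rule for the role, or the operation is not granted.
  if (rule === undefined || !rule.operations.has(operation)) return "deny";
  return APPROVAL_REQUIRED.has(operation) ? "needs_approval" : "allow";
}

// Example rule set (assumed mapping, not taken from the repository).
const rules: PolicyRule[] = [
  { role: "subsidiary_cto", operations: new Set(["git.status", "git.push"]) },
  { role: "auditor", operations: new Set(["audit.read"]) },
];
```

Denying by default means a missing rule can never widen access, which suits a regulated environment better than an allow-unless-blocked model.
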
### Audit And Observability

- Append-only audit trail for commands, approvals, model usage, and output status.
- Structured logs, metrics, and health endpoints per worker.
- Export path to SIEM or internal compliance tooling.

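One way to make an append-only trail tamper-evident is to chain each entry to the hash of the previous one. The document does not prescribe this technique, so treat the following as an assumed design. `AuditEvent`, `AuditTrail`, and `hashEntry` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical event shape; the real audit schema is not defined in the source.
interface AuditEvent {
  readonly seq: number;
  readonly actor: string;
  readonly action: string;
  readonly timestamp: string; // ISO 8601
  readonly prevHash: string;  // hash of the previous entry, "" for the first
  readonly hash: string;      // SHA-256 over this entry's fields and prevHash
}

function hashEntry(seq: number, actor: string, action: string, timestamp: string, prevHash: string): string {
  return createHash("sha256")
    .update(JSON.stringify([seq, actor, action, timestamp, prevHash]))
    .digest("hex");
}

class AuditTrail {
  private readonly entries: AuditEvent[] = [];
  private lastHash = "";

  // The only mutation offered is append; there is no update or delete.
  append(actor: string, action: string, timestamp: string = new Date().toISOString()): AuditEvent {
    const seq = this.entries.length;
    const prevHash = this.lastHash;
    const hash = hashEntry(seq, actor, action, timestamp, prevHash);
    const entry: AuditEvent = Object.freeze({ seq, actor, action, timestamp, prevHash, hash });
    this.entries.push(entry);
    this.lastHash = hash;
    return entry;
  }

  // Recomputes every hash and checks each link to its predecessor.
  verify(): boolean {
    let prev = "";
    for (const e of this.entries) {
      if (e.prevHash !== prev) return false;
      if (e.hash !== hashEntry(e.seq, e.actor, e.action, e.timestamp, e.prevHash)) return false;
      prev = e.hash;
    }
    return true;
  }
}
```

An in-memory array is used here only to keep the sketch self-contained. A production trail would persist entries to durable storage before acknowledging the operation.
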
## Security Baseline

- Replace `ALLOWED_USER_IDS`-only trust with SSO/OIDC-backed identity.
- Add RBAC roles such as `platform_admin`, `subsidiary_cto`, `reviewer`, and `auditor`.
- Store tokens in Vault, KMS, or another enterprise secret manager.
- Require one service account per worker host.
- Enforce one polling instance per bot token, or move the control plane to webhooks.
- Keep shell disabled by default. Enable it only through per-worker policy.

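The "disabled by default, enabled per worker policy" rule for shell access can be sketched as follows. `WorkerShellPolicy` and `isShellCommandAllowed` are hypothetical names, and the metacharacter filter is a deliberately conservative assumption, not the repository's actual allowlist logic.

```typescript
// Hypothetical per-worker policy shape.
interface WorkerShellPolicy {
  shellEnabled?: boolean;        // absent or false means shell stays disabled
  allowlist?: readonly string[]; // permitted binaries, e.g. ["ls", "git"]
}

// Reject anything that could chain, substitute, or redirect commands.
const SHELL_METACHARACTERS = /[;&|`$<>(){}\n\\]/;

function isShellCommandAllowed(policy: WorkerShellPolicy, command: string): boolean {
  // Default closed: the policy must opt in explicitly.
  if (policy.shellEnabled !== true) return false;
  if (SHELL_METACHARACTERS.test(command)) return false;
  const binary = command.trim().split(/\s+/)[0] ?? "";
  return binary !== "" && (policy.allowlist ?? []).includes(binary);
}
```

Checking only the first token is not sufficient on its own, which is why the sketch also refuses commands containing shell metacharacters such as `;` or `|`.
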
## Subagent Strategy

Subagents remain the control-plane execution units. In the enterprise model:

- `codex` stays the coding execution surface.
- `github` becomes a governed change-management subagent.
- `mcp` becomes a governed enterprise context subagent.
- New subagents should cover architecture review, security control review, release governance, and dependency risk review.

Subagents should be triggered only after:

- policy validation
- worker selection
- tenant and repo authorization
- optional approval checks for high-risk actions

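The four preconditions above can be modelled as an ordered list of gates that stops at the first failure. This is a sketch under assumptions: `DispatchContext`, `Gate`, and `firstFailedGate` are hypothetical, and each gate reads a precomputed flag where a real system would call the policy engine or worker registry.

```typescript
// Hypothetical snapshot of what the control plane knows before dispatch.
interface DispatchContext {
  policyValid: boolean;
  workerId: string | null; // null when no worker could be selected
  tenantAuthorized: boolean;
  repoAuthorized: boolean;
  highRisk: boolean;
  approved: boolean;
}

interface Gate {
  name: string;
  passes: (ctx: DispatchContext) => boolean;
}

// Order mirrors the list in the text.
const GATES: readonly Gate[] = [
  { name: "policy validation", passes: (ctx) => ctx.policyValid },
  { name: "worker selection", passes: (ctx) => ctx.workerId !== null },
  { name: "tenant and repo authorization", passes: (ctx) => ctx.tenantAuthorized && ctx.repoAuthorized },
  // Approval only applies to high-risk actions.
  { name: "approval", passes: (ctx) => !ctx.highRisk || ctx.approved },
];

// Returns the name of the first gate that fails, or null when the subagent may run.
function firstFailedGate(ctx: DispatchContext): string | null {
  for (const gate of GATES) {
    if (!gate.passes(ctx)) return gate.name;
  }
  return null;
}
```

Returning the failed gate's name, not just a boolean, gives the audit trail and the Telegram reply a concrete reason for the refusal.
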
## Recommended Deployment Phases

### Phase 1: Harden Current Single-Host Beta

- Migrate core runtime modules to TypeScript.
- Add structured logs and machine-readable health output.
- Add real Telegram regression checks beyond `getMe`.
- Introduce approval gates for dangerous shell and GitHub operations.

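"Structured logs and machine-readable health output" could look like the sketch below: one JSON object per log line, and a health report that a supervisor or monitor can parse. The field names, the three status values, and the check names are assumptions, not the output of the repository's existing health check scripts.

```typescript
type HealthStatus = "ok" | "degraded" | "fail";

// Hypothetical report shape.
interface HealthReport {
  status: HealthStatus;
  timestamp: string; // ISO 8601
  checks: Record<string, boolean>;
}

// "ok" when every check passes, "fail" when none do, "degraded" otherwise.
function buildHealthReport(checks: Record<string, boolean>, now: Date = new Date()): HealthReport {
  const results = Object.values(checks);
  const passed = results.filter((ok) => ok).length;
  let status: HealthStatus = "degraded";
  if (passed === results.length) status = "ok";
  else if (passed === 0) status = "fail";
  return { status, timestamp: now.toISOString(), checks };
}

// One JSON object per line, which most log shippers can ingest directly.
function logLine(level: "info" | "warn" | "error", message: string, fields: Record<string, unknown> = {}): string {
  return JSON.stringify({ level, message, ...fields });
}
```

For example, `buildHealthReport({ codexCli: true, pty: false })` would report `degraded`, where `codexCli` and `pty` are illustrative check names.
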
### Phase 2: Introduce Control Plane + Worker Split

- Move Telegram bot logic into a central service.
- Convert the current runtime into a worker daemon with a signed RPC interface.
- Persist chat state and audit events in a database instead of a local JSON file.

### Phase 3: Enterprise Governance

- Integrate SSO/OIDC, RBAC, and centralized policy.
- Add multi-tenant worker registry and tenant-scoped routing.
- Add formal release, rollback, and disaster recovery procedures.

## TypeScript Recommendation

For enterprise rollout, migrate the following first:

- `src/config.js`
- `src/orchestrator/router.js`
- `src/orchestrator/skillRegistry.js`
- `src/orchestrator/mcpClient.js`
- `src/runner/ptyManager.js`
- `src/runner/shellManager.js`

TypeScript matters here because config shape, skill contracts, worker RPC payloads, and audit event schemas must remain stable across teams and releases.

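As an illustration of what a typed config contract might look like after migrating `src/config.js`, the sketch below parses environment variables into a fixed shape. The variable names appear elsewhere in this document. The `AppConfig` fields, the comma-separated format of `ALLOWED_USER_IDS`, and the `"true"` literal for `SHELL_ENABLED` are assumptions about the current implementation.

```typescript
// Hypothetical typed view of the configuration.
interface AppConfig {
  workspaceRoot: string;
  codexWorkdir: string;
  githubDefaultWorkdir: string;
  shellEnabled: boolean;
  allowedUserIds: number[];
}

type Env = Record<string, string | undefined>;

function requireVar(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required variable ${name}`);
  }
  return value;
}

// Takes the environment as a parameter so it can be tested without process.env.
function parseConfig(env: Env): AppConfig {
  const allowedUserIds = (env["ALLOWED_USER_IDS"] ?? "")
    .split(",")
    .map((id) => id.trim())
    .filter((id) => id !== "")
    .map((id) => {
      const parsed = Number(id);
      if (!Number.isInteger(parsed)) throw new Error(`Invalid user id: ${id}`);
      return parsed;
    });

  return {
    workspaceRoot: requireVar(env, "WORKSPACE_ROOT"),
    codexWorkdir: requireVar(env, "CODEX_WORKDIR"),
    githubDefaultWorkdir: requireVar(env, "GITHUB_DEFAULT_WORKDIR"),
    shellEnabled: env["SHELL_ENABLED"] === "true", // anything else keeps shell off
    allowedUserIds,
  };
}
```

Failing at startup on a missing or malformed value keeps a misconfigured worker from running with silently wrong defaults.
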
## First-Time Installation Guidance For Subsidiaries

- Install Node.js 20 or later and the Codex CLI.
- Complete `codex login` on the worker host before starting the bot.
- Use a dedicated service account and a dedicated bot token per environment.
- Set `WORKSPACE_ROOT`, `CODEX_WORKDIR`, and `GITHUB_DEFAULT_WORKDIR` to controlled directories only.
- Start with `SHELL_ENABLED=false`.
- Run:

```bash
npm install
npm run ci
npm run healthcheck:strict
npm run telegram:smoke
```

- Deploy with PM2 or another formal process supervisor, not an ad hoc terminal session.

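One way to check the "controlled directories only" rule before starting a worker is to confirm that each working directory resolves inside the workspace root. The document does not state that `CODEX_WORKDIR` and `GITHUB_DEFAULT_WORKDIR` must live under `WORKSPACE_ROOT`, so that containment rule is an assumption of this sketch. `isInside` and `findEscapingDirs` are hypothetical helpers built on Node's `path` module.

```typescript
import { isAbsolute, relative, resolve, sep } from "node:path";

// True when target is root itself or a descendant of it, after resolving ".." segments.
function isInside(root: string, target: string): boolean {
  const rel = relative(resolve(root), resolve(target));
  if (rel === "") return true;
  // A path that climbs out starts with ".."; a different drive yields an absolute path.
  return rel !== ".." && !rel.startsWith(".." + sep) && !isAbsolute(rel);
}

// Returns the names of configured directories that fall outside the root.
function findEscapingDirs(root: string, dirs: Record<string, string>): string[] {
  return Object.entries(dirs)
    .filter(([, dir]) => !isInside(root, dir))
    .map(([name]) => name);
}
```

Comparing resolved paths instead of string prefixes avoids accepting a sibling such as `/srv/claws-other` when the root is `/srv/claws`. This check does not follow symlinks.
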
## Current Gap Summary

The current repository already has:

- PTY fallback and PTY preflight repair
- per-project chat context
- MCP and GitHub subagents
- local health checks, CI, smoke checks, and release workflow

It still lacks:

- multi-worker control plane
- enterprise identity and RBAC
- approvals and policy enforcement
- centralized audit storage
- tenant isolation
- TypeScript contracts for long-term maintainability
