Commit 45b502b

docs: add coordinator proxy design document
1 parent ad216e9 commit 45b502b

1 file changed

Lines changed: 388 additions & 0 deletions

docs/coordinator-proxy.md

# Coordinator Proxy

The **Coordinator Proxy** is a standalone HTTP reverse-proxy application that sits between prover clients and multiple upstream Scroll L2 Coordinators. It exposes the same REST API surface as a real coordinator, allowing provers to connect transparently without knowing that requests are being load-balanced across a pool of backend coordinators.

---

## Table of Contents

- [Overview](#overview)
- [Architecture](#architecture)
- [Key Components](#key-components)
- [Data Structures](#data-structures)
- [API Endpoints](#api-endpoints)
- [Authentication Flow](#authentication-flow)
- [Task Routing](#task-routing)
- [Configuration](#configuration)
- [Usage](#usage)
- [Design Decisions](#design-decisions)

---

## Overview

In a production environment, a single coordinator may not be sufficient to serve a large fleet of provers. The coordinator proxy addresses this by:

- **Multiplexing** a single prover connection across multiple upstream coordinators.
- **Authenticating** provers locally using the same JWT challenge-response mechanism as a real coordinator.
- **Maintaining per-upstream, per-prover sessions** (including bearer tokens) so each coordinator sees the prover as a direct client.
- **Routing `get_task` and `submit_proof` requests** intelligently across upstreams with sticky priority and random fallback.

Because the proxy exposes the standard coordinator API (`/coordinator/v1/...`), existing prover SDKs require no changes to work through the proxy.

---

## Architecture

```
┌─────────────┐      ┌─────────────────────┐      ┌─────────────────┐
│ Prover SDK  │─────>│  Coordinator Proxy  │─────>│  Coordinator A  │
└─────────────┘      │    (port 8590)      │      └─────────────────┘
                     │                     │      ┌─────────────────┐
                     │  • Local Auth       │─────>│  Coordinator B  │
                     │  • Session Manager  │      └─────────────────┘
                     │  • Task Router      │      ┌─────────────────┐
                     │  • Token Cache      │─────>│  Coordinator C  │
                     └─────────────────────┘      └─────────────────┘
```

### Entry Point

| File | Role |
|------|------|
| `coordinator/cmd/proxy/main.go` | Thin wrapper that invokes the CLI application. |
| `coordinator/cmd/proxy/app/app.go` | Bootstraps the proxy: parses `ProxyConfig`, optionally initializes the database, and starts the HTTP server with graceful shutdown. |
| `coordinator/cmd/proxy/app/flags.go` | Defines HTTP server flags (`--http`, `--http.addr`, `--http.port`). Default port is `8590`. |

---

## Key Components

### 1. Auth Controller (`auth.go`)

Handles the `/login` endpoint locally, then fans out a **proxy-login** to every upstream coordinator asynchronously.

- Runs standard challenge-response validation (reuses existing coordinator auth logic).
- On success, spawns goroutines to call `proxy_login` on each upstream.
- Stores returned upstream tokens in the prover's session.

### 2. GetTask Controller (`get_task.go`)

Routes prover task requests to the most appropriate upstream.

- First tries the **priority upstream** (sticky routing from a previous successful task assignment).
- If the priority upstream has no task, shuffles the remaining upstreams randomly and tries each until one returns a task.
- Prefixes the returned `taskID` with the upstream name (`upstream:taskID`) so that `submit_proof` can route correctly.

### 3. SubmitProof Controller (`submit_proof.go`)

Forwards proof submissions to the correct upstream coordinator.

- Parses the `upstream:taskID` prefix to determine the target coordinator.
- Uses the cached upstream token from the prover session.
- On success, clears the priority upstream (task is complete).

### 4. Client Manager (`client_manager.go`)

Manages the proxy's own identity and bearer token for each upstream.

- Keeps a cached `upClient` with a background re-login goroutine.
- If the cached token expires or becomes invalid, `Reset()` clears it so a fresh login is attempted.

### 5. Prover Session Manager (`prover_session.go`)

Maintains in-memory (and optionally DB-backed) sessions for every prover public key.

- Each session holds a map of `upstream -> loginToken`.
- Transparently refreshes expired tokens.
- Implements a session size limit; when exceeded, the old session map is rotated to a deprecated map rather than deleted immediately.
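
The size-limit rotation in the last bullet can be illustrated with a minimal sketch. It is not the proxy's actual code: `sessionStore`, `put`, and `get` are hypothetical names, sessions are reduced to plain strings, and the sketch assumes that lookups fall back to the deprecated map and that each rotation discards the previous deprecated generation. It is also not safe for concurrent use; a real manager would need locking.

```go
package main

import "fmt"

// sessionStore is a hypothetical, simplified stand-in for ProverManager.
// Sessions are plain strings here; the real type holds per-upstream tokens.
type sessionStore struct {
	data       map[string]string // active generation: publicKey -> session
	deprecated map[string]string // previous generation, kept for lookups only
	sizeLimit  int
}

func newSessionStore(sizeLimit int) *sessionStore {
	return &sessionStore{
		data:       make(map[string]string),
		deprecated: make(map[string]string),
		sizeLimit:  sizeLimit,
	}
}

// put stores a session. When the active map is full, it becomes the
// deprecated map (replacing the older one) and a fresh active map starts.
func (s *sessionStore) put(publicKey, session string) {
	if _, exists := s.data[publicKey]; !exists && len(s.data) >= s.sizeLimit {
		s.deprecated = s.data
		s.data = make(map[string]string)
	}
	s.data[publicKey] = session
}

// get checks the active map first, then the deprecated one.
func (s *sessionStore) get(publicKey string) (string, bool) {
	if session, ok := s.data[publicKey]; ok {
		return session, true
	}
	session, ok := s.deprecated[publicKey]
	return session, ok
}

func main() {
	store := newSessionStore(2)
	for _, key := range []string{"prover-a", "prover-b", "prover-c"} {
		store.put(key, "session-for-"+key)
	}
	_, found := store.get("prover-a") // still reachable via the deprecated map
	fmt.Println(len(store.data), len(store.deprecated), found)
}
```

Rotating whole maps keeps eviction O(1) and avoids per-entry bookkeeping, at the cost of coarse-grained expiry.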

### 6. Priority Upstream Manager

Stores sticky routing hints (`publicKey -> upstreamName`).

- When a prover successfully receives a task from an upstream, that upstream becomes the priority for the next `get_task` call.
- Can be persisted to the database so routing preferences survive restarts.

### 7. HTTP Client (`client.go`)

Low-level HTTP client (`upClient`) that implements the `ProxyCli` and `ProverCli` interfaces.

- `ProxyCli`: used by the proxy itself to log in to an upstream.
- `ProverCli`: used to impersonate a prover when calling `get_task` or `submit_proof` on an upstream.

---

## Data Structures

### Configuration

```go
// ProxyConfig: top-level configuration
type ProxyConfig struct {
	ProxyManager *ProxyManager        `json:"proxy_manager"`
	ProxyName    string               `json:"proxy_name"`
	Coordinators map[string]*UpStream `json:"coordinators"`
}

// ProxyManager: auth, verifier, client identity, optional DB
type ProxyManager struct {
	Verifier *VerifierConfig  `json:"verifier"`  // minimum prover version
	Client   *ProxyClient     `json:"proxy_cli"` // proxy's own identity
	Auth     *Auth            `json:"auth"`      // JWT secret & expiry
	DB       *database.Config `json:"db,omitempty"`
}

// ProxyClient: identity the proxy uses to authenticate with upstreams
type ProxyClient struct {
	ProxyName    string `json:"proxy_name"`
	ProxyVersion string `json:"proxy_version,omitempty"`
	Secret       string `json:"secret,omitempty"`
}

// UpStream: per-coordinator connection settings
type UpStream struct {
	BaseUrl              string `json:"base_url"`
	RetryCount           uint   `json:"retry_count"`
	RetryWaitTime        uint   `json:"retry_wait_time_sec"`
	ConnectionTimeoutSec uint   `json:"connection_timeout_sec"`
	CompatibileMode      bool   `json:"compatible_mode,omitempty"`
}
```

### Runtime Data Structures

```go
// Client interface: abstracts per-upstream access
type Client interface {
	Client(string) ProverCli                // token-bound prover client
	ClientAsProxy(context.Context) ProxyCli // proxy's own authenticated client
	Name() string
}

// upClient: actual HTTP implementation
type upClient struct {
	httpClient      *http.Client
	baseURL         string
	loginToken      string
	compatibileMode bool
	resetFromMgr    func()
}

// ProverManager: registry of active prover sessions
type ProverManager struct {
	data               map[string]*proverSession
	willDeprecatedData map[string]*proverSession
	sizeLimit          int
	persistent         *proverDataPersist
	// ... prometheus metrics
}

// proverSession: per-prover tokens across all upstreams
type proverSession struct {
	persistent    *proverDataPersist
	proverToken   map[string]loginToken
	completionCtx context.Context
}

// loginToken: upstream token with a monotonic phase
type loginToken struct {
	token string
	phase uint
}
```
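
One plausible way the `phase` field on `loginToken` could guard against stale writes is a compare-and-bump under a mutex. The sketch below is an assumption about the mechanism, not the proxy's implementation: `tokenSlot`, `snapshot`, and `update` are hypothetical names, and the rule "accept only if the phase is unchanged since the caller looked" is one reading of the description.

```go
package main

import (
	"fmt"
	"sync"
)

// loginToken mirrors the struct above: a token plus a monotonic phase.
type loginToken struct {
	token string
	phase uint
}

// tokenSlot is a hypothetical wrapper guarding one upstream's token.
type tokenSlot struct {
	mu      sync.Mutex
	current loginToken
}

// snapshot returns the token and phase as currently stored.
func (s *tokenSlot) snapshot() loginToken {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.current
}

// update stores newToken only if no other login completed since the caller
// observed seenPhase; otherwise the result is stale and is dropped.
func (s *tokenSlot) update(seenPhase uint, newToken string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.current.phase != seenPhase {
		return false
	}
	s.current = loginToken{token: newToken, phase: seenPhase + 1}
	return true
}

func main() {
	slot := &tokenSlot{}
	seen := slot.snapshot().phase // two logins start from the same phase
	first := slot.update(seen, "token-from-login-1")
	second := slot.update(seen, "token-from-login-2") // loses the race
	fmt.Println(first, second, slot.snapshot().token)
}
```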

### Database Models (Optional Persistence)

When a database is configured, the proxy persists the following tables:

| Table | Columns | Purpose |
|-------|---------|---------|
| `prover_sessions` | `public_key`, `upstream`, `up_token`, `expired` | Stores upstream tokens per prover so restarts do not force re-login. |
| `priority_upstream` | `public_key`, `upstream` | Restores sticky routing preferences after restart. |

---

## API Endpoints

### Exposed API (Prover → Proxy)

All endpoints are mounted under `/coordinator/v1/`.

| Method | Path | Auth | Description |
|--------|------|------|-------------|
| `GET` | `/challenge` | None | Returns a challenge token for the prover to sign. |
| `POST` | `/login` | Challenge | Validates the prover signature, then fans out `proxy_login` to all upstreams. |
| `POST` | `/get_task` | JWT | Routes a task request to the best available upstream. |
| `POST` | `/submit_proof` | JWT | Forwards the proof to the upstream that issued the task. |

### Upstream API Calls (Proxy → Coordinator)

| Method | Path | Purpose |
|--------|------|---------|
| `GET` | `/coordinator/v1/challenge` | Obtain a challenge token for proxy login. |
| `POST` | `/coordinator/v1/login` | Proxy authenticates itself as a client. |
| `POST` | `/coordinator/v1/proxy_login` | Forward prover identity to the upstream. |
| `POST` | `/coordinator/v1/get_task` | Forward prover task request. |
| `POST` | `/coordinator/v1/submit_proof` | Forward proof submission. |

### Key Interfaces

```go
// ProxyCli: proxy's own client to an upstream
type ProxyCli interface {
	Login(ctx context.Context, genLogin func(string) (*types.LoginParameter, error)) (*ctypes.Response, error)
	ProxyLogin(ctx context.Context, param *types.LoginParameter) (*ctypes.Response, error)
	Token() string
	Reset()
}

// ProverCli: prover-impersonating client to an upstream
type ProverCli interface {
	GetTask(ctx context.Context, param *types.GetTaskParameter) (*ctypes.Response, error)
	SubmitProof(ctx context.Context, param *types.SubmitProofParameter) (*ctypes.Response, error)
}
```

---

## Authentication Flow

### 1. Prover → Proxy `/login`

1. Prover requests a challenge from the proxy (`GET /challenge`).
2. Prover signs the challenge and sends it to `POST /login`.
3. The proxy validates the signature locally using the same logic as a real coordinator.
4. On success, the proxy spawns goroutines that call `POST /proxy_login` on **every** upstream coordinator asynchronously.
5. Each upstream returns a bearer token specific to that prover.
6. The proxy stores all upstream tokens in the prover's session and returns its own JWT to the prover.
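
Steps 4 to 6 can be sketched as a concurrent fan-out that collects one token per upstream. This is an illustrative sketch only: `fanOutLogin` and `errUpstreamDown` are hypothetical names, the `login` callback stands in for the real `proxy_login` HTTP call, and the sketch waits for every goroutine to finish so the result is observable, whereas the document describes the fan-out as asynchronous.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUpstreamDown is a hypothetical error used to simulate a failing upstream.
var errUpstreamDown = errors.New("upstream unavailable")

// fanOutLogin calls login for every upstream concurrently and returns the
// tokens of the upstreams that answered successfully.
func fanOutLogin(upstreams []string, login func(upstream string) (string, error)) map[string]string {
	var (
		mu     sync.Mutex
		wg     sync.WaitGroup
		tokens = make(map[string]string)
	)
	for _, name := range upstreams {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			token, err := login(name)
			if err != nil {
				return // a failed upstream simply has no token yet
			}
			mu.Lock()
			tokens[name] = token
			mu.Unlock()
		}(name)
	}
	wg.Wait()
	return tokens
}

func main() {
	login := func(upstream string) (string, error) {
		if upstream == "coordinator-b" {
			return "", errUpstreamDown
		}
		return "token-for-" + upstream, nil
	}
	tokens := fanOutLogin([]string{"coordinator-a", "coordinator-b", "coordinator-c"}, login)
	fmt.Println(len(tokens), tokens["coordinator-a"])
}
```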

### 2. Token Refresh & Resilience

Both `GetTask` and `SubmitProof` implement a retry-on-token-expiry pattern:

1. Try the request with the cached upstream token.
2. If the upstream returns `ErrJWTTokenExpired` or `ErrJWTCommonErr`, trigger `maintainLogin` to refresh the token.
3. Retry the request once with the new token.

The `ClientManager` also maintains a background login goroutine for the proxy's own identity; if the cached client fails, `Reset()` clears it so a fresh login is attempted on the next call.
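
The three-step retry pattern above might look roughly like the following. The names are illustrative assumptions: `callWithRefresh` is hypothetical, `errTokenExpired` stands in for the upstream's `ErrJWTTokenExpired` / `ErrJWTCommonErr` codes, and the `refresh` callback plays the role of `maintainLogin`.

```go
package main

import (
	"errors"
	"fmt"
)

// errTokenExpired is a stand-in for the upstream's token-expiry errors.
var errTokenExpired = errors.New("token expired")

// callWithRefresh tries call with the cached token. On an expiry error it
// refreshes the token and retries exactly once. It returns the token that
// was used last, so the caller can cache it.
func callWithRefresh(token string, call func(token string) error, refresh func() (string, error)) (string, error) {
	err := call(token)
	if !errors.Is(err, errTokenExpired) {
		return token, err // success or an unrelated error: no retry
	}
	fresh, err := refresh()
	if err != nil {
		return token, fmt.Errorf("refresh failed: %w", err)
	}
	return fresh, call(fresh)
}

func main() {
	call := func(token string) error {
		if token != "new" {
			return errTokenExpired
		}
		return nil
	}
	refresh := func() (string, error) { return "new", nil }
	token, err := callWithRefresh("old", call, refresh)
	fmt.Println(token, err)
}
```

Capping the retry at one attempt keeps a misbehaving upstream from trapping the request in a login loop.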

---

## Task Routing

### Sticky Priority Routing

When a prover successfully receives a task from an upstream, that upstream is recorded as the **priority upstream** for that prover. On the next `get_task` call, the proxy tries the priority upstream first.

### Random Fallback

If the priority upstream has no available task (or returns an error), the proxy shuffles the remaining upstreams randomly and tries each one until a task is returned or all upstreams are exhausted.
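
Sticky priority followed by random fallback amounts to building a candidate order and walking it. The sketch below shows one way to express that; `candidateOrder`, `firstTask`, and the `tryUpstream` callback are hypothetical names, and the callback stands in for the real `get_task` call to an upstream.

```go
package main

import (
	"fmt"
	"math/rand"
)

// candidateOrder returns the order in which upstreams would be tried:
// the priority upstream first (if it is known), then the rest shuffled.
func candidateOrder(upstreams []string, priority string) []string {
	order := make([]string, 0, len(upstreams))
	rest := make([]string, 0, len(upstreams))
	for _, name := range upstreams {
		if name == priority {
			order = append(order, name)
		} else {
			rest = append(rest, name)
		}
	}
	rand.Shuffle(len(rest), func(i, j int) { rest[i], rest[j] = rest[j], rest[i] })
	return append(order, rest...)
}

// firstTask walks the candidates until one yields a task. It reports the
// upstream that answered, which would become the next priority upstream.
func firstTask(order []string, tryUpstream func(upstream string) (string, bool)) (string, string, bool) {
	for _, name := range order {
		if task, ok := tryUpstream(name); ok {
			return name, task, true
		}
	}
	return "", "", false // all upstreams exhausted
}

func main() {
	order := candidateOrder([]string{"a", "b", "c"}, "b")
	upstream, task, ok := firstTask(order, func(name string) (string, bool) {
		return "task-1", name == "c" // only upstream "c" has work in this demo
	})
	fmt.Println(order[0], upstream, task, ok)
}
```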

### Task ID Namespacing

To ensure `submit_proof` routes to the correct upstream, the proxy prefixes the task ID returned to the prover:

```
upstreamName:originalTaskID
```

The `SubmitProof` controller parses this prefix, extracts the upstream name, and forwards the proof to the correct coordinator.
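
A minimal sketch of the prefix round trip follows. The helper names `namespaceTaskID` and `splitTaskID` are hypothetical, and the sketch assumes upstream names never contain `:` so that splitting at the first colon is unambiguous (the original task ID may still contain one).

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// namespaceTaskID prefixes a task ID with the upstream that issued it.
func namespaceTaskID(upstream, taskID string) string {
	return upstream + ":" + taskID
}

// splitTaskID recovers the upstream name and the original task ID by
// cutting at the first colon.
func splitTaskID(namespaced string) (string, string, error) {
	upstream, taskID, found := strings.Cut(namespaced, ":")
	if !found || upstream == "" || taskID == "" {
		return "", "", errors.New("task ID is missing an upstream prefix")
	}
	return upstream, taskID, nil
}

func main() {
	id := namespaceTaskID("sepolia", "task-42")
	upstream, taskID, err := splitTaskID(id)
	fmt.Println(id, upstream, taskID, err)
}
```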

---

## Configuration

Example `config_proxy.json`:

```json
{
  "proxy_manager": {
    "proxy_cli": {
      "proxy_name": "proxy_name",
      "secret": "client private key"
    },
    "auth": {
      "secret": "proxy secret key",
      "challenge_expire_duration_sec": 3600,
      "login_expire_duration_sec": 3600
    },
    "verifier": {
      "min_prover_version": "v4.4.45",
      "verifiers": []
    },
    "db": {
      "driver_name": "postgres",
      "dsn": "postgres://localhost/scroll?sslmode=disable",
      "maxOpenNum": 200,
      "maxIdleNum": 20
    }
  },
  "coordinators": {
    "sepolia": {
      "base_url": "http://localhost:8555",
      "retry_count": 10,
      "retry_wait_time_sec": 10,
      "connection_timeout_sec": 30
    }
  }
}
```

### Field Reference

| Field | Description |
|-------|-------------|
| `proxy_manager.proxy_cli.secret` | ECDSA private key material used to derive the proxy's signing key for upstream login. |
| `proxy_manager.auth.secret` | JWT HMAC key for prover-to-proxy sessions. |
| `proxy_manager.verifier.min_prover_version` | Minimum prover version allowed to connect. |
| `proxy_manager.db` | Optional database configuration. If omitted, the proxy runs in memory-only mode (no persistence across restarts). |
| `coordinators.*.base_url` | HTTP endpoint of the upstream coordinator. |
| `coordinators.*.compatible_mode` | If `true`, skips `proxy_login` and uses standard login with a dummy token (for legacy coordinators). |
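
To show how the `coordinators` section maps onto Go values, here is a small decoding sketch. It is not the proxy's loader: `proxyConfig`, `upStream`, and `parseConfig` are hypothetical, trimmed-down mirrors of the structs documented earlier, and the validation rules (at least one coordinator, non-empty `base_url`) are assumptions made for the example. Only the JSON keys are taken from this document.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// upStream is a trimmed, hypothetical mirror of the documented UpStream.
type upStream struct {
	BaseURL        string `json:"base_url"`
	RetryCount     uint   `json:"retry_count"`
	CompatibleMode bool   `json:"compatible_mode,omitempty"`
}

// proxyConfig keeps only the coordinators map for this example.
type proxyConfig struct {
	Coordinators map[string]*upStream `json:"coordinators"`
}

// parseConfig decodes the JSON and applies the assumed sanity checks.
func parseConfig(raw []byte) (*proxyConfig, error) {
	var cfg proxyConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	if len(cfg.Coordinators) == 0 {
		return nil, fmt.Errorf("config needs at least one coordinator")
	}
	for name, up := range cfg.Coordinators {
		if up == nil || up.BaseURL == "" {
			return nil, fmt.Errorf("coordinator %q has no base_url", name)
		}
	}
	return &cfg, nil
}

func main() {
	raw := []byte(`{"coordinators": {"sepolia": {"base_url": "http://localhost:8555", "retry_count": 10}}}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("invalid config:", err)
		return
	}
	// compatible_mode is absent above, so it decodes to false.
	fmt.Println(cfg.Coordinators["sepolia"].BaseURL, cfg.Coordinators["sepolia"].CompatibleMode)
}
```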

---

## Usage

### Build

```bash
cd coordinator
make proxy
```

### Run

```bash
./build/bin/coordinator_proxy --config conf/config_proxy.json
```

### Run with custom HTTP address

```bash
./build/bin/coordinator_proxy --config conf/config_proxy.json --http.addr 0.0.0.0 --http.port 8590
```

### Prover Connection

Provers connect to the proxy exactly as they would connect to a real coordinator:

```
https://proxy.example.com/coordinator/v1
```

No SDK changes are required.

---

## Design Decisions

| Decision | Rationale |
|----------|-----------|
| **Sticky routing for tasks** | Reduces cross-coordinator state churn; a prover that received a task from upstream A is likely to get the next task from the same upstream. |
| **Random load balancing as fallback** | Simple and stateless; avoids hot-spotting when one upstream runs out of tasks. |
| **Task ID namespacing** | Allows the proxy to remain stateless for proof submissions; the task ID itself encodes the routing target. |
| **Session size limit with rotation** | Prevents unbounded memory growth in the proxy; old sessions are moved to a deprecated map and eventually garbage-collected. |
| **Phase-based token updates** | A monotonic `phase` counter on `loginToken` prevents stale concurrent login attempts from overwriting a fresher token. |
| **Compatible mode** | Allows the proxy to work with older coordinators that do not support the `proxy_login` endpoint. |
| **Optional DB persistence** | The proxy can run entirely in memory for simplicity, or use a database for token and routing persistence across restarts. |
