# TinkerScript Design Blueprint / TinkerScript 设计蓝图

[English](#english-version) | [中文](#chinese-version)

---

<a id="english-version"></a>
## 🇬🇧 English Version

### 1. Overview
**TinkerScript** is an experimental component of AgentJet that decouples the **training logic** from the **agent execution logic**. It lets users train **full-weight LLM models** from machines without GPUs (e.g., a laptop) by offloading all model computation to a remote GPU server.

Unlike traditional setups, where user code must run inside the training cluster, TinkerScript lets you run and verify your agent logic locally while the heavy lifting (training and inference) happens remotely.

### 2. Core Architecture

The system involves two main parties: the **TinkerScript Server** (running on the GPU cluster) and the **TinkerScript Client** (running on your local machine).

```mermaid
graph TD
    subgraph "GPU Cluster (Server Side)"
        TrainingLoop["Training Loop (AgentJet/GRPO)"]
        TSS["TinkerScript Server (FastAPI)"]
        ZMQ["ZeroMQ / IPC"]
        SharedMem[("Shared Memory")]
        LLM["LLM Engine (vLLM/SGLang)"]
    end

    subgraph "User Laptop / CPU Cluster (Client Side)"
        UserScript["User Script (Python while loop)"]
        AgentLogic["Agent Logic / Tools"]
    end

    TrainingLoop -- "1. Generate Task" --> SharedMem
    SharedMem -- "2. Register Episode" --> TSS

    UserScript -- "3. Claim Episode (HTTP)" --> TSS
    TSS -- "4. Return API Key & Base URL" --> UserScript

    UserScript -- "5. Inference (OpenAI API)" --> LLM
    LLM -- "Token Stream" --> UserScript

    UserScript -- "6. Submit Reward (HTTP)" --> TSS
    TSS -- "7. Push Result" --> ZMQ
    ZMQ -- "8. Update Weights" --> TrainingLoop
```

### 3. Detailed Workflow

The workflow follows a "Claim & Submit" model. The training loop generates tasks ("Episodes") and waits for external workers to pick them up.

```mermaid
sequenceDiagram
    participant TL as Training Loop (Internal)
    participant S as Server (FastAPI)
    participant C as Client (User Script)
    participant M as LLM Model

    Note over TL, S: 1. Task Generation
    TL->>S: Register Episode (Status: Unclaimed)

    Note over C, S: 2. Task Acquisition
    loop Worker Loop
        C->>S: POST /claim_episode
        alt No Tasks
            S-->>C: Retry Later
        else Task Available
            S->>S: Mark as "Claimed"
            S-->>C: Return {EpisodeID, OpenAI_BaseURL, API_Key}

            Note over C, M: 3. Execution (Rollout)
            C->>M: Chat Completion Request (Inference)
            M-->>C: Response (Generation)
            C->>C: Calculate Reward (e.g., Verify Math Answer)

            Note over C, S: 4. Result Submission
            C->>S: POST /end_episode {Reward, Metadata}
            S->>TL: Forward Result via ZeroMQ
            S->>S: Delete Episode Record (Complete)
        end
    end
```
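
The "Claim & Submit" loop can be sketched in a few lines of Python. This is an illustrative sketch, not the real client: `InMemoryServer` is a hypothetical stand-in for the TinkerScript FastAPI server, and only the endpoint names (`claim_episode`, `end_episode`) come from the diagram above.

```python
import time

class InMemoryServer:
    """Hypothetical stand-in mimicking the server's claim/submit endpoints."""

    def __init__(self, episodes):
        self.unclaimed = list(episodes)  # queue of registered episodes
        self.results = {}                # episode_id -> submitted reward

    def claim_episode(self):
        if not self.unclaimed:
            return None                  # "Retry Later"
        ep = self.unclaimed.pop(0)       # mark the episode as claimed
        return {"episode_id": ep,
                "base_url": "http://gpu-server:8000/v1",  # placeholder
                "api_key": f"key-{ep}"}

    def end_episode(self, episode_id, reward):
        # In the real system this is forwarded to the trainer via ZeroMQ
        self.results[episode_id] = reward

def worker_loop(server, run_agent, max_idle=3):
    """Claim episodes until the queue stays empty for max_idle polls."""
    idle = 0
    while idle < max_idle:
        task = server.claim_episode()
        if task is None:
            idle += 1
            time.sleep(0.01)             # back off, then retry
            continue
        reward = run_agent(task)         # rollout using the returned credentials
        server.end_episode(task["episode_id"], reward)

server = InMemoryServer(["ep-1", "ep-2"])
worker_loop(server, run_agent=lambda task: 1.0)
print(server.results)  # → {'ep-1': 1.0, 'ep-2': 1.0}
```

In the real deployment, `claim_episode` and `end_episode` would be HTTP POSTs to the server, and `run_agent` would call the OpenAI-compatible endpoint with the returned credentials.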

### 4. Episode State Machine

To handle network failures or client crashes, the server maintains a state machine for every episode.

```mermaid
stateDiagram-v2
    [*] --> Registered
    Registered --> Unclaimed_Queue : Add to Queue

    Unclaimed_Queue --> Claimed : Client requests task

    Claimed --> Completed : Client submits result
    Claimed --> Registered : Client Timeout / Crash

    Completed --> [*] : Removed from Memory
```

* **Registered**: Task created by the training algorithm, waiting to be executed.
* **Claimed**: A client is currently working on it.
* **Timeout**: If a client claims a task but doesn't report back within `allow_discard_timeout`, the server reverts the status to **Registered** so another client can try.
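
The timeout fallback can be sketched as follows. Only the `allow_discard_timeout` name comes from the text above; the `EpisodeTracker` class and its fields are purely illustrative.

```python
class EpisodeTracker:
    """Illustrative in-memory tracker for the episode state machine."""

    def __init__(self, allow_discard_timeout=300.0):
        self.timeout = allow_discard_timeout
        self.episodes = {}  # episode_id -> (state, claim_time)

    def register(self, ep_id):
        self.episodes[ep_id] = ("registered", None)

    def claim(self, ep_id, now):
        self.episodes[ep_id] = ("claimed", now)

    def complete(self, ep_id):
        # Completed episodes are removed from memory entirely
        del self.episodes[ep_id]

    def reap(self, now):
        # Revert stale claims to "registered" so another client can retry
        for ep_id, (state, claimed_at) in list(self.episodes.items()):
            if state == "claimed" and now - claimed_at > self.timeout:
                self.episodes[ep_id] = ("registered", None)

tracker = EpisodeTracker(allow_discard_timeout=60.0)
tracker.register("ep-7")
tracker.claim("ep-7", now=0.0)
tracker.reap(now=120.0)  # the client crashed and never reported back
print(tracker.episodes["ep-7"][0])  # → registered
```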

### 5. Implementation Example

The user experience is designed to be minimal. You simply ask the remote server for a "job", do the work, and report the "score".

```python
# User-side code concept
def rollout(task):
    # 1. Handshake & claim (get credentials for this specific episode)
    api_baseurl_key = tinkerjet_remote.begin_episode()

    # 2. Run your existing agent logic using the standard OpenAI format
    workflow_output = execute_agent(task, api_baseurl_key)

    # 3. Submit results
    tinkerjet_remote.end_episode(workflow_output)
    return workflow_output.reward
```

### 6. Limitations

1. **Strict OpenAI Protocol**: Users must use the OpenAI `base_url` + `api_key` pattern. Internal access (such as direct access to the model object, weights, or gradients) is not available.
2. **Implicit Multi-Agent Handling**: AgentJet cannot explicitly distinguish different agents in a multi-agent scenario via the API, though it attempts to merge timeline shards automatically.
3. **No Prompt Tuning**: TinkerScript is designed for full-weight model training, not for soft-prompt tuning.
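
Because of limitation 1, all inference traffic is plain OpenAI-style HTTP. The fragment below sketches the request a client would build from the claimed credentials; the base URL, model name, and key are placeholder values, not real endpoints.

```python
import json

# Placeholder credentials, as returned by /claim_episode (values are illustrative)
creds = {"base_url": "http://gpu-server:8000/v1", "api_key": "ep-secret-key"}

# Standard OpenAI chat-completions request: POST {base_url}/chat/completions
url = f"{creds['base_url']}/chat/completions"
headers = {
    "Authorization": f"Bearer {creds['api_key']}",
    "Content-Type": "application/json",
}
payload = json.dumps({
    "model": "trained-model",  # hypothetical model name
    "messages": [{"role": "user", "content": "What is 12 * 7?"}],
})
print(url)
```

Any OpenAI-compatible SDK works the same way, since the server only needs to see this wire format.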
---

<a id="chinese-version"></a>
## 🇨🇳 中文版本 (Chinese Version)

### 1. 概述 (Overview)
**TinkerScript** 是 AgentJet 的一个实验性组件,旨在将 **训练逻辑 (Training Logic)** 与 **Agent 执行逻辑 (Execution Logic)** 解耦。它允许用户在 **没有 GPU** 的机器上(例如普通笔记本电脑)训练 **全参数 LLM 模型**,计算压力完全由远程 GPU 服务器承担。

与传统的将用户代码嵌入训练集群的方式不同,TinkerScript 允许你在本地运行并验证 Agent 逻辑,通过网络与远程训练循环交互。

### 2. 核心架构 (Core Architecture)

系统包含两个主要部分:运行在 GPU 集群上的 **TinkerScript Server** 和运行在本地的 **TinkerScript Client**。

```mermaid
graph TD
    subgraph "GPU 集群 (Server 端)"
        TrainingLoop["训练循环 (AgentJet/GRPO)"]
        TSS["TinkerScript Server (FastAPI)"]
        ZMQ["ZeroMQ / IPC 通信"]
        SharedMem[("共享内存")]
        LLM["LLM 推理引擎 (vLLM/SGLang)"]
    end

    subgraph "用户笔记本 / CPU 集群 (Client 端)"
        UserScript["用户脚本 (Python while loop)"]
        AgentLogic["Agent 业务逻辑 / 工具调用"]
    end

    TrainingLoop -- "1. 生成任务 (Task)" --> SharedMem
    SharedMem -- "2. 注册 Episode" --> TSS

    UserScript -- "3. 领取任务 (HTTP Claim)" --> TSS
    TSS -- "4. 返回 API Key 与 Base URL" --> UserScript

    UserScript -- "5. 推理请求 (OpenAI 协议)" --> LLM
    LLM -- "生成 Token 流" --> UserScript

    UserScript -- "6. 提交 Reward (HTTP End)" --> TSS
    TSS -- "7. 推送结果" --> ZMQ
    ZMQ -- "8. 更新权重" --> TrainingLoop
```

### 3. 详细工作流 (Detailed Workflow)

工作流基于“领取 (Claim) - 提交 (Submit)”模式。训练循环生成任务(Episode),等待外部 Worker 领取执行。

```mermaid
sequenceDiagram
    participant TL as 训练循环 (内部)
    participant S as Server (FastAPI)
    participant C as Client (用户脚本)
    participant M as LLM 模型服务

    Note over TL, S: 1. 任务生成阶段
    TL->>S: 注册 Episode (状态: Unclaimed)

    Note over C, S: 2. 任务领取阶段
    loop Worker Loop
        C->>S: POST /claim_episode (请求任务)
        alt 无可用任务
            S-->>C: 请稍后重试
        else 有可用任务
            S->>S: 标记为 "Claimed"
            S-->>C: 返回 {EpisodeID, OpenAI_BaseURL, API_Key}

            Note over C, M: 3. 执行阶段 (Rollout)
            C->>M: Chat Completion 请求 (推理通过网络回传)
            M-->>C: 返回生成结果
            C->>C: 计算 Reward (例如: 验证数学答案)

            Note over C, S: 4. 结果提交阶段
            C->>S: POST /end_episode {Reward, Metadata}
            S->>TL: 通过 ZeroMQ 转发结果给训练器
            S->>S: 删除 Episode 记录 (完成)
        end
    end
```

### 4. 状态机管理 (Episode State Machine)

为了处理网络波动或客户端崩溃(Crash),服务端为每个 Episode 维护了一个状态机。

```mermaid
stateDiagram-v2
    Registered : Registered (已注册)
    Claimed : Claimed (已被领取)
    Completed : Completed (已完成)

    [*] --> Registered
    Registered --> Unclaimed_Queue : 加入待领取队列

    Unclaimed_Queue --> Claimed : 客户端请求任务

    Claimed --> Completed : 客户端提交结果
    Claimed --> Registered : 客户端超时 / 崩溃

    Completed --> [*] : 从内存中移除
```

* **Registered (已注册)**: 训练算法生成了该任务,等待被执行。
* **Claimed (已被领取)**: 某个 Client 正在处理该任务。
* **Timeout (超时)**: 如果 Client 领取任务后在规定时间 (`allow_discard_timeout`) 内未提交结果,服务器会将状态重置为 **Registered**,允许其他 Client 重新领取该任务(容错机制)。

### 5. 实现代码示例 (Implementation Example)

用户侧的代码非常简洁。简而言之:向远程服务器要一个“活儿”,干完活,上报“得分”。

```python
# 用户侧代码概念演示
def rollout(task):
    # 1. 握手 & 领取任务 (获取当前 Episode 专属的鉴权信息)
    api_baseurl_key = tinkerjet_remote.begin_episode()

    # 2. 运行你现有的 Agent 逻辑 (使用标准 OpenAI 接口)
    workflow_output = execute_agent(task, api_baseurl_key)

    # 3. 提交结果
    tinkerjet_remote.end_episode(workflow_output)
    return workflow_output.reward
```

### 6. 局限性 (Limitations)

1. **严格依赖 OpenAI 协议**: 用户必须使用 OpenAI `base_url` + `api_key` 的方式与模型交互,无法获取模型内部对象(Weights/Gradients)。
2. **隐式多智能体处理**: 在多智能体(Multi-Agent)场景下,AgentJet 无法通过 API 显式区分不同的 Agent 角色,但后台会尝试自动合并时间线片段。
3. **不支持 Prompt Tuning**: TinkerScript 专为全量模型微调设计,不支持 Soft-Prompt Tuning 等轻量级微调。