Commit 2178481 (parent 21f9bb8)

feat(tinkerscript): Add comprehensive design blueprint and workflow documentation

File tree: 3 files changed (+371, −90 lines)

ajet_tinkerscript.py — 90 deletions (this file was deleted)

tinkerscript.md — 251 additions
# TinkerScript Design Blueprint / TinkerScript 设计蓝图

[English](#english-version) | [中文](#chinese-version)

---

<a id="english-version"></a>
## 🇬🇧 English Version

### 1. Overview
**TinkerScript** is an experimental component of AgentJet designed to decouple the **Training Logic** from the **Agent Execution Logic**. It allows users to train **full-weight LLM models** on machines without GPUs (e.g., a laptop) by offloading the actual model computation to a remote GPU server.

Unlike traditional setups where the user code must run inside the training cluster, TinkerScript allows you to verify and run your agent logic locally while the heavy lifting (training & inference) happens remotely.

### 2. Core Architecture

The system involves two main parties: the **TinkerScript Server** (running on the GPU cluster) and the **TinkerScript Client** (running on your local machine).

```mermaid
graph TD
    subgraph "GPU Cluster (Server Side)"
        TrainingLoop["Training Loop (AgentJet/GRPO)"]
        TSS["TinkerScript Server (FastAPI)"]
        ZMQ["ZeroMQ / IPC"]
        SharedMem[("Shared Memory")]
        LLM["LLM Engine (vLLM/SGLang)"]
    end

    subgraph "User Laptop / CPU Cluster (Client Side)"
        UserScript["User Script (Python while loop)"]
        AgentLogic["Agent Logic / Tools"]
    end

    TrainingLoop -- "1. Generate Task" --> SharedMem
    SharedMem -- "2. Register Episode" --> TSS

    UserScript -- "3. Claim Episode (HTTP)" --> TSS
    TSS -- "4. Return API Key & Base URL" --> UserScript

    UserScript -- "5. Inference (OpenAI API)" --> LLM
    LLM -- "Token Stream" --> UserScript

    UserScript -- "6. Submit Reward (HTTP)" --> TSS
    TSS -- "7. Push Result" --> ZMQ
    ZMQ -- "8. Update Weights" --> TrainingLoop
```

### 3. Detailed Workflow

The workflow relies on a "Claim & Submit" model: the training loop generates tasks ("Episodes") and waits for external workers to pick them up.

```mermaid
sequenceDiagram
    participant TL as Training Loop (Internal)
    participant S as Server (FastAPI)
    participant C as Client (User Script)
    participant M as LLM Model

    Note over TL, S: 1. Task Generation
    TL->>S: Register Episode (Status: Unclaimed)

    Note over C, S: 2. Task Acquisition
    loop Worker Loop
        C->>S: POST /claim_episode
        alt No Tasks
            S-->>C: Retry Later
        else Task Available
            S->>S: Mark as "Claimed"
            S-->>C: Return {EpisodeID, OpenAI_BaseURL, API_Key}
        end

        Note over C, M: 3. Execution (Rollout)
        C->>M: Chat Completion Request (Inference)
        M-->>C: Response (Generation)
        C->>C: Calculate Reward (e.g., Verify Math Answer)

        Note over C, S: 4. Result Submission
        C->>S: POST /end_episode {Reward, Metadata}
        S->>TL: Forward Result via ZeroMQ
        S->>S: Delete Episode Record (Complete)
    end
```
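
The "Claim & Submit" loop above can be sketched from the client's point of view. This is a minimal illustration, not the actual TinkerScript client: the endpoint paths come from the diagram, while `worker_loop`, `post`, and `run_rollout` are hypothetical names, and the HTTP transport is injected as a callable so the loop logic stays visible.

```python
# Hypothetical sketch of the client-side "Claim & Submit" worker loop.
# Endpoint paths (/claim_episode, /end_episode) follow the sequence diagram;
# everything else is an assumption for illustration.
import time


def worker_loop(post, run_rollout, max_episodes=1, retry_delay=1.0):
    """Claim episodes, run the rollout, and submit rewards.

    post(path, payload) -> dict, or None meaning "no task, retry later".
    run_rollout(episode) -> float reward for the finished episode.
    """
    done = 0
    while done < max_episodes:
        episode = post("/claim_episode", {})
        if episode is None:
            # Server has no unclaimed episodes yet; back off and retry.
            time.sleep(retry_delay)
            continue
        reward = run_rollout(episode)
        post("/end_episode", {"episode_id": episode["episode_id"],
                              "reward": reward})
        done += 1
    return done
```

With a real transport, `post` would issue HTTP requests to the TinkerScript Server and return the decoded JSON body (or `None` on a "retry later" response).
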

### 4. Episode State Machine

To handle network failures or client crashes, the server maintains a state machine for every episode.

```mermaid
stateDiagram-v2
    [*] --> Registered
    Registered --> Unclaimed_Queue : Add to Queue

    Unclaimed_Queue --> Claimed : Client requests task

    Claimed --> Completed : Client submits result
    Claimed --> Registered : Client Timeout / Crash

    Completed --> [*] : Removed from Memory
```

* **Registered**: Task created by the training algorithm.
* **Claimed**: A client is currently working on it.
* **Timeout**: If a client claims a task but doesn't report back within `allow_discard_timeout`, the server reverts the status to **Registered** so another client can try.
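
These transitions amount to a few lines of bookkeeping. The sketch below is hypothetical (the class and method names are invented; only `allow_discard_timeout` appears in the document) and shows how a timed-out claim flows back into the unclaimed queue:

```python
# Hypothetical server-side episode bookkeeping for the state machine above.
import time
from collections import deque


class EpisodeRegistry:
    def __init__(self, allow_discard_timeout=300.0, clock=time.monotonic):
        self.timeout = allow_discard_timeout
        self.clock = clock
        self.unclaimed = deque()  # Registered episodes waiting in the queue
        self.claimed = {}         # episode_id -> time the claim was made

    def register(self, episode_id):
        # Registered -> Unclaimed_Queue
        self.unclaimed.append(episode_id)

    def claim(self):
        # Unclaimed_Queue -> Claimed (None tells the client to retry later)
        self._requeue_stale()
        if not self.unclaimed:
            return None
        episode_id = self.unclaimed.popleft()
        self.claimed[episode_id] = self.clock()
        return episode_id

    def complete(self, episode_id):
        # Claimed -> Completed: the record is removed from memory entirely.
        self.claimed.pop(episode_id, None)

    def _requeue_stale(self):
        # Claimed -> Registered when the client times out or crashes.
        now = self.clock()
        for episode_id, claimed_at in list(self.claimed.items()):
            if now - claimed_at > self.timeout:
                del self.claimed[episode_id]
                self.unclaimed.append(episode_id)
```

Injecting the clock keeps the timeout logic deterministic and easy to test; a real server would also persist task payloads alongside the IDs.
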

### 5. Implementation Example

The user experience is designed to be minimal. You simply query the remote server for a "job", do the work, and report the "score".

```python
# User-side code concept
def rollout(task):
    # 1. Handshake & claim (get credentials for this specific episode)
    api_baseurl_key = tinkerjet_remote.begin_episode()

    # 2. Run your existing agent logic using the standard OpenAI format
    workflow_output = execute_agent(task, api_baseurl_key)

    # 3. Submit results
    tinkerjet_remote.end_episode(workflow_output)
    return workflow_output.reward
```
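
For illustration, here is one hypothetical way the `execute_agent` call above could turn the per-episode credentials into a standard OpenAI-style chat-completion request. The `base_url`/`api_key` fields follow the diagram; the function name, model name, and payload shape are assumptions, not the real API:

```python
# Hypothetical helper: map per-episode credentials onto an OpenAI-style
# chat-completion request. Only the request is built here; sending it and
# parsing the response is left to the user's agent code.
import json
import urllib.request


def build_chat_request(creds, task, model="trained-policy"):
    """creds: {"base_url": ..., "api_key": ...} as returned by the claim step."""
    url = creds["base_url"].rstrip("/") + "/chat/completions"
    body = {
        "model": model,  # model name is an assumption; the server decides
        "messages": [{"role": "user", "content": task}],
    }
    headers = {
        "Authorization": "Bearer " + creds["api_key"],
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url, data=json.dumps(body).encode(), headers=headers, method="POST"
    )
```

Sending the request (e.g., via `urllib.request.urlopen` or the `openai` client pointed at the returned `base_url`) and scoring the generation remain the user's responsibility.
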

### 6. Limitations

1. **Strict OpenAI Protocol**: Users must use the OpenAI `base_url` + `api_key` pattern. Internal access (such as direct access to the model object) is not available.
2. **Implicit Multi-Agent Handling**: In multi-agent scenarios, AgentJet cannot explicitly distinguish different agents via the API, though it attempts to merge timeline shards automatically.
3. **No Prompt Tuning**: TinkerScript is designed for full-weight model training, not for soft-prompt tuning.

---

<a id="chinese-version"></a>
## 🇨🇳 中文版本 (Chinese Version)

### 1. 概述 (Overview)
**TinkerScript** 是 AgentJet 的一个实验性组件,旨在将 **训练逻辑 (Training Logic)** 与 **Agent 执行逻辑 (Execution Logic)** 解耦。它允许用户在 **没有 GPU** 的机器上(例如普通笔记本电脑)训练 **全参数 LLM 模型**,计算压力完全由远程 GPU 服务器承担。

与传统的将用户代码嵌入训练集群的方式不同,TinkerScript 允许你在本地运行并验证 Agent 逻辑,通过网络与远程训练循环交互。

### 2. 核心架构 (Core Architecture)

系统包含两个主要部分:运行在 GPU 集群上的 **TinkerScript Server** 和运行在本地的 **TinkerScript Client**。

```mermaid
graph TD
    subgraph "GPU 集群 (Server 端)"
        TrainingLoop["训练循环 (AgentJet/GRPO)"]
        TSS["TinkerScript Server (FastAPI)"]
        ZMQ["ZeroMQ / IPC 通信"]
        SharedMem[("共享内存")]
        LLM["LLM 推理引擎 (vLLM/SGLang)"]
    end

    subgraph "用户笔记本 / CPU 集群 (Client 端)"
        UserScript["用户脚本 (Python While Loop)"]
        AgentLogic["Agent 业务逻辑 / 工具调用"]
    end

    TrainingLoop -- "1. 生成任务 (Task)" --> SharedMem
    SharedMem -- "2. 注册 Episode" --> TSS

    UserScript -- "3. 领取任务 (HTTP Claim)" --> TSS
    TSS -- "4. 返回 API Key 与 Base URL" --> UserScript

    UserScript -- "5. 推理请求 (OpenAI 协议)" --> LLM
    LLM -- "生成 Token 流" --> UserScript

    UserScript -- "6. 提交 Reward (HTTP End)" --> TSS
    TSS -- "7. 推送结果" --> ZMQ
    ZMQ -- "8. 更新权重" --> TrainingLoop
```

### 3. 详细工作流 (Detailed Workflow)

工作流基于“领取 (Claim) - 提交 (Submit)”模式:训练循环生成任务(Episode),等待外部 Worker 领取执行。

```mermaid
sequenceDiagram
    participant TL as 训练循环 (内部)
    participant S as Server (FastAPI)
    participant C as Client (用户脚本)
    participant M as LLM 模型服务

    Note over TL, S: 1. 任务生成阶段
    TL->>S: 注册 Episode (状态: Unclaimed)

    Note over C, S: 2. 任务领取阶段
    loop Worker Loop
        C->>S: POST /claim_episode (请求任务)
        alt 无可用任务
            S-->>C: 请稍后重试
        else 有可用任务
            S->>S: 标记为 "Claimed"
            S-->>C: 返回 {EpisodeID, OpenAI_BaseURL, API_Key}
        end

        Note over C, M: 3. 执行阶段 (Rollout)
        C->>M: Chat Completion 请求 (推理)
        M-->>C: 返回生成结果
        C->>C: 计算 Reward (例如: 验证数学答案)

        Note over C, S: 4. 结果提交阶段
        C->>S: POST /end_episode {Reward, Metadata}
        S->>TL: 通过 ZeroMQ 转发结果给训练器
        S->>S: 删除 Episode 记录 (完成)
    end
```

### 4. 状态机管理 (Episode State Machine)

为了处理网络波动或客户端崩溃(Crash),服务端为每个 Episode 维护了一个状态机。

```mermaid
stateDiagram-v2
    state "Registered (已注册)" as Registered
    state "Claimed (已被领取)" as Claimed
    state "Completed (已完成)" as Completed

    [*] --> Registered
    Registered --> Unclaimed_Queue : 加入待领取队列

    Unclaimed_Queue --> Claimed : 客户端请求任务

    Claimed --> Completed : 客户端提交结果
    Claimed --> Registered : 客户端超时 / 崩溃

    Completed --> [*] : 从内存中移除
```

* **Registered (已注册)**: 训练算法生成了该任务,等待被执行。
* **Claimed (已被领取)**: 某个 Client 正在处理该任务。
* **Timeout (超时)**: 如果 Client 领取任务后在规定时间 (`allow_discard_timeout`) 内未提交结果,服务器会将状态重置为 **Registered**,允许其他 Client 重新领取该任务(容错机制)。

### 5. 实现代码示例

用户侧的代码非常简洁。简而言之:向远程服务器要一个“活儿”,干完活,上报“得分”。

```python
# 用户侧代码概念演示
def rollout(task):
    # 1. 握手 & 领取任务(获取当前 Episode 专属的鉴权信息)
    api_baseurl_key = tinkerjet_remote.begin_episode()

    # 2. 运行你现有的 Agent 逻辑(使用标准 OpenAI 接口)
    workflow_output = execute_agent(task, api_baseurl_key)

    # 3. 提交结果
    tinkerjet_remote.end_episode(workflow_output)
    return workflow_output.reward
```

### 6. 局限性 (Limitations)

1. **严格依赖 OpenAI 协议**: 用户必须使用 OpenAI `base_url` + `api_key` 的方式与模型交互,无法获取模型内部对象(Weights/Gradients)。
2. **隐式多智能体处理**: 在多智能体(Multi-Agent)场景下,AgentJet 无法通过 API 显式区分不同的 Agent 角色,但后台会尝试自动合并时间线片段。
3. **不支持 Prompt Tuning**: TinkerScript 专为全量模型微调设计,不支持 Soft-Prompt Tuning 等轻量级微调。
