You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat: add Pi Coding Agent rollout seed source (#513) (#514)
Add support for ingesting Pi Coding Agent session artifacts as an agent
rollout seed source. Pi sessions are tree-structured JSONL files; the
handler resolves the active conversation path by walking from the last
entry back to the root via id/parentId links.
Key points:
- Tree-structured sessions with automatic active-path resolution
- Entry-level types: model_change, compaction, branch_summary,
custom_message, thinking_level_change
- Message roles: user, assistant (inline ToolCall/ThinkingContent/
TextContent blocks), toolResult, bashExecution (synthesized as
tool-call pairs), custom, compactionSummary, branchSummary
- Extract shared normalize_message_content to utils.py (was duplicated
in Hermes handler)
Copy file name to clipboardExpand all lines: docs/concepts/agent-rollout-ingestion.md
+34-22Lines changed: 34 additions & 22 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -42,6 +42,18 @@ Use `AgentRolloutSeedSource` when you want to work from existing agent traces in
42
42
)
43
43
```
44
44
45
+
=== "Pi Coding Agent"
46
+
47
+
Uses `~/.pi/agent/sessions` and `*.jsonl` by default. Sessions are tree-structured JSONL files; the active conversation path is resolved automatically.
48
+
49
+
```python
50
+
import data_designer.config as dd
51
+
52
+
seed_source = dd.AgentRolloutSeedSource(
53
+
format=dd.AgentRolloutFormat.PI_CODING_AGENT,
54
+
)
55
+
```
56
+
45
57
=== "ATIF"
46
58
47
59
ATIF requires an explicit `path`. See Harbor's [ATIF documentation](https://harborframework.com/docs/trajectory-format) for the format specification.
@@ -63,31 +75,31 @@ You can override `path` and `file_pattern` for any format when your rollout arti
63
75
64
76
All supported rollout formats map into the same seeded row schema. In the table below, `None` means the source artifact does not expose that field directly, and `derived` means Data Designer computes it from normalized `messages`.
65
77
66
-
| Normalized field | ATIF | Claude Code | Codex | Hermes Agent |
67
-
|---|---|---|---|---|
68
-
|`trace_id`|`session_id`|`sessionId[:agentId]`|`session_meta.id` or file stem | CLI `session_id` or file stem; gateway file stem |
-`trace_id`: Claude Code appends `agentId` when present. Hermes uses either the CLI session ID or the gateway transcript file stem.
88
-
-`is_sidechain`: ATIFand Hermes currently normalize this to `False`. Claude Code preserves `isSidechain` directly.
89
-
-`messages`: All formats normalize into the same chat-style message schema. See [Message Traces](traces.md) for the shared block structure.
90
-
-`source_meta`: This is where format-specific details live, such as ATIF copied-context metadata, Claude summaries, Codex response-item types, or Hermes tool/session metadata.
99
+
-`trace_id`: Claude Code appends `agentId` when present. Hermes uses either the CLI session ID or the gateway transcript file stem. Pi uses the session header `id`.
100
+
-`is_sidechain`: ATIF, Hermes, and Pi currently normalize this to `False`. Claude Code preserves `isSidechain` directly.
101
+
-`messages`: All formats normalize into the same chat-style message schema. See [Message Traces](traces.md) for the shared block structure. Pi sessions are tree-structured; only the active conversation path (from the last entry back to root) is included.
102
+
-`source_meta`: This is where format-specific details live, such as ATIF copied-context metadata, Claude summaries, Codex response-item types, Hermes tool/session metadata, or Pi session version and branch information.
0 commit comments