Commit d57b882

Halo support
1 parent ff454df commit d57b882

19 files changed

Lines changed: 983 additions & 22 deletions

CHANGELOG.md

Lines changed: 14 additions & 0 deletions
@@ -5,6 +5,19 @@ All notable changes to this project are documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
+## [0.1.7] - 2026-04-30
+
+### Added
+- HALO-style `trace_analysis` RLM environment for diagnosing agent harness failures from one-span-per-line JSONL traces.
+- Trace sidecar indexing with dataset rollups for trace counts, span counts, error traces, services, models, agents, token totals, and sample trace ids.
+- Bounded trace inspection actions: `get_dataset_overview`, `query_traces`, `count_traces`, `view_trace`, `search_trace`, and `view_spans`.
+- Large-trace safeguards: per-attribute truncation, oversized trace summaries, and higher-cap selected-span reads.
+- Tests for trace indexing, querying, searching, selected-span viewing, and trace environment actions.
+- Trace analysis documentation under the Core Engine docs.
+
+### Changed
+- `/rlm` command help now advertises `env=trace_analysis` for run, chat, and doctor workflows.
+
 ## [0.1.6] - 2026-02-20
 
 ### Added
@@ -56,3 +69,4 @@ Initial public release of **RLM Code**.
 
 [0.1.5]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.5
 [0.1.6]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.6
+[0.1.7]: https://github.com/SuperagenticAI/rlm-code/releases/tag/v0.1.7
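
Reviewer sketch (not part of the commit): the changelog entry on large-trace safeguards mentions per-attribute truncation with a higher cap for selected-span reads. The diff shown here does not include that implementation, so the following is only a minimal illustration of the idea. The function name `truncate_attributes`, the attribute key `llm.prompt`, and the cap values 256 and 4096 are all assumptions made for this example.

```python
from typing import Any


def truncate_attributes(span: dict[str, Any], cap: int) -> dict[str, Any]:
    """Return a copy of `span` whose string attribute values are cut to `cap` characters.

    Hypothetical helper: the real project may truncate differently.
    """
    out = dict(span)
    trimmed: dict[str, Any] = {}
    for key, value in span.get("attributes", {}).items():
        if isinstance(value, str) and len(value) > cap:
            dropped = len(value) - cap
            # Keep the head of the value and record how much was removed.
            trimmed[key] = f"{value[:cap]}...[truncated {dropped} chars]"
        else:
            trimmed[key] = value
    out["attributes"] = trimmed
    return out


# A span with one very large attribute (hypothetical key name).
span = {"span_id": "span-1", "attributes": {"llm.prompt": "x" * 10_000, "tokens": 42}}

overview = truncate_attributes(span, cap=256)   # low cap, e.g. for a full-trace view
detail = truncate_attributes(span, cap=4096)    # higher cap, e.g. for selected spans
```

Using two caps keeps a whole-trace read small while still letting a targeted read of a few spans return more of each payload.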

README.md

Lines changed: 8 additions & 9 deletions
@@ -25,21 +25,20 @@ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.0
 
 RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.
 
-## Release v0.1.6
+## Release v0.1.7
 
-This release adds the new CodeMode path as an opt-in harness strategy.
+This release adds HALO-style trace analysis as a new RLM environment.
 
-- New harness strategy: `strategy=codemode` (default remains `strategy=tool_call`)
-- MCP bridge flow for CodeMode: `search_tools` -> typed tool surface -> `call_tool_chain`
-- Guardrails before execution: blocked API classes plus timeout/size/tool-call caps
-- Benchmark telemetry for side-by-side comparison: `tool_call` vs `codemode`
-- Dedicated docs section for CodeMode: quickstart, architecture, guardrails, evaluation
-- Multi-backend setup docs for UTCP (local) and Cloudflare (remote MCP)
+- New `trace_analysis` environment for diagnosing agent harness failures from OTel-shaped JSONL traces
+- Sidecar trace indexing with dataset overview, query, count, search, full-trace view, and selected-span view actions
+- Bounded payload handling for large traces, including oversized summaries and higher-cap surgical span reads
+- `/rlm` help/docs updated for `env=trace_analysis`
+- Dedicated trace analysis docs under the Core Engine section
 
 Example:
 
 ```text
-/harness run "implement feature and add tests" steps=3 mcp=on strategy=codemode mcp_server=utcp-codemode
+/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
 ```
 
 ## Documentation

docs/core/trace-analysis.md

Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+# Trace Analysis
+
+`rlm-code` includes a HALO-style trace analysis environment for diagnosing
+agent harness failures from one-span-per-line JSONL traces.
+
+The environment is named `trace_analysis`. It indexes a trace file into a
+sidecar cache, exposes bounded trace-inspection actions to the RLM planner, and
+keeps large payloads under control by returning summaries or selected spans
+instead of blindly loading full traces into context.
+
+## Usage
+
+```text
+/rlm run "Find systemic harness failures trace=./traces.jsonl" env=trace_analysis steps=6
+```
+
+The task can include either `trace=<path>` or `trace_path=<path>`. The planner
+can also explicitly load a file with the `set_trace_path` action.
+
+## Actions
+
+The environment supports these planner actions:
+
+| Action | Purpose |
+|---|---|
+| `set_trace_path` | Load and index a trace JSONL file |
+| `get_dataset_overview` | Return dataset-level trace, span, service, model, agent, token, and error counts |
+| `query_traces` | List matching trace summaries with pagination |
+| `count_traces` | Count matching traces without materializing summaries |
+| `view_trace` | Read all spans for a small trace, or return an oversized summary |
+| `search_trace` | Search one trace for a literal substring |
+| `view_spans` | Read selected spans at a higher per-attribute cap |
+| `final` | Return the final evidence report |
+
+Supported filters are `has_errors`, `model_names`, `service_names`,
+`agent_names`, and `project_id`.
+
+## Trace Shape
+
+The first implementation expects one JSON object per line. Each line should
+represent one span with fields such as:
+
+```json
+{
+  "trace_id": "trace-1",
+  "span_id": "span-1",
+  "parent_span_id": null,
+  "name": "agent.Root",
+  "kind": "SPAN_KIND_INTERNAL",
+  "start_time": "2026-01-01T00:00:00Z",
+  "end_time": "2026-01-01T00:00:01Z",
+  "status": {"code": "STATUS_CODE_ERROR"},
+  "resource": {"attributes": {"service.name": "my-agent"}},
+  "attributes": {
+    "inference.project_id": "my-project",
+    "inference.agent_name": "Root",
+    "inference.llm.model_name": "gpt-test"
+  }
+}
+```
+
+This is intentionally compatible with the HALO/OpenTelemetry-style file export
+pattern where trace data is stored as JSONL and queried through a sidecar index.

docs/index.md

Lines changed: 8 additions & 1 deletion
@@ -6,7 +6,7 @@
 
 <p class="rlm-tagline">Research Playground & Evaluation OS for Recursive Language Model Agentic Systems</p>
 
-<span class="rlm-badge rlm-badge--purple">v0.1.6</span>
+<span class="rlm-badge rlm-badge--purple">v0.1.7</span>
 <span class="rlm-badge rlm-badge--green">Python 3.11+</span>
 <span class="rlm-badge rlm-badge--blue">Apache 2.0</span>
 
@@ -46,6 +46,13 @@ Run **Pure RLM** (paper-compliant with context-as-variable), **CodeAct** (contex
 
 <div class="rlm-feature-card" markdown>
 
+### 🔎 Trace Analysis
+Run HALO-style trace diagnosis with `env=trace_analysis` over OTel-shaped JSONL traces to find repeated harness failure modes.
+
+</div>
+
+<div class="rlm-feature-card" markdown>
+
 ### 🧪 Harness CodeMode
 Opt into `strategy=codemode` for MCP tool discovery, guarded single-program generation, and chain execution via `call_tool_chain`.
 

docs/reference/index.md

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ RLM Code is organized into the following top-level packages. Each module is docu
 | `rlm_code.rlm.events` | Event bus with 27+ event types, collector, and subscriber system | [Event System](../core/events.md) |
 | `rlm_code.rlm.termination` | FINAL/FINAL_VAR detection, code block extraction, answer formatting | [Termination Patterns](../core/termination.md) |
 | `rlm_code.rlm.memory_compaction` | LLM and deterministic memory compaction strategies | [Memory Compaction](../core/memory-compaction.md) |
+| `rlm_code.traces` | HALO-style trace indexing and bounded trace query helpers | [Trace Analysis](../core/trace-analysis.md) |
 | `rlm_code.rlm.repl_types` | REPLVariable, REPLEntry, REPLHistory, REPLResult data types | [REPL Types](../core/repl-types.md) |
 | `rlm_code.rlm.trajectory` | Trajectory event logging, viewing, and comparison | [Trajectory Logging](../core/trajectory.md) |
 | `rlm_code.rlm.comparison` | Paradigm comparison engine (Pure RLM vs CodeAct vs Traditional) | [Paradigm Comparison](../core/comparison.md) |

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -124,6 +124,7 @@ nav:
 - "\U0001F4E1 Event System": core/events.md
 - "\U0001F6D1 Termination": core/termination.md
 - "\U0001F9F9 Memory Compaction": core/memory-compaction.md
+- "Trace Analysis": core/trace-analysis.md
 - "\U0001F4DF REPL Types": core/repl-types.md
 - "\U0001F4C8 Trajectory": core/trajectory.md
 - "\U0001F504 Paradigm Comparison": core/comparison.md

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "rlm-code"
-version = "0.1.6"
+version = "0.1.7"
 description = "RLM Code: Research Playground & Evaluation OS for Recursive Language Model Agentic Systems"
 readme = "README.md"
 license = "Apache-2.0"

rlm_code/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -5,5 +5,5 @@
 through natural language interactions.
 """
 
-__version__ = "0.1.6"
+__version__ = "0.1.7"
 __author__ = "Super Agentic AI"

rlm_code/commands/slash_commands.py

Lines changed: 8 additions & 8 deletions
@@ -1684,7 +1684,7 @@ def cmd_rlm(self, args: list):
 Manage RLM runs.
 
 Usage:
-/rlm run <task> [steps=N] [timeout=N] [branch=N] [depth=N] [children=N] [parallel=N] [budget=N] [framework=<see /rlm frameworks>] [env=generic|dspy|pure_rlm] [sub=provider/model]
+/rlm run <task> [steps=N] [timeout=N] [branch=N] [depth=N] [children=N] [parallel=N] [budget=N] [framework=<see /rlm frameworks>] [env=generic|dspy|pure_rlm|trace_analysis] [sub=provider/model]
 /rlm bench [list|preset=name] [mode=native|harness|direct-llm] [strategy=tool_call|codemode] [mcp=on|off] [mcp_server=name] [pack=path[,path2]] [limit=N] [steps=N] [timeout=N] [branch=N] [framework=<see /rlm frameworks>] [env=generic|dspy|pure_rlm] [sub=provider/model]
 /rlm bench compare [candidate=<id|path|latest>] [baseline=<id|path|previous>] [min_reward_delta=N] [min_completion_delta=N] [max_steps_increase=N]
 /rlm bench validate [candidate=<id|path|latest>] [baseline=<id|path|previous>] [min_reward_delta=N] [min_completion_delta=N] [max_steps_increase=N] [--json]
@@ -1696,8 +1696,8 @@ def cmd_rlm(self, args: list):
 /rlm status [run_id]
 /rlm abort [run_id|all]
 /rlm replay [run_id|latest]
-/rlm doctor [env=generic|dspy|pure_rlm] [--json]
-/rlm chat <message> [session=name] [env=generic|dspy|pure_rlm] [branch=N] [depth=N] [children=N] [parallel=N] [budget=N] [framework=<see /rlm frameworks>] [sub=provider/model]
+/rlm doctor [env=generic|dspy|pure_rlm|trace_analysis] [--json]
+/rlm chat <message> [session=name] [env=generic|dspy|pure_rlm|trace_analysis] [branch=N] [depth=N] [children=N] [parallel=N] [budget=N] [framework=<see /rlm frameworks>] [sub=provider/model]
 /rlm chat status [session=name]
 /rlm chat reset [session=name]
 /rlm observability
@@ -1708,14 +1708,14 @@ def cmd_rlm(self, args: list):
 console.print("[bold cyan]🧠 RLM Commands[/bold cyan]")
 console.print(
 " [yellow]/rlm run <task> [steps=N] [timeout=N] [branch=N] [depth=N] [children=N] "
-f"[parallel=N] [budget=N] [framework={framework_opts}] [env=generic|dspy|pure_rlm] "
+f"[parallel=N] [budget=N] [framework={framework_opts}] [env=generic|dspy|pure_rlm|trace_analysis] "
 "[sub=provider/model][/yellow]"
 )
 console.print(
 " [yellow]/rlm bench [list|preset=name] [mode=native|harness|direct-llm] "
 "[strategy=tool_call|codemode] [mcp=on|off] [mcp_server=name] "
 "[pack=path[,path2]] [limit=N] [steps=N] "
-f"[timeout=N] [branch=N] [framework={framework_opts}] [env=generic|dspy|pure_rlm] [sub=provider/model][/yellow]"
+f"[timeout=N] [branch=N] [framework={framework_opts}] [env=generic|dspy|pure_rlm|trace_analysis] [sub=provider/model][/yellow]"
 )
 console.print(
 " [yellow]/rlm bench compare [candidate=<id|path|latest>] [baseline=<id|path|previous>] "
@@ -1741,9 +1741,9 @@ def cmd_rlm(self, args: list):
 console.print(" [yellow]/rlm status [run_id][/yellow]")
 console.print(" [yellow]/rlm abort [run_id|all][/yellow]")
 console.print(" [yellow]/rlm replay [run_id|latest][/yellow]")
-console.print(" [yellow]/rlm doctor [env=generic|dspy|pure_rlm] [--json][/yellow]")
+console.print(" [yellow]/rlm doctor [env=generic|dspy|pure_rlm|trace_analysis] [--json][/yellow]")
 console.print(
-" [yellow]/rlm chat <message> [session=name] [env=generic|dspy|pure_rlm] [branch=N] [depth=N] "
+" [yellow]/rlm chat <message> [session=name] [env=generic|dspy|pure_rlm|trace_analysis] [branch=N] [depth=N] "
 f"[children=N] [parallel=N] [budget=N] [framework={framework_opts}] "
 "[sub=provider/model][/yellow]"
 )
@@ -2135,7 +2135,7 @@ def cmd_rlm(self, args: list):
 task = " ".join(task_tokens).strip()
 if not task:
 show_error_message(
-"Usage: /rlm run <task> [steps=N] [timeout=N] [env=generic|dspy|pure_rlm] "
+"Usage: /rlm run <task> [steps=N] [timeout=N] [env=generic|dspy|pure_rlm|trace_analysis] "
 "[depth=N] [children=N] [parallel=N] [budget=N] "
 f"[framework={framework_opts}] "
 "[branch=N] [sub=provider/model]"

rlm_code/mcp/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -17,7 +17,7 @@
 )
 from .session_wrapper import MCPSessionWrapper
 
-__version__ = "0.1.6"
+__version__ = "0.1.7"
 
 __all__ = [
 "MCPClientManager",
