```bash
halo path_to_your_traces.jsonl -p "Diagnose errors you find and suggest fixes"
```

+HALO uses the canonical OpenAI env vars: `OPENAI_API_KEY` for credentials and `OPENAI_BASE_URL` for OpenAI-compatible providers. If `OPENAI_BASE_URL` is unset, HALO uses `https://api.openai.com/v1`. Run `halo --help` to see all CLI options. The CLI mirrors the model/provider settings exposed by the Python SDK's
|`--refusal-retries`|`0`| Retry an agent model request this many times when the model refuses |
+|`--reasoning-effort`| model/provider default | Reasoning effort for root, subagent, and synthesis calls. Compaction never uses reasoning |
+|`--telemetry`| off | Emit OpenInference traces of HALO's own LLM, tool, and agent activity |
+
+For example:
+
+```bash
+halo path_to_your_traces.jsonl \
+  -p "Diagnose errors you find and suggest fixes" \
+  --base-url https://openrouter.ai/api/v1 \
+  -H "HTTP-Referer: https://example.com"
+```
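
The provider settings described in this hunk can be sketched in Python. This is only an illustration, not HALO's code: `resolve_provider` and `DEFAULT_BASE_URL` are hypothetical names, and letting an explicit `--base-url` value win over `OPENAI_BASE_URL` is an assumption the diff does not confirm.

```python
import os
from typing import Mapping, Optional

# Default stated in the README text: used when OPENAI_BASE_URL is unset.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_provider(
    env: Optional[Mapping[str, str]] = None,
    base_url_flag: Optional[str] = None,
) -> dict:
    """Hypothetical helper: pick credentials and base URL for an OpenAI-compatible provider."""
    env = os.environ if env is None else env
    # Assumption: a CLI flag takes precedence over the environment variable.
    base_url = base_url_flag or env.get("OPENAI_BASE_URL") or DEFAULT_BASE_URL
    return {"api_key": env.get("OPENAI_API_KEY"), "base_url": base_url}
```

Passing a plain dict as `env` keeps the helper easy to exercise without touching the real environment.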
+
+### Telemetry
+
+HALO can emit OpenInference-shaped traces of its own LLM, tool, and agent activity. It is off by default; nothing is emitted unless you pass `--telemetry`.
+
+```bash
+halo TRACE_PATH --prompt "..." --telemetry
+```
+
+When telemetry is enabled, `CATALYST_OTLP_TOKEN` uploads spans to inference.net Catalyst over OTLP. If it is unset, spans are written to a local JSONL file at `./halo-telemetry-{run_id}.jsonl` in the current working directory.
+
+| Var | Default | Purpose |
+|---|---|---|
+|`CATALYST_OTLP_TOKEN`| unset | If set, uploads to Catalyst over OTLP. If unset, writes JSONL locally |
+|`CATALYST_OTLP_ENDPOINT`| catalyst-tracing default | OTLP endpoint base URL, for example `https://telemetry.inference.net`|
+|`CATALYST_DEBUG`| unset | Set to `1` to surface OTLP export errors at WARNING level |
+|`CATALYST_TRACING_RUN_ID`| unset | Uses this HALO run id instead of a generated uuid |
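
The routing rule in the table above can be read as a small decision function. The sketch below is an assumption-laden illustration, not HALO's implementation: `resolve_telemetry_destination` is a hypothetical name, the placeholder string for the endpoint default stands in for whatever catalyst-tracing actually uses, and the exact uuid format of a generated run id is a guess.

```python
import os
import uuid
from typing import Mapping, Optional, Tuple


def resolve_telemetry_destination(
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Hypothetical helper: return ("otlp", endpoint) or ("jsonl", path)."""
    env = os.environ if env is None else env
    # CATALYST_TRACING_RUN_ID replaces the generated run id when set.
    run_id = env.get("CATALYST_TRACING_RUN_ID") or str(uuid.uuid4())
    if env.get("CATALYST_OTLP_TOKEN"):
        # Token set: spans go to Catalyst over OTLP.
        endpoint = env.get("CATALYST_OTLP_ENDPOINT", "<catalyst-tracing default>")
        return "otlp", endpoint
    # Token unset: spans go to a local JSONL file in the working directory.
    return "jsonl", f"./halo-telemetry-{run_id}.jsonl"
```

For instance, calling it with only `CATALYST_TRACING_RUN_ID` set to `abc` yields the local path `./halo-telemetry-abc.jsonl`.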
-|`stream_engine_async`| async |`AsyncIterator[AgentOutputItem \| AgentTextDelta]`| You want every event including streaming-token deltas (live UI, custom rendering). |
-|`stream_engine_output_async`| async |`AsyncIterator[AgentOutputItem]`| You want to log / persist each completed step (assistant message, tool call, tool result) as it lands. |
-|`run_engine_async`| async |`list[AgentOutputItem]`| You want the final list at the end and don't care about per-step observability. |
-|`stream_engine`| sync |`Iterator[AgentOutputItem \| AgentTextDelta]`| Sync generator; yields every event including deltas. Drives the async iterator on a private event loop. |
-|`stream_engine_output`| sync |`Iterator[AgentOutputItem]`| Sync generator; yields completed items only. Same shape as the async variant for sync callers. |
-|`run_engine`| sync |`list[AgentOutputItem]`| Sync, collects to a list. Pure convenience over `asyncio.run(run_engine_async(...))`. |
+| Function | Sync / async | Returns | When to use |
+|---|---|---|---|
+|`stream_engine_async`| async |`AsyncIterator[AgentOutputItem \| AgentTextDelta]`| You want every event including streaming-token deltas (live UI, custom rendering). |
+|`stream_engine_output_async`| async |`AsyncIterator[AgentOutputItem]`| You want to log / persist each completed step (assistant message, tool call, tool result) as it lands. |
+|`run_engine_async`| async |`list[AgentOutputItem]`| You want the final list at the end and don't care about per-step observability. |
+|`stream_engine`| sync |`Iterator[AgentOutputItem \| AgentTextDelta]`| Sync generator; yields every event including deltas. Drives the async iterator on a private event loop. |
+|`stream_engine_output`| sync |`Iterator[AgentOutputItem]`| Sync generator; yields completed items only. Same shape as the async variant for sync callers. |
+|`run_engine`| sync |`list[AgentOutputItem]`| Sync, collects to a list. Pure convenience over `asyncio.run(run_engine_async(...))`. |
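
The table says the sync generators drive the async iterator on a private event loop. A minimal, self-contained sketch of that general pattern follows. It does not use or reproduce the engine's code: `fake_stream_async`, `iterate_sync`, and `collect_sync` are hypothetical stand-ins, and the real functions take arguments this diff does not show.

```python
import asyncio
from typing import AsyncIterator, Iterator, List


async def fake_stream_async() -> AsyncIterator[str]:
    """Hypothetical stand-in for an async engine stream."""
    for item in ("assistant message", "tool call", "tool result"):
        await asyncio.sleep(0)  # yield control, as a real stream would while waiting
        yield item


def iterate_sync(agen: AsyncIterator[str]) -> Iterator[str]:
    """Drive an async iterator from sync code on a private event loop."""

    async def next_item() -> str:
        return await agen.__anext__()

    loop = asyncio.new_event_loop()
    try:
        while True:
            try:
                item = loop.run_until_complete(next_item())
            except StopAsyncIteration:
                break  # the async iterator is exhausted
            yield item
    finally:
        # Finalize any pending async generators before closing the loop.
        loop.run_until_complete(loop.shutdown_asyncgens())
        loop.close()


def collect_sync(agen: AsyncIterator[str]) -> List[str]:
    """Collect every item into a list, similar in spirit to a run-to-completion helper."""
    return list(iterate_sync(agen))
```

A sync caller can then write `for item in iterate_sync(fake_stream_async()): ...` without managing an event loop itself. The pattern must not be used from inside an already running loop.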

```python
from engine.main import stream_engine_output_async
```
-Thin Typer wrapper around the HALO engine that streams the engine over a JSONL trace file.
+This package contains the `halo` console entry point registered in `pyproject.toml`.
+It is a thin Typer wrapper around the engine API:

-## Install
+- Parses CLI arguments and environment-backed provider settings.
+- Builds an `EngineConfig` from those arguments.
+- Calls `stream_engine_async` over a JSONL trace file.
+- Renders streaming text deltas and completed agent output items to stdout.

-```bash
-pip install halo-engine
-```
+User-facing installation, usage, options, and telemetry docs live in the root
+[`README.md`](../README.md).

-This installs the `halo` script onto your `PATH`. No extra configuration — the script is registered as a console entry point in the `halo-engine` wheel.
+## Code Layout

-Verify:
+`main.py` intentionally keeps the CLI small. The engine owns behavior; the CLI only
-|`--model`, `-m`|`gpt-5.4-mini`| Model name for root, sub, synthesis, and compaction |
-|`--max-depth`|`1`| Max subagent recursion depth |
-|`--max-turns`|`8`| Max turns per agent |
-|`--max-parallel`|`2`| Max concurrent subagents |
-|`--reasoning-effort`|_(model default)_| Reasoning effort for root, subagent, and synthesis calls. One of `none`, `minimal`, `low`, `medium`, `high`, `xhigh`. Compaction never uses reasoning. Omit to use the model family's documented max for known reasoning models. |
-
-## Example
-
-```bash
-halo tests/fixtures/realistic_traces.jsonl \
-  -p "What are the most common failure modes?" \
-  --max-depth 2 \
-  --max-turns 12 \
-  --reasoning-effort high
-```
-
-Output streams to stdout: text deltas inline, then a rule-separated panel for each agent output item.
-
-## Telemetry (optional)
-
-HALO can emit OpenInference-shaped traces of its **own** LLM, tool, and agent activity — useful when you're tuning HALO and want to inspect what it actually did. Off by default; nothing is emitted unless you pass `--telemetry`.
-
-### Enable on a run
-
-```bash
-halo TRACE_PATH --prompt "..." --telemetry
-```
-
-### Routing
-
-The destination is decided by env vars:
-
-- `CATALYST_OTLP_TOKEN` set → spans are uploaded to **inference.net Catalyst** over OTLP.
-- `CATALYST_OTLP_TOKEN` unset → spans are written to a **local JSONL file** at `./halo-telemetry-{run_id}.jsonl` in the current working directory.
-
-### Environment variables
-
-| Var | Default | Purpose |
-|---|---|---|
-|`CATALYST_OTLP_TOKEN`|*(unset)*| If set, uploads to Catalyst over OTLP. If unset, writes JSONL locally. |
-|`CATALYST_OTLP_ENDPOINT`| catalyst-tracing default | OTLP endpoint **base URL** (e.g. `https://telemetry.inference.net`). catalyst-tracing appends `/v1/traces` automatically — do **not** include the path, or you'll get a `.../v1/traces/v1/traces` 404 and silently no traces. |
-|`CATALYST_DEBUG`|*(unset)*| Set to `1` to surface OTLP export errors at WARNING level. Useful for troubleshooting "no errors, no traces" — the default `BatchSpanProcessor` swallows export failures. |
-|`CATALYST_TRACING_RUN_ID`|*(unset)*| When set, becomes the HALO run id (and the `halo.run.id` resource attribute) instead of a generated uuid. Lets a launching system (typically Catalyst) keep its own bookkeeping in sync with HALO's traces. |
-|`CATALYST_TRACING_*`|*(unset)*| Generic passthrough — see below. |
-|`HALO_TELEMETRY_PATH`|`./halo-telemetry-{run_id}.jsonl`| Local fallback file path. Only consulted when `CATALYST_OTLP_TOKEN` is unset. |
-
-### Local file format
-
-The local JSONL is the inference.net OTLP-shaped form that HALO itself ingests, so traces produced by running HALO can be loaded back into HALO for analysis.
-
-### Notes
-
-- Enabling `--telemetry` clears the openai-agents SDK's default trace processor (which would otherwise upload to OpenAI's dashboard). HALO's own LLM traffic stays out of OpenAI's dashboard while telemetry is on.
-- When telemetry is off (the default), no env vars are read and no files are written.
-
-## Developing locally
-
-If you want to hack on the CLI or the engine itself, install from a checkout of this repo with [`uv`](https://docs.astral.sh/uv/):
-
-```bash
-git clone https://github.com/context-labs/HALO
-cd HALO
-uv sync
-```
-
-`uv sync` creates `.venv/` and installs `halo-engine` in editable mode. Use `uv run halo ...` (or activate the venv) to invoke the CLI against your local checkout.
+Tests for argument parsing and config wiring live in `tests/unit/test_halo_cli.py`.