Feat/engine by francescov1 · Pull Request #2 · context-labs/HALO

francescov1 · 2026-04-26T16:20:11Z

Note

High Risk
High risk due to adding a large new execution runtime (LLM calls, tool streaming/compaction, sandboxed code execution) and new release/CI pipelines that can affect publishing and developer workflows.

Overview
Adds a new engine/ uv package implementing the HALO agent runtime, including async streaming entrypoints (stream_engine_async/run_engine_async), shared output bus sequencing across root/subagents, context compaction that preserves valid tool-call turns, and an OpenAI Agents SDK runner with retriable-error classification and a circuit breaker.

Introduces a sandboxed run_code tool (bubblewrap on Linux / sandbox-exec on macOS) with capped output and timeouts, plus supporting config/models and trace indexing/store wiring.

Adds repo tooling and delivery: git hook shims in .githooks/ that delegate to engine/scripts/git-hooks, an engine/Taskfile.yml for setup/checks, new GitHub Actions workflows for engine lint/unit/integration/e2e and a multi-stage release workflow (version bump, build, PyPI publish), and a separate cli/ package (halo-cli) exposing a halo-engine Typer command for streaming runs.

^{Reviewed by Cursor Bugbot for commit 1ad22ea. Bugbot is set up for automated code reviews on this repo. Configure here.}

…-format Add sample compressed traces fixture for use by halo-engine e2e tests. Made-with: Cursor

Reverting the removal of the child_depth > maximum_depth check inside guarded_invoke. The structural guard in _child_tools_for_depth (only exposing call_subagent when depth < maximum_depth) makes this path unreachable in normal flow, but the runtime check is cheap insurance against a future refactor that bypasses the structural enforcement. Comment now records that intent. New unit test constructs the tool directly via _build_subagent_as_tool with child_depth=maximum_depth+1 to exercise the branch, addressing the original "unreachable" / "dead code" complaint by making the guard testable and tested.

cursor · 2026-04-27T21:05:39Z

+        compaction_model=ModelConfig(name=E2E_MODEL),
+        # Force compaction to trigger even on small conversations
+        text_message_compaction_keep_last_messages=1,
+        tool_call_compaction_keep_last_messages=1,


Stale field name in plan's integration test code

Medium Severity

Task 11.4's integration test snippet uses tool_call_compaction_keep_last_messages=1, but EngineConfig defines the field as tool_call_compaction_keep_last_turns (with extra="forbid"). When an engineer or agentic worker follows the plan to create test_engine_compaction.py, the test will immediately fail with a Pydantic ValidationError on construction rather than testing compaction behavior.

^{Reviewed by Cursor Bugbot for commit f8bde0f. Configure here.}

cursor · 2026-04-27T21:05:40Z

+        self.items = list(items)
+        self.compaction_model = compaction_model
+        self.text_message_compaction_keep_last_messages = text_message_compaction_keep_last_messages
+        self.tool_call_compaction_keep_last_turns = tool_call_compaction_keep_last_turns


Dead compaction_model field on AgentContext

Low Severity

AgentContext.__init__ accepts and stores compaction_model as a required field, but no method on AgentContext ever reads self.compaction_model. Compaction actually happens through the compactor callable injected into compact_old_items, which is produced by build_openai_compactor_factory(engine_config) — AgentContext has no role in resolving the model. Every call site (including subagent_tool_factory) must pass the field for no effect, and the TODO comment in openai_agent_runner.py acknowledges the current two-path design.

Additional Locations (1)

engine/engine/tools/subagent_tool_factory.py#L208-L209

^{Reviewed by Cursor Bugbot for commit f8bde0f. Configure here.}

…nups Wire agent_context into ToolContext + drop dead subagent code

ModelConfig.name was a Literal that included Claude variants but the compactor and synthesis tool always built AsyncOpenAI() and called chat.completions.create — a Claude name would 4xx at runtime. Targeting the OpenAI-compatible chat-completions surface (de facto 2025 standard exposed by OpenAI, OpenRouter, Anthropic compat, vLLM, Together, Groq, Ollama, LM Studio, Azure, etc.) covers the broad-compatibility goal without adding a dependency: - Drop Literal on ModelConfig.name (now plain str so any provider's model id is accepted) - New ModelProviderConfig (base_url, api_key) on EngineConfig; each field independently falls back to the matching env var when None - Thread it into the AsyncOpenAI client used by the compactor and synthesis tool, and into a new configure_default_sdk_client in openai_agent_runner so the Agents SDK honors the same endpoint - Rename openai_compactor.py -> compactor.py and build_openai_compactor_factory -> build_compactor_factory now that the helper is provider-agnostic

Primary fix (cursor[bot] 3150208823, Medium): - _drive caught BaseException and routed it through output_bus.fail() without re-raising, so CancelledError / KeyboardInterrupt / SystemExit were converted into normal task completions. Structured cancellation broke: an outer task.cancel() left the inner task ended-normally with a fail signal queued for a consumer that may have already torn down. - Narrow the silent-fail branch to except Exception; for BaseException drain the bus then re-raise so the task transitions to cancelled. Cleanups bundled (same file or trivially small): - Rename inner runner -> agent_runner in _drive (3149854827) so it doesn't shadow the outer runner: RunnerProtocol parameter. - Add the {maximum_parallel_subagents} placeholder to the subagent system prompt (3150184259); render_subagent_system_prompt accepted the kwarg but the template had no slot, so str.format silently dropped it. - Anchor /data/ in .gitignore (3150106227) so the rule only matches the top-level data/ directory rather than any data/ at any depth.

…leanups Let _drive cancel cleanly + small bug-bot cleanups

set_default_openai_client defaults use_for_tracing=True, which routes the Agents SDK tracing exporter through the same custom client used for model calls. Non-OpenAI providers (vLLM, Ollama, OpenRouter, etc.) don't speak the tracing API, so tracing POSTs would either error or silently drop — defeating a key goal of the PR. Pass use_for_tracing=False so the tracing exporter stays on its default OpenAI path while model calls go through the configured endpoint.

cursor · 2026-04-27T23:36:34Z

+        "--",
+        "/venv/bin/python",
+        str(script_path),
+    ]


Hardcoded sandbox mount paths duplicated across modules

Medium Severity

The in-sandbox mount paths /mnt/trace/traces.jsonl, /mnt/trace/traces.jsonl.engine-index.jsonl, /venv, /venv/bin/python, and /workspace/bootstrap.py are hardcoded as string literals in both platform_commands.py and sandbox_runner.py. They must stay perfectly synchronized — if one is changed without the other, the sandbox silently breaks because the bootstrap script tries to load files at paths the bind mount no longer provides. Extracting them into shared constants would eliminate this drift risk.

Additional Locations (1)

engine/engine/sandbox/sandbox_runner.py#L44-L66

^{Reviewed by Cursor Bugbot for commit 58a6ec6. Configure here.}

Route LLM calls through configurable OpenAI-compatible endpoint

cursor[bot] 3150754673, High: a single asyncio.Semaphore was shared across every depth, and guarded_invoke held its slot for the full child execution including the parent's wait on its grandchildren. With max_parallel_subagents parents holding every slot at depth N, any depth-(N+1) grandchild blocks forever waiting on a slot its parent is holding — and the parent stays parked inside runner.run waiting for that grandchild's tool result. Reachable on default config (max_depth=2, max_parallel=4) and at max_parallel=1 with even a single grandchild spawn. Replace the single semaphore with one per spawnable depth, built once in build_root_sdk_agent. Each depth has its own pool so parent-holding-slot can no longer block its own grandchildren — they contend on different semaphores. Worst-case concurrent agents at default config: P at each depth, total P × D = 8. Hard cap, honest semantics. Includes a regression test that pre-acquires the depth-1 slot externally then drives a depth-2 tool via _build_subagent_as_tool, verifying it completes within timeout. Verified the test fails (TimeoutError) when guarded_invoke is reverted to a global slot.

Fix nested-subagent deadlock with per-depth semaphores

cursor · 2026-04-28T01:19:45Z

+
+if [[ -x "$ENGINE_HOOK" ]]; then
+  "$ENGINE_HOOK"
+fi


Pre-commit hook silently no-ops when engine hook missing

Low Severity

Both hooks silently skip when $ENGINE_HOOK is not executable or doesn't exist. A developer who hasn't run task setup-git-hooks, or whose checkout has the file but without the executable bit, will get zero feedback that pre-commit/pre-push validation is being bypassed entirely. The PR discussion explicitly flagged "dont like fallbacks like this lets use one way of setting paths" — this silent fallback is the kind of behavior that change was meant to remove.

Additional Locations (1)

.githooks/pre-push#L17-L22

^{Reviewed by Cursor Bugbot for commit 4be7579. Configure here.}

cursor · 2026-04-28T01:19:45Z

+                ],
+                temperature=engine_config.compaction_model.temperature or 0.0,
+            )
+            return (response.choices[0].message.content or "").strip()


Compactor lazy client reads unset model_provider

Low Severity

The factory closure mutates a shared openai_client via nonlocal, lazily constructing one AsyncOpenAI per factory but reusing it across executions. Two issues: (1) under concurrent subagent compaction the lazy if openai_client is None check is racy, possibly creating multiple clients; (2) the inline TODO already calls out this design — building one client per run and injecting it removes the branch entirely. As written, this is overly complex relative to the simpler single-client construction the TODO describes.

^{Reviewed by Cursor Bugbot for commit 4be7579. Configure here.}

cursor[bot] 3151031551 (Medium): release workflow ran ``uv version --bump … --no-sync`` then committed both pyproject.toml and uv.lock — but ``--no-sync`` also skips the lock refresh, so the release would push a uv.lock that lagged the bumped pyproject.toml. Add an explicit ``uv lock`` step after the bump. cursor[bot] 3151031562 (Low): when the consumer of stream_engine_async cancels iteration, the outer ``except`` calls task.cancel() but never awaits the inner _drive task. asyncio then warns "Task was destroyed but it is pending" and any exception _drive raises during cancellation cleanup is silently dropped. Add a try/except-BaseException that drains the task before re-raising the consumer's original exception.

cursor · 2026-04-28T01:27:43Z

+            chunk = await stream.read(65536)
+            if not chunk:
+                break
+        return bytes(buf)


Sandbox stdout/stderr drained twice loses output

Medium Severity

In _read_capped, after the first loop fills buf exactly to cap, the second while True loop continues to read and discard the remaining stream. However, if the first loop exits because len(buf) == cap while the next chunk would have fit (the boundary case), or if the process emits less than cap total bytes, the second loop still runs an extra await stream.read(65536) after EOF — harmless but wasteful. More importantly, when output exceeds cap, bytes between cap and the next chunk boundary are silently discarded with no truncation marker, so users see partial output with no indication that it was capped.

^{Reviewed by Cursor Bugbot for commit 83cea0a. Configure here.}

cursor · 2026-04-28T01:27:43Z

+
+if [[ -x "$ENGINE_HOOK" ]]; then
+  "$ENGINE_HOOK" "$REMOTE" "$URL" < "$REFS_FILE"
+fi


Pre-push hook silently swallows refs when engine hook missing

Low Severity

The wrapper unconditionally consumes stdin via cat > "$REFS_FILE" before checking whether the engine hook exists. If the engine hook is absent or non-executable, the captured refs file is discarded on cleanup and the push proceeds with no validation. That's the intended fallback, but it also means any failure to install the engine hook (typo in path, lost executable bit) becomes a silent no-op rather than an obvious error — defeating the purpose of having a pre-push hook at all. Consider logging a warning when the engine hook is not found.

^{Reviewed by Cursor Bugbot for commit 83cea0a. Configure here.}

cursor[bot] 3151054134 (Low): _map_tool_call and _map_tool_output both used ``item_id = str(getattr(raw, "id", "") or call_id)``. When the SDK's raw item lacks an ``id``, both fall back to the same ``call_id``, so the two AgentContextItems collide on item_id. AgentContext._index is keyed by item_id, so the second append() silently overwrites the first — get_context_item then returns the wrong item even though both still live in self.items. Namespace the fallback: tool-call-{call_id} for the assistant call and tool-result-{call_id} for the tool result. Items with intact raw.id are unaffected. Includes a regression test that constructs a tool_call/tool_output pair without raw.id and asserts distinct item_ids; verified the test fails when the fix is reverted.

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 14 total unresolved issues (including 13 from previous reviews).

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit 1ad22ea. Configure here.}

cursor · 2026-04-28T01:43:00Z

+            input=input,
+            context=context,
+            max_turns=engine_config.root_agent.maximum_turns,
+        )


_run_streamed hardcodes root agent's max turns

Low Severity

The _run_streamed closure ignores its agent kwarg and always passes engine_config.root_agent.maximum_turns to Runner.run_streamed. The runner is only bound to the root execution today, so this works, but if OpenAiAgentRunner is ever reused for a non-root agent (or subagent is configured with a different maximum_turns), the wrong cap will be silently applied. Reading max_turns from the agent that was passed in would remove this latent coupling.

^{Reviewed by Cursor Bugbot for commit 1ad22ea. Configure here.}

… new finds Driven by a second sub-agent run that surfaced two new bugs and gave specific framework feedback. This commit lands the framework changes, promotes one probe to an example, and leaves one probe in place as a bug repro per the new lifecycle rules. probe_kit.py — three new helpers: - make_checker(): bundles the _check / _FAILURES / main() boilerplate that every probe duplicates inline. Returns (check, failures); call failures.report_and_exit_code() at the end of main(). - make_tool_context(state, *, agent_context, agent_execution): builds a ToolContext from an EngineRunState. Removes 5-line boilerplate from every unit-style tool probe. - check_raises(callable_or_coro, expected_type): asserts an expected exception is raised, returns the caught exception. Distinguishes legitimate try/except (asserting expected raises) from the forbidden use (hiding unexpected failures). README — significant rewrite around probe lifecycle, driven by user direction: - Probes that PASS are deleted by the agent. Probes that FAIL stay in place as the canonical bug repro; the user evaluates / deletes once fixed. Replaced 'probes are ephemeral' with explicit lifecycle rules in Hard rule 7 plus a top-level summary in the intro. - New 'Section 3 — Suggested example promotions' in the report format. The bar is novel TESTING TECHNIQUE, not code surface coverage. Defines what does and does not count as novel. - Process step 1 expanded with concrete module pointers (added agent_context_items, engine_output_bus, runner_protocol, tool_protocol, trace_store) per second-agent feedback that the prior list felt thin. - 'Areas worth probing' shortened to one-line teasers — the curated sub-bullets had been functioning as a checklist that pulled the agent toward listed items. Now genuinely a seed list. - Try/except clarification: hiding unexpected failures forbidden; asserting expected raises is fine — use check_raises. - Optional 'Repro:' line added to FAIL findings (only when probe file is left behind). - 'How to write a probe' skeleton updated to use make_checker / make_tool_context / check_raises. example_compaction.py — promoted from probe_compaction.py. Demonstrates two genuinely novel patterns the existing examples don't show: - Direct AgentContext probing without run_with_fake at all - _RecordingCompactor class (a real callable that records, not a mock) Surfaced bug #7: compacted plain-text assistant messages mis-rendered as 'Compacted tool calls' in agent_context.py:138-142. probe_max_turns_forwarding.py — kept as ephemeral probe (no novel technique; just runner.calls inspection that example_circuit_breaker already shows). Surfaces bug #6: AgentConfig.maximum_turns never forwarded to Runner.run_streamed at either root or subagent call sites. File stays per lifecycle rule #2 — it's the bug repro until the bug is fixed.

Feat/engine

declanatinference added 30 commits April 24, 2026 22:36

feat(engine): scaffold halo-engine uv package

5c4ca9f

test(engine): add tiny traces fixture

cd554e5

feat(engine): add JsonValue/JsonMapping aliases

9805cbb

test(engine): add traces.jsonl.gz fixture from openai-inference-trace…

ef8dcff

…-format Add sample compressed traces fixture for use by halo-engine e2e tests. Made-with: Cursor

feat(engine): add AgentMessage + tool call models

7904da3

feat(engine): add AgentOutputItem and AgentTextDelta

f404b84

feat(engine): add ModelConfig and AvailableModelName

7c4c295

feat(engine): add AgentConfig

83abba0

feat(engine): add TraceIndexConfig

72ebe44

feat(engine): add sandbox config models

fa20748

feat(engine): add EngineConfig

da7eb49

feat(engine): add typed engine errors

6a1e2fb

feat(engine): add canonical span models

44c65c9

feat(engine): add trace index row/meta models

fdc6676

feat(engine): add trace query/argument/result models

faa658a

feat(engine): TraceIndexBuilder.ensure_index_exists presence check

5d0a2fe

feat(engine): implement TraceIndexBuilder.build_index

a34ef04

feat(engine): reject existing index with mismatched schema_version

b72aeb0

feat(engine): TraceStore.load

e3eb8cf

feat(engine): TraceStore.view_trace via byte-range reads

489006f

feat(engine): TraceStore.query_traces with filters

1d5df86

feat(engine): TraceStore.count_traces

90ff823

feat(engine): TraceStore.get_overview

6d8c1b7

feat(engine): TraceStore.search_trace substring search

ea94aeb

feat(engine): TraceStore.render_trace with budget

6944901

feat(engine): EngineTool protocol + ToolContext

224e947

feat(engine): SDK FunctionTool adapter for EngineTool

dc30d34

feat(engine): SDK FunctionTool adapter for EngineTool (add test)

b6b3ebd

feat(engine): trace tools backed by TraceStore

8a95b9c

feat(engine): AgentContextItem

1e2709a

declanatinference mentioned this pull request Apr 27, 2026

Wire agent_context into ToolContext + drop dead subagent code #8

Merged

3 tasks

francescov1 and others added 2 commits April 27, 2026 13:52

Rename

f8bde0f

cursor Bot reviewed Apr 27, 2026

View reviewed changes

Cleanuo

e6beafe

cursor Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread engine/engine/main.py

declanatinference added 2 commits April 27, 2026 14:11

Merge pull request #8 from context-labs/fix/subagent-context-and-clea…

7601d40

…nups Wire agent_context into ToolContext + drop dead subagent code

declanatinference mentioned this pull request Apr 27, 2026

Route LLM calls through configurable OpenAI-compatible endpoint #10

Merged

4 tasks

declanatinference added 2 commits April 27, 2026 23:05

Add TODO to share one AsyncOpenAI across compactor/synthesis/SDK

9171778

declanatinference mentioned this pull request Apr 27, 2026

Let _drive cancel cleanly + small bug-bot cleanups #11

Merged

4 tasks

declanatinference added 2 commits April 27, 2026 16:28

Merge pull request #11 from context-labs/fix/drive-cancellation-and-c…

58a6ec6

…leanups Let _drive cancel cleanly + small bug-bot cleanups

cursor Bot reviewed Apr 27, 2026

View reviewed changes

Merge pull request #10 from context-labs/fix/openai-compatible-providers

ba5372b

Route LLM calls through configurable OpenAI-compatible endpoint

declanatinference previously approved these changes Apr 27, 2026

View reviewed changes

cursor Bot reviewed Apr 27, 2026

View reviewed changes

Comment thread engine/engine/tools/subagent_tool_factory.py

declanatinference mentioned this pull request Apr 28, 2026

Fix nested-subagent deadlock with per-depth semaphores #12

Merged

4 tasks

Merge pull request #12 from context-labs/fix/per-depth-semaphore

4be7579

Fix nested-subagent deadlock with per-depth semaphores

declanatinference dismissed their stale review via 4be7579 April 28, 2026 01:15

cursor Bot reviewed Apr 28, 2026

View reviewed changes

francescov1 merged commit 31c844e into main Apr 28, 2026
3 checks passed

francescov1 added a commit that referenced this pull request Apr 29, 2026

Merge pull request #2 from context-labs/feat/engine

9a8835e

Feat/engine

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat/engine#2

Feat/engine#2
francescov1 merged 122 commits into
mainfrom
feat/engine

francescov1 commented Apr 26, 2026 •

edited by cursor Bot

Loading

Uh oh!

cursor Bot Apr 27, 2026

Uh oh!

Uh oh!

cursor Bot Apr 27, 2026

Uh oh!

Uh oh!

cursor Bot Apr 27, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot Apr 28, 2026

Uh oh!

Uh oh!

cursor Bot Apr 28, 2026

Uh oh!

cursor Bot Apr 28, 2026

Uh oh!

cursor Bot Apr 28, 2026

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

cursor Bot Apr 28, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

francescov1 commented Apr 26, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cursor Bot Apr 27, 2026

Choose a reason for hiding this comment

Stale field name in plan's integration test code

Uh oh!

Uh oh!

cursor Bot Apr 27, 2026

Choose a reason for hiding this comment

Dead compaction_model field on AgentContext

Uh oh!

Uh oh!

cursor Bot Apr 27, 2026

Choose a reason for hiding this comment

Hardcoded sandbox mount paths duplicated across modules

Uh oh!

Uh oh!

Uh oh!

cursor Bot Apr 28, 2026

Choose a reason for hiding this comment

Pre-commit hook silently no-ops when engine hook missing

Uh oh!

Uh oh!

cursor Bot Apr 28, 2026

Choose a reason for hiding this comment

Compactor lazy client reads unset model_provider

Uh oh!

cursor Bot Apr 28, 2026

Choose a reason for hiding this comment

Sandbox stdout/stderr drained twice loses output

Uh oh!

cursor Bot Apr 28, 2026

Choose a reason for hiding this comment

Pre-push hook silently swallows refs when engine hook missing

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cursor Bot Apr 28, 2026

Choose a reason for hiding this comment

_run_streamed hardcodes root agent's max turns

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

francescov1 commented Apr 26, 2026 •

edited by cursor Bot

Loading

Dead `compaction_model` field on `AgentContext`

Compactor lazy client reads unset `model_provider`

`_run_streamed` hardcodes root agent's max turns