
Coding Agent Expansion (Added Tools, New Slash Commands, Rewritten Prompts, Improved Chat UX, and more)#2210

Open
russell-rozenbaum wants to merge 52 commits into dev from coding-agent-projector-tools-extension

Conversation

russell-rozenbaum (Contributor) commented Apr 9, 2026

Summary

Major coding-agent expansion. Scope: agent surface (src/web/view/agent{Core,View}/, agent CSS/tests, OpenRouter.re, HighLevelNodeMap.re, prompt factory).

What's new

  • Structure-editor tools — path-addressed syntax-projector tools via HighLevelNodeMap, statics tools, generalized insert_before/insert_after, per-tool + per-category toggles, ProjectorCatalog.
  • Chat loop & control — send queue, stop, multi-tool abort, auto + manual /compact, context meter, sticky scroll, centralized empty/API retry path.
  • Session modes & workbench — Converse / Edit / Plan top-bar toggle; tasks + subtasks with ordering and status tracking.
  • Slash commands — typed payloads + SlashCommandOutput view; /help, /session-usage, /key, /key-usage, /account-usage, /show-thinking, /compact.
  • Model & API — thinking view, reasoning-effort dropup, model picker (FP Lab recs, fuzzy search, only-free), OpenRouter account/key endpoints, safer error paths.
  • Display — plain-text tool-call rows + signifiers + jump-to-node, ToolCallSummary, agent-message Markdown via Omd → vdom, chat export/copy.
  • Prompts & docs — structure-editor-first identity, composition/compaction overhaul, ^^ projector syntax, AGENTS.md + handoff doc.
  • Refactors — AgentCore/AgentView → agentCore/agentView; top-bar above textbox; dropped empty-reply workbench nudge (plan-mode loop fix).
  • Tests — new Test_AgentControlFlow, Test_AgentMultiTool; expanded Test_AgentUX, Test_AgentTools (incl. HighLevelNodeMap).
  • Streaming — tokens are now streamed in via OpenRouter

Files outside agent scope

11 narrow integration touches (icons, Page route, sidebar hook, JsUtil helpers, prompt-string trim, projector/probe perform plumbing, test-runner reg, run_tests/.gitignore). Rest of git diff origin/dev..branch is artifact from dev moving 421 commits since divergence at ab5ba95da25 — no branch-side deletions.

Tour — new bottom-bar controls

  • Toggle agent modes: Edit / Plan / Converse
  • Select reasoning effort
  • Change model inline (current model shown)
[Screenshots]

New slash commands

[Screenshot]

Agent can place probes and read their intermediate results

Along with this, it can also place projectors.

[Screenshot]

Agent can batch multiple tools per turn

More informative collapsed tool descriptions

Cmd/Ctrl+Click on a tool-call row also jumps to the definition that was edited.

[Screenshot]

Thinking blobs now displayed as their own UI chunks

[Screenshot]

Refactored model selection UI

We can recommend some models (this requires maintaining code though :/ might need to remove).

Also added fuzzy search over models in the master list.

Stopped displaying free models first, instead added a filter toggle.

[Screenshot]

User can stop chats early (and also queue up messages)

[Screenshot]

Batched Tool Calls

Batched calls come with error handling: if tool call n fails, every tool call scheduled after n is skipped rather than executed, to avoid cascading failures, rogue bugs, and unintended control flow.

[Screenshot]
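The skip-on-failure rule for batched tool calls can be sketched roughly as follows. This is a Python illustration only, not the branch's Reason code; `ToolResult`, `run_batch`, and the status strings are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class ToolResult:
    name: str
    status: str  # "ok", "failed", or "skipped" (hypothetical labels)
    detail: str = ""


def run_batch(calls: List[Tuple[str, Callable[[], str]]]) -> List[ToolResult]:
    """Run tool calls in order; after the first failure, mark the rest skipped."""
    results: List[ToolResult] = []
    failed = False
    for name, run in calls:
        if failed:
            # Tools scheduled after a failure are recorded but never executed.
            results.append(ToolResult(name, "skipped", "not executed"))
            continue
        try:
            results.append(ToolResult(name, "ok", run()))
        except Exception as exc:
            failed = True
            results.append(ToolResult(name, "failed", str(exc)))
    return results
```

Recording skipped calls explicitly, instead of dropping them, keeps the transcript honest about what the model asked for versus what actually ran.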

… docs

- Scroll chat to bottom on updates only when already pinned to bottom
- System prompt and developer notes use full-screen-doc (left-aligned, readable)
- Hide edits summary when there are no edit tools in a turn
- Wire Agent Tools to tools-view-* CSS (flex row, hover, toggle states)

Made-with: Cursor
Add place_statics, remove_statics, and toggle_statics with ProbePerform
helpers, StaticsAction wiring, and Agent handling. Register tools in
CompositionUtils and extend tests, including projector snippets.

Introduce ProjectorCatalog and refresh CompositionPrompt, CompactionPrompt,
HazelDocumentation, HazelSyntaxNotes, and ProbeTools copy for projectors
and statics.

When marking the active workbench task complete, fold remaining subtasks
to completed instead of failing; document in WorkbenchTools.

Fix chat messages autoscroll while pinned to the bottom by deferring
scroll-to-bottom after layout and enriching the scroll stamp (tool
results and slack). Set min-height on full-screen view content for flex
scrolling.

Made-with: Cursor
Add place/remove/toggle_syntax_projector tools (OpenAPI + CompositionUtils +
CompositionActions) and ProjectorPerform helpers for path-resolved placement.

Agent.re: handle SyntaxProjectorAction with expand-on-success; fail when zero
paths apply; extend mk_diff for projector/probe/statics actions; retry and
workbench nudges can use synthetic user-role API messages; document message
channels in file header.

HighLevelNodeMap: improve closest-path suggestions for nested bindings; note
outer/inner paths in path_to_id errors.

Prompts: message_channels, partnering_and_user_intent, CONTEXT UPDATE echo
ban, EditTools path semantics, projector catalog; compaction prompt aligned.

Tests: HighLevelNodeMap path cases, binding_clause scenarios, syntax projector
parse coverage.

UI: AgentMessageMarkdown + styles for assistant bubbles.

Track .cursor/docs/coding-agent-projector-tools-extension.md via .gitignore
exception for agent onboarding.

Made-with: Cursor
- Copy exact LLM context snapshot from Agent Context panel (clipboard shim +
  toast); shared Message/Agent helpers and tests in Test_AgentUX.
- Multi-tool turns: stop after first real failure; skipped tools with
  AgentToolResult.skipped, grey circle-minus icon, chat export [not executed];
  Test_AgentMultiTool.
- Context meter: 80% rounded limit capped at 100k tokens (AgentGlobals);
  compaction uses same limit.
- JsUtil: copy_via_shim, show_copy_toast; Icons.circle_with_minus; CSS for
  header actions and skipped tool state.
- run_tests: node stack + idb_stub alignment; haz3ltest registers multi-tool suite.
- Add AGENTS.md; .gitignore .cursor/ with docs exception; prompt/catalog/
  HighLevelNodeMap/SyntaxProjectorTools/WorkbenchTools tweaks.

Made-with: Cursor
…agent

- After update_definition, sanitize syntax projectors in the definition segment
  via ProjectorPerform.sanitize_projectors_in_segment so wrappers match init.
- GeneralTreeUtils and CompositionGo: get_refs_to after pattern edits; clearer
  ambiguous-path behavior and EditTools prompt notes.
- StringUtil: trim_leading and trim_trailing_whitespace strip horizontal
  whitespace (space, tab, CR); align with paste (ClipboardCache) and tests.
- CompositionPrompt documents multiple tool calls per turn and sequential
  skip-on-failure; CompactionPrompt is expanded with Overview, Goals, Rules,
  and related sections.
- Agent: surface Action.Failure.show when a structural tool cannot be applied;
  minor clarity in tool-chain failure detection.
- Tests: AgentTools, AgentUX, and StringUtil updates.

Made-with: Cursor
- Chat.Utils: estimate_openrouter_prompt_tokens and context_meter_prompt_tokens
  so the bar updates when history is compacted (estimate) until the next agent
  reply supplies provider usage again.
- ChatBottomBar uses context_meter_prompt_tokens for the token display.
- CompositionPrompt and update_definition tool text: ^^kind(expr) for livelits
  (slider, sliderf, check, text, csv, card) when overwriting definitions.

Made-with: Cursor
Enforce at most one in-flight main or compaction LLM request; add Stop
which clears busy state and ignores the matching reply via flight
sequence numbers (including API error and retry paths). Chat bottom bar
shows Stop while awaiting or compacting and blocks send until idle.

Prompts: prefer incremental insert/update tools over monolithic
initialize; treat workbench tasks as optional for large multi-turn work
only. Compaction instructions prioritize transcript and tool results
over a misleading empty agent-view snapshot.

Tests: HandleLLMResponse passes main_llm_seq; add stop_square icon.
Made-with: Cursor
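A minimal sketch of the single-in-flight and Stop behavior described above, assuming each request carries a monotonically increasing sequence number. The class and method names (`ChatState`, `send`, `stop`, `handle_reply`) are hypothetical and written in Python for illustration; the real logic lives in the agent's update code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class ChatState:
    next_seq: int = 0
    in_flight: Optional[int] = None          # seq of the one allowed request
    ignored: Set[int] = field(default_factory=set)
    log: List[str] = field(default_factory=list)

    def send(self) -> Optional[int]:
        """Start a request, or return None if one is already in flight."""
        if self.in_flight is not None:
            return None
        seq = self.next_seq
        self.next_seq += 1
        self.in_flight = seq
        return seq

    def stop(self) -> None:
        """Clear busy state and remember which reply to drop when it arrives."""
        if self.in_flight is not None:
            self.ignored.add(self.in_flight)
            self.in_flight = None

    def handle_reply(self, seq: int, text: str) -> bool:
        """Apply a reply only if it belongs to the current, unstopped flight."""
        if seq in self.ignored:
            self.ignored.discard(seq)
            return False
        if seq != self.in_flight:
            return False
        self.in_flight = None
        self.log.append(text)
        return True
```

Because the late reply is matched by sequence number rather than by timing, the same check would cover API errors and retries that arrive after Stop.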
- Queue user sends while busy; flush when idle; Stop only via button; append
  cancel line before flushing so ordering stays correct.
- Chunked UI: ResponseCancelled outside Filbert; compaction body renders as
  Markdown; CSS for queue panel and markdown compaction.
- Compaction: dialogue slice includes prior summary for chained compacts;
  prompt stresses merging that block with new turns; structured Markdown
  output contract.
- Context meter shows only API prompt_tokens; use em dash until next reply
  after compaction (no client estimate).
- HighLevelNodeMap: duplicate path strings resolve to earliest sibling;
  update CompositionPrompt and EditTools path guidance.
- Tests: duplicate path order; compaction dialogue slice includes summary.

Made-with: Cursor
…ENTS manual QA

- Test_AgentControlFlow: Stop, stale HandleLLMResponse/Compaction, ApiError ignore, queue flush ordering, Send while busy
- Test_AgentUX: context_meter_prompt_tokens, messages_for_openrouter, MarkActiveTaskComplete subtask auto-complete
- Test_AgentTools: get_refs_to_after_pattern_edit vs get_refs_to when pre/post agree
- haz3ltest: register AgentControlFlow suite
- AGENTS.md: agent test filters and manual QA checklist for UI/clipboard

Made-with: Cursor
- Add SetToolsInCategoryEnabled and Agent Tools header toggles per suite
  (View/Edit/Workbench/Other): all-on indicator, click enables or disables
  every tool in that category for the chat.
- Sidebar: assistant tab tooltip "Open Hazel Coding Agent".
- ChatBottomBar: shorter queue placeholders while compacting or awaiting.

Made-with: Cursor
- Document that csv attaches only to empty [] and card expects playing-card
  tuples; update projector catalog, composition prompt, edit tools, and
  syntax projector tool descriptions.
- Context meter: second line for one-decimal percent, show 100k for 100000
  limit, br() layout; drop nowrap on label for line break.

Made-with: Cursor
codecov Bot commented Apr 9, 2026

Codecov Report

❌ Patch coverage is 33.86693% with 1322 lines in your changes missing coverage. Please review.
✅ Project coverage is 50.96%. Comparing base (4061cfd) to head (a6e1e94).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/web/view/agentCore/AgentUpdate.re | 25.82% | 428 Missing ⚠️ |
| src/web/view/agentCore/Message.re | 31.69% | 125 Missing ⚠️ |
| src/util/API.re | 0.00% | 82 Missing ⚠️ |
| src/util/OpenRouter.re | 55.00% | 81 Missing ⚠️ |
| src/web/view/agentView/ToolCallSummary.re | 48.52% | 70 Missing ⚠️ |
| src/web/view/agentCore/Chat.re | 50.71% | 69 Missing ⚠️ |
| src/haz3lcore/projectors/ProjectorPerform.re | 11.84% | 67 Missing ⚠️ |
| src/web/view/agentCore/AgentToolCallHandler.re | 34.34% | 65 Missing ⚠️ |
| ...mpositionCore/AgentWorkbenchCore/AgentWorkbench.re | 8.19% | 56 Missing ⚠️ |
| src/web/view/agentCore/ChatSystem.re | 32.05% | 53 Missing ⚠️ |

... and 14 more
Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #2210      +/-   ##
==========================================
+ Coverage   50.26%   50.96%   +0.70%     
==========================================
  Files         267      278      +11     
  Lines       31709    32800    +1091     
==========================================
+ Hits        15938    16716     +778     
- Misses      15771    16084     +313     
| Files with missing lines | Coverage Δ |
|---|---|
| ...CompositionCore/prompt_factory/CompactionPrompt.re | 100.00% <ø> (ø) |
| src/util/StringUtil.re | 73.55% <100.00%> (+0.22%) ⬆️ |
| src/web/app/common/Icons.re | 100.00% <100.00%> (ø) |
| src/web/view/agentCore/ChatSlashCommands.re | 100.00% <100.00%> (ø) |
| src/haz3lcore/CompositionCore/AgentToolResult.re | 6.66% <50.00%> (+6.66%) ⬆️ |
| src/web/view/agentCore/AgentResult.re | 0.00% <0.00%> (ø) |
| ...CompositionCore/prompt_factory/ProjectorCatalog.re | 75.00% <75.00%> (ø) |
| src/haz3lcore/CompositionCore/CompositionGo.re | 74.13% <61.53%> (+8.07%) ⬆️ |
| src/haz3lcore/CompositionCore/GeneralTreeUtils.re | 57.89% <54.54%> (-0.44%) ⬇️ |
| src/haz3lcore/ProbePerform.re | 22.29% <60.00%> (+8.82%) ⬆️ |

... and 18 more

... and 31 files with indirect coverage changes


russell-rozenbaum and others added 16 commits April 17, 2026 15:08
- HighLevelNodeMap.next_sibling_of / prev_sibling_of: `mod` binds tighter
  than `+/-`, so `idx + 1 mod len` was parsed `idx + (1 mod len)` — never
  wrapped, and raised on empty siblings. Wrap explicitly and guard len=0.
- Agent.Update: API-error content had `"\\Error: "` (literal backslash),
  producing "Code: 429\Error: ..." in user-visible messages. Extract
  format_api_error_content and use `"\nError: "`.
- Tests: 4 new sibling wrap-around cases in HighLevelNodeMap suite;
  1 new API-error format case in Agent UX suite.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
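The precedence pitfall behind the sibling-navigation fix is not specific to OCaml/Reason; Python's `%` also binds tighter than `+`. The sketch below reproduces the shape of the bug and the fix in Python. The function names are hypothetical stand-ins, not the actual HighLevelNodeMap code.

```python
from typing import Optional, Sequence


def next_sibling_buggy(siblings: Sequence[str], idx: int) -> str:
    # `%` binds tighter than `+`, so this is idx + (1 % len), not (idx + 1) % len.
    # It never wraps, and it divides by zero when `siblings` is empty.
    return siblings[idx + 1 % len(siblings)]


def next_sibling(siblings: Sequence[str], idx: int) -> Optional[str]:
    # Guard the empty case first, then wrap with explicit parentheses.
    if not siblings:
        return None
    return siblings[(idx + 1) % len(siblings)]


def prev_sibling(siblings: Sequence[str], idx: int) -> Optional[str]:
    if not siblings:
        return None
    return siblings[(idx - 1) % len(siblings)]
```

One caveat when porting: Python's `%` returns a non-negative result for a positive modulus, whereas OCaml's `mod` can return a negative one, so an OCaml version of `prev_sibling` would likely need `(idx - 1 + len) mod len`.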
- HighLevelNodeMap.closest_valid_path_to_ill_path: called inside the
  error-handling branch of path_to_id, so returning "" on an empty
  node_map is safer than raising (avoids compound failure).
- Agent.Update: hoist max_empty_retries to module level next to
  max_api_retries; interpolate it into the retry copy so the bound
  and the user-visible "/N" stay in sync.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Code-review fixes from the branch review:
- M1: HighLevelNodeMap sibling-nav precedence + empty-siblings guard
- M2: API-error content uses \n between Code and Error labels
- N4: closest_valid_path_to_ill_path no longer raises on empty map
- N1: max_empty_retries centralized and interpolated in retry copy

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…oggle

Restructure the LLM model list into a curated recommended section and a
searchable master list. Recommended entries (Opus 4.6, Sonnet 4.6,
Gemini 3 Flash Preview, MiMo V2 Pro, Gemma 4 31B) are ordered most
capable → cheapest with per-model taglines. Master list supports
subsequence fuzzy search on name+id and an "Only free" toggle, with
state in agent_globals (model_filter, only_free_models; yojson/sexp
defaults keep existing saves compatible). Recommended and master live
in separate scroll containers sized to keep the Confirm Settings button
visible at typical sidebar heights.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
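Subsequence fuzzy matching with an "Only free" filter can be sketched as below. This is an illustrative Python version under assumptions: `ModelInfo`, `is_subsequence`, `filter_models`, and the example model ids are hypothetical, and the branch's matcher may differ in details such as case handling.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class ModelInfo:
    id: str
    name: str
    is_free: bool


def is_subsequence(query: str, text: str) -> bool:
    """True if the characters of `query` appear in `text` in order (case-insensitive)."""
    remaining = iter(text.lower())
    # `ch in remaining` advances the iterator, so matches must occur left to right.
    return all(ch in remaining for ch in query.lower())


def filter_models(models: Iterable[ModelInfo], query: str, only_free: bool) -> List[ModelInfo]:
    """Keep models whose name+id fuzzy-match the query, optionally free ones only."""
    return [
        m
        for m in models
        if (m.is_free or not only_free) and is_subsequence(query, m.name + " " + m.id)
    ]
```

An empty query matches everything, so clearing the search box falls back to the full (or free-only) list.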
Empty-program seeding now uses insert_before / insert_after with no path;
removes the separate initialize tool. Omitting path on either insert
prepends (Before) or appends (After) at the program boundary, collapsing
two code paths into one.

- Drop EditTools.initialize JSON and Initialize action variant
- Add InsertAtProgramBoundary(direction, code) for no-path inserts
- Parser treats missing / empty-string path as the boundary case
- Update prompts (Composition, Compaction, RecFib), tool catalogs, and
  chat/agent UI tool-name lists
- Repurpose initialize tests as no-path insert_before/insert_after tests
- Tool count: 33 -> 32; insert tools drop path from required list
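The collapse of `initialize` into path-less inserts can be pictured with a toy model in which a program is just an ordered list of top-level definitions. This is a Python sketch under that simplifying assumption; the `insert` function and its string-based `direction` argument are hypothetical.

```python
from typing import List, Optional


def insert(program: List[str], direction: str, code: str,
           path: Optional[str] = None) -> List[str]:
    """Insert `code` before/after `path`; a missing or empty path means the program boundary."""
    if direction not in ("before", "after"):
        raise ValueError(f"unknown direction: {direction}")
    if not path:
        # No path (None or ""): prepend for "before", append for "after".
        # On an empty program both directions seed it the same way.
        return [code] + program if direction == "before" else program + [code]
    idx = program.index(path)  # raises ValueError if the path is not present
    at = idx if direction == "before" else idx + 1
    return program[:at] + [code] + program[at:]
```

With this shape, seeding an empty program is just the boundary case of an ordinary insert, so no separate code path is needed.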
…ge, /cost, /help

Replaces markdown-rendered slash output with a typed `slash_command_payload`
variant pipeline. Each command builds its own record (cost_output, credits_output,
usage_output, help_output, KeyOutput, SlashError); the view layer owns formatting
via SlashCommandOutputView, which renders custom card UIs per kind with branded
border-left colors, stat tiles, kv rows, a credit progress bar, and a help table.

New commands:
- /cost          — estimates session $ from per-message token counts
- /account-usage — hits /api/v1/credits (account-wide credit pool)
- /key-usage     — hits /api/v1/key (per-key spend, limits, daily/weekly/monthly)
- /key           — shows the currently-set OpenRouter API key string
- /help          — lists all slash commands

OpenRouter HTTP additions follow the existing get_models idiom; no caching, no
key-persistence change. Tests updated to cover the new alphabetical ordering and
help_payload contents.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pure formatter reflow, no semantic changes.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds a per-model reasoning-effort selector to the chat bottom bar,
gated on whether the active OpenRouter model advertises `reasoning`
in its `supported_parameters`. The dropup appears in the action-button
row (left side) with options Off / Low / Medium / High and uses pure
CSS hover for open/close.

Wiring:
- `llm_info` gains a `supports_reasoning: bool` field, parsed from
  the `/api/v1/models` payload alongside the existing required-params
  check; defaults false on legacy stored data.
- `AgentGlobals` gains `reasoning_effort: option(effort_level)` and a
  `SetReasoningEffort` action.
- `Payload.Utils.mk_default` accepts an optional `~reasoning`. Threaded
  through the four main-chat send paths (initial send, retries,
  retry-empty); chat-naming and compaction deliberately leave it `None`.

Also fixes overflow on the `/key` slash card: long API keys now wrap
within the card (`overflow-wrap: anywhere` + `word-break: break-all`).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…o-node

Replaces the boxed tool-call rows in the chat transcript with plain-text
rows that expand inline on click. Collapsed rows now show a per-tool
signifier (structural path or free-text summary) so the user can see at a
glance what each call did; expanded content is indented rather than boxed.

Lead-glyph categories (edit/read/view/projector/probe/statics/workbench)
are rendered on the left; success/fail/skip status icon stays on the right.
Cmd/Ctrl-click a row dispatches Globals.Update.JumpToTile(id) to the
corresponding AST node via HighLevelNodeMap.path_to_id_opt; primary click
still toggles expand. Rows whose path no longer resolves dim (.stale)
only for tools where persistence is expected — delete_* / remove_* never
dim.

New ToolCallSummary module centralizes per-tool category + signifier +
jump_paths + persists derivation. Summarizer is applied to the three
tool-call surfaces: ToolResultView (full redesign), ChatMessagesView
.summary-tool-link (signifier added), and WorkbenchView tool rows
(redesign via shared ToolResultView).

Tests: 14 summarizer unit tests in Test_AgentUX covering category
mapping, signifier extraction, path-list joining, free-text truncation,
and unknown-tool fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…meter bar

Renames the user-facing slash command for symmetry with /key-usage and
/account-usage. Internal variant names (CostOutput, RunSlashCommandCost)
intentionally left as-is — only the typed-in command label changed.

Also moves the "(N.N%)" line in the bottom-bar context meter from the
label group (above the bar) to its own div below the bar, matching the
visual hierarchy of label → bar → percentage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Match repo camelCase folder convention (e.g. menhirParser).
include_subdirs unqualified means no dune/import edits needed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Render reasoning above response: italic dim body with
  "Thought for Ns" / "Nm Ms" header. Persisted show_thinking
  flag (default true) gates display.
- /show-thinking slash command toggles the flag, posts Notice.
- Reply.Model gains reasoning field; parse reasoning_content/
  reasoning/thinking from OpenRouter responses. Capture request
  elapsed_ms in HandleLLMResponse, store on Message for header.
- Bottom bar stacks active model's pretty name above a
  "change model" button routing to main menu; main menu auto-
  returns to chat on model select.
- Page.re: clipboard-shim copy falls back to window selection
  inside agent containers via new
  JsUtil.try_copy_window_selection_in_classes.
Persisted mode in AgentGlobals gates which tools the agent may call:
- converse: only view tools (no edits, workbench, or overlays)
- plan: blocks edits, keeps workbench so the agent can build todos
- edit: full toolset (still subject to per-tool toggles)

Mode is injected into the per-turn context snapshot and explained
statically in CompositionPrompt; identity copy now stresses Filbert
is a structure-editor agent, not a text-editor agent.

Bottom-bar gets a top row: info-icon + colored mode toggle (click
to cycle) on the left, current model name on the right. Change-model
button stays in the bottom row. Placeholder gains a "type / for
commands" hint.
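The mode gating described above amounts to a small predicate over (mode, tool category). The Python sketch below is an assumption-laden illustration: the `Mode` and `Category` enums and the `enabled` flag are hypothetical, and overlay tools are treated as allowed in plan mode on the reading that plan blocks edit tools only.

```python
from enum import Enum


class Mode(Enum):
    CONVERSE = "converse"
    PLAN = "plan"
    EDIT = "edit"


class Category(Enum):
    VIEW = "view"
    EDIT = "edit"
    WORKBENCH = "workbench"
    OVERLAY = "overlay"  # projector / probe / statics tools (assumed grouping)


def tool_allowed_in_mode(mode: Mode, category: Category, enabled: bool = True) -> bool:
    """Per-tool toggles apply first; the session mode then narrows the set."""
    if not enabled:
        return False
    if mode is Mode.EDIT:
        return True
    if mode is Mode.PLAN:
        return category is not Category.EDIT
    # Converse: view tools only.
    return category is Category.VIEW
```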
Rewrote the identity block so "structure-editor agent operating on
typed syntactic structure via a small calculus of typed tool calls"
is the headline, not a parenthetical buried mid-paragraph. Added an
explicit guideline that introspective questions ("what are you?",
"what can you do?") must open with that framing instead of a generic
feature list — Filbert was answering them by listing capabilities
("writes code", "debugs", "explains") without mentioning the
structure-editor angle at all.
UI: move mode toggle + model name out of the bordered input container
into a top-bar sibling above it. Context meter stays in the action-row
(extracted to a let-binding for reuse). Trim placeholder back to
"Type your message..." — the slash-cmd tip added bulk.

Behavior: drop the workbench-nudge that fires when the assistant
finishes a turn with no text and an active subtask still open. The
"MANDATORY: write a sentence to the user" follow-up was creating loops
in plan mode. Idle path goes straight to the compact check.
russell-rozenbaum and others added 8 commits April 18, 2026 20:26
…o scroll stamp

The stick-to-bottom hook is keyed on chat_messages_scroll_stamp, which
previously only hashed finalized-log contents. During streaming the log
is stable and the stamp didn't move, so the hook's update never fired
and the view drifted away from the growing in-progress bubble.

Mixing String.length of pending_assistant_{content,reasoning} into the
stamp makes every delta re-trigger the hook. The existing
stick_to_bottom / is_near_bottom policy already disengages when the
user scrolls up and re-engages when they scroll back.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
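The stamp fix can be illustrated with a small Python model. The names `scroll_stamp` and `StickToBottom` are hypothetical, and the stamp here is a plain tuple rather than whatever representation the view code uses.

```python
from typing import List, Optional, Tuple

Stamp = Tuple[int, int, int]


def scroll_stamp(finalized: List[str], pending_content: str, pending_reasoning: str) -> Stamp:
    """Combine the finalized log with the lengths of the in-progress buffers."""
    return (hash(tuple(finalized)), len(pending_content), len(pending_reasoning))


class StickToBottom:
    """Scrolls only when the stamp changes and the view is already near the bottom."""

    def __init__(self) -> None:
        self.last: Optional[Stamp] = None
        self.scrolls = 0

    def on_render(self, stamp: Stamp, near_bottom: bool) -> None:
        if stamp == self.last:
            return  # nothing changed, the hook does not fire
        self.last = stamp
        if near_bottom:
            self.scrolls += 1
```

Mixing in only the lengths is enough as long as streaming appends to the pending buffers, since every delta then changes at least one component of the stamp.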
Adds four tools so the agent can maintain its own plan, not just append:
- update_active_task / update_active_subtask — edit title and/or
  description; title changes rename the StringMap key and update the
  active_task / display_task / subtask_ordering pointers that reference it.
- delete_task / delete_subtask — hard delete by title; clears pointers
  (active_task, display_task, active_subtask) that pointed at the removed
  entry.

Closes the gap where the agent could only add/reorder and had no way to
correct a bad title or drop a planned step.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The 4 new CRUD tools were falling through to the default None branch in
ToolCallSummary.of_tool_call and rendering under the OTHER category.
Route update_active_task/subtask and delete_task/subtask through the
workbench(...) helper with sensible signifiers (new_title or new_description
for updates; title for deletes).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds `AgentTools.AscribedBindings` group (16 tests) covering
`let x : T = v in ...`-style explicitly-typed bindings across:
- path_to_id resolution (top-level, after type-alias chain, nested)
- delete_binding_clause (simple, middle of chain, after type chain, list-literal body)
- delete_body on ascribed let
- update_definition (preserves ascription)
- update_binding_clause (replaces incl. ascription)
- insert_before/insert_after around ascribed lets
- chess-style full-chain delete (all 5 type aliases + ascribed initial_board)
- verbatim chess program: path_to_id finds every binding (incl. tuple-body Piece)
- verbatim chess program: delete_binding_clause Piece succeeds

Motivated by a live-editor repro where `delete_binding_clause Piece` failed
with "Path 'Piece' not found in node map" on a chain ending in an ascribed
let. String-parsed reproducers all pass — leaves room to detect regressions
if the string-parse path ever breaks.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When an agent's tool call references a binding path that doesn't resolve,
the error message now includes (a) the full list of available paths in
the node map, and (b) a fully qualified nested path when the requested
bare name is uniquely present deeper in the tree (e.g. "outer/inner"
for a bare "inner" query).

Motivated by a live-editor repro where `delete_binding_clause Piece`
failed with "Path 'Piece' not found in node map" but the agent only saw
a single levenshtein suggestion (PieceType) with no way to know what
*was* in the map. The root-cause of why `Piece` wasn't in the map from
the live zipper is still unreproduced from string-parse tests; this
change makes the failure mode self-diagnosing so the next occurrence
leaves actionable evidence.

Resolution semantics are unchanged — `path_to_id` is still strict on
the success path. Only the error message grew richer.

Adds 2 InvalidPaths tests covering the new diagnostic fields.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
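A rough Python sketch of the richer diagnostic follows, assuming the node map is a dictionary from slash-separated paths to ids. `PathNotFound` and `resolve` are hypothetical names, and the message wording is illustrative rather than the branch's exact text.

```python
from typing import Dict


class PathNotFound(Exception):
    pass


def resolve(node_map: Dict[str, int], path: str) -> int:
    """Strict lookup; on failure, list what is available and suggest a nested match."""
    if path in node_map:
        return node_map[path]
    message = (
        f"Path '{path}' not found in node map. "
        f"Available paths: {sorted(node_map)}."
    )
    # Suggest a fully qualified path only when the bare name is unique deeper in the tree.
    nested = [p for p in node_map if "/" in p and p.split("/")[-1] == path]
    if len(nested) == 1:
        message += f" Did you mean '{nested[0]}'?"
    raise PathNotFound(message)
```

The success path is untouched: a bare name that is not itself a key still fails, and only the error text changes.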
Every successful Delete(BindingClause) was surfacing to the agent as
"Path X not found in node map". Root cause: CompositionGo.Local.get_diff
called path_to_id(new_node_map, path) after a Delete, which correctly
no longer contains the path — so it raised, and the exception propagated
up as a spurious tool-call failure despite the edit having worked.

Swap to path_to_id_opt on the new-side lookup; None → new_segment=None
(semantically correct: a deleted binding has no replacement segment).
Old-side lookup unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
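The shape of this fix, sketched in Python with hypothetical stand-ins for `path_to_id`, `path_to_id_opt`, and `get_diff` (node maps are modeled as dictionaries and segments as strings):

```python
from typing import Dict, Optional, Tuple


def path_to_id(node_map: Dict[str, int], path: str) -> int:
    """Strict lookup: raises when the path is absent."""
    if path not in node_map:
        raise KeyError(f"Path '{path}' not found in node map")
    return node_map[path]


def path_to_id_opt(node_map: Dict[str, int], path: str) -> Optional[int]:
    """Lenient lookup: returns None when the path is absent."""
    return node_map.get(path)


def get_diff(old_map: Dict[str, int], new_map: Dict[str, int],
             segments: Dict[int, str], path: str) -> Tuple[str, Optional[str]]:
    # Old side stays strict: the path must have existed before the edit.
    old_segment = segments[path_to_id(old_map, path)]
    # New side is lenient: after a delete the path is legitimately gone.
    new_id = path_to_id_opt(new_map, path)
    new_segment = None if new_id is None else segments[new_id]
    return old_segment, new_segment
```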
Captures the agent-side invariants for the live-editor scenario that
triggered "Exception during View: Cannot read properties of undefined
(reading 'length')" when placing a probe on [fib]. The test asserts
path resolves, add_manual does not raise, statics rebuild, and
node_map rebuild. All pass — confirming the agent tool call path is
clean and the crash lives downstream in view/eval render (core, out
of agent scope).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
4 workbench tools added on this branch (see WorkbenchTools.re); the
sentinel count in Test_AgentTools fell behind and failed CI.
@russell-rozenbaum russell-rozenbaum force-pushed the coding-agent-projector-tools-extension branch from 1842763 to 6aac7a8 Compare April 21, 2026 02:57
@disconcision
Member

The system prompt refers to the old forall keyword; this should be replaced by poly.

@russell-rozenbaum russell-rozenbaum marked this pull request as draft April 21, 2026 14:18
@russell-rozenbaum
Contributor Author

Okay, verified that the main code this touches outside the agent infrastructure is the *Perform.re files, and the changes there are mostly additions.
They support the projector and probe placement tools we gave the agent.

@russell-rozenbaum russell-rozenbaum marked this pull request as ready for review April 21, 2026 22:37
russell-rozenbaum and others added 4 commits April 21, 2026 18:39
Agent.re was ~4600 lines. Extract each top-level module into its own
file under src/web/view/agentCore/. Pure code motion: no logic or
behavior changes.

New files:
  - AgentResult.re     (Failure + Result, paired foundational types)
  - Message.re         (Message.Model/Utils/Update)
  - Chat.re            (Chat.Model/Utils/Update)
  - ChunkedUIChat.re   (UI chunking of the chat log)
  - ChatSlashCommands.re
  - ChatSystem.re      (multi-chat container)

Agent.re now contains only the Agent module and unwraps the outer
module wrapper, so callers drop the Agent.Agent.X stutter in favor of
Agent.X. All other agent modules are referenced bare at file scope:
Message.X, Chat.X, ChunkedUIChat.X, ChatSlashCommands.X, ChatSystem.X.

Agent.re: 4584 -> 3046 lines. Build clean, full agent test suite green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 2 of the Agent.re god-module split. After phase 1 carved out the
chat-layer modules, Agent.re still held 3046 lines for the Agent
module itself. Extract each submodule into its own file:

  - AgentModel.re             (llm_error_origin + Model + Persistent)
  - AgentToolUtils.re         (tool-JSON helpers)
  - AgentUtils.re             (init, cleanup helpers)
  - AgentToolCallHandler.re   (CompositionActions -> ChatSystem dispatch)
  - AgentUpdate.re            (action dispatch + LLM request plumbing)

Agent.re is now a 30-line facade: doc comment + `include AgentModel`
for Model/Persistent/llm_error_origin + four module aliases for the
rest. External callers keep their `Agent.Model.t`, `Agent.Update.X`,
etc. paths unchanged.

Pure code motion; no logic, behavior, or public-API changes. Full
agent test suite green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
- Replace outdated `forall` keyword with `poly` in Hazel language
  guides (HazelSyntaxNotes, HazelDocumentation). `forall` is now
  the exp-level form; type-level polymorphism uses `poly`.
- Add three guideline directives to CompositionPrompt: iterate
  until done (stop only for user input), write adequate tests,
  end with a concise summary of what was done.
@russell-rozenbaum
Contributor Author

The system prompt refers to the old forall keyword; this should be replaced by poly.

resolved

russell-rozenbaum and others added 8 commits April 21, 2026 21:07
Probe and Statics branches in ToolCallHandler.update silently
dropped unresolved paths and returned Ok, so `place_probe(["bogus"])`
produced a "tool call was successful" message while leaving the
editor untouched. Agent had no signal to self-correct.

Mirror the SyntaxProjector guard: track resolved vs unresolved
paths, and when `List.length(paths) > 0` but every path was
unresolved, return an Error listing the unresolved paths and
explaining the HighLevelNodeMap path format.

Partial resolves (some valid, some not) still succeed — matches
existing SyntaxProjector semantics. Strengthening that to surface
partial failures is a separate design call.
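The guard can be summarized by the Python sketch below. `apply_overlay` and the `("ok" | "error", payload)` return shape are hypothetical and chosen for illustration; the node map is again modeled as a dictionary.

```python
from typing import Callable, Dict, List, Tuple, Union

Outcome = Tuple[str, Union[List[str], str]]


def apply_overlay(node_map: Dict[str, int], paths: List[str],
                  perform: Callable[[int], None]) -> Outcome:
    """Apply `perform` to every resolvable path; fail only if none resolve."""
    resolved = [p for p in paths if p in node_map]
    unresolved = [p for p in paths if p not in node_map]
    if paths and not resolved:
        # Total failure: give the agent a real error so it can self-correct.
        return ("error", f"No paths resolved; unresolved: {unresolved}")
    for p in resolved:
        perform(node_map[p])
    # Partial resolution still succeeds, matching the existing projector semantics.
    return ("ok", resolved)
```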

Adds 3 regression tests in Test_AgentUX.toolcall_handler_tests:
- place_probe with only bogus paths → Error
- place_statics with only bogus paths → Error
- place_probe with mixed valid/bogus paths → still Ok
- Eg_EmojiPaint.re was an 11-line placeholder (with a typo in the
  first sentence) never referenced from CompositionPrompt. Delete.
- LanguageServerAction branch in ToolCallHandler.update returned
  Ok silently for an unimplemented path. Return Error instead so
  the agent gets a real signal if a future tool ever produces it.
…rlay_action

The three overlay-tool branches in ToolCallHandler.update had near-identical
scaffolding: build the HighLevelNodeMap, fold per path tracking unresolved
and changed counts, error on total failure, rebuild the zipper/editor,
dispatch AgentContext.Expand on the changed paths. Extract into a shared
apply_overlay_action helper parameterized by tool_label, resolve_path, and
a perform closure returning option((zipper, should_expand)).

Net: 98 fewer lines and a single place to update overlay dispatch semantics.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Three new test groups pin down agent-core semantics that previously had no
coverage:

- tool_allowed_in_mode: Edit allows all; Plan blocks edit tools only;
  Converse additionally blocks workbench + overlay tools.
- backoff_ms: attempts 0..3 return 1000, 2000, 4000, 8000 (1000 * 2^n).
- StreamDelta: dropped when flight_seq matches pending_ignore_main_reply_seq;
  otherwise accumulates content + reasoning onto pending_assistant_*. Also
  covers the case where pending_ignore is set for a different flight.
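
The first two properties can be sketched in Reason as follows. The `mode` and `tool_kind` types, and the `OtherTool` category, are assumptions made for illustration; the real definitions in agentCore may be shaped differently.

```reason
/* Hypothetical types; the actual agentCore definitions may differ. */
type mode =
  | Converse
  | Edit
  | Plan;

type tool_kind =
  | EditTool
  | WorkbenchTool
  | OverlayTool
  | OtherTool;

/* Edit allows everything; Plan blocks edit tools only;
   Converse additionally blocks workbench and overlay tools. */
let tool_allowed_in_mode = (mode: mode, kind: tool_kind): bool =>
  switch (mode, kind) {
  | (Edit, _) => true
  | (Plan, EditTool) => false
  | (Plan, _) => true
  | (Converse, EditTool | WorkbenchTool | OverlayTool) => false
  | (Converse, OtherTool) => true
  };

/* Exponential backoff: 1000 * 2^attempt milliseconds. */
let rec backoff_ms = (attempt: int): int =>
  attempt <= 0 ? 1000 : 2 * backoff_ms(attempt - 1);
```

Writing the mode check as one exhaustive `switch` means the compiler flags any newly added mode or tool category that has no rule yet.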

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Rework the compaction system prompt so the summarizer produces a thorough
markdown recap rather than an aggressive compression. Key shifts:

- Add "Completeness over brevity" as the first goal. Length is cheap;
  forgetting a user rule or a failing test is expensive.
- Require near-verbatim reproduction of the final user and assistant
  messages in a new "## Most recent exchange" section, with blockquotes
  / fenced blocks preserving exact wording and any pending tool calls.
- Add "## User rules & preferences (quoted)" as a required section so
  standing instructions ("don't touch core/…", "prefer tail recursion")
  survive compaction verbatim and carry forward across chained compactions.
- Add "## Tool results & program values" so probe values, test pass/fail
  counts, and per-tool outcomes are enumerated with arguments/paths/values.
- Add "## Plans & notes" for stated plans, TODOs, and "next I'll…" commitments.
- Expand "Preserve in the summary" to spell out every category the
  summarizer must cover, and explicitly tell the model that many output
  tokens are expected when the history warrants it.

No code-path changes — only the static string list feeding
mk_system_prompt. Build + 2605 tests green.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

The .cursor/ IDE scratch folder had a gitignore exception for .cursor/docs/
that was letting personal handoff notes ship to PRs. Remove the tracked
file and drop the exception so .cursor/ is fully ignored, matching dev's
posture. The file remains on disk locally.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Two call sites added on this branch referenced the old Info.exp.term
field, which Elastatics (#2213) split into user_term/elab_term. Other
call sites in these files were renamed on dev; our added functions
were not reached by that sweep.

@russell-rozenbaum changed the title from "Coding agent expansion and improvements March/April" to "Coding Agent Expansion (Added Tools, New Slash Commands, Rewritten Prompts, Improved Chat UX, and more)" Apr 23, 2026
@cyrus- moved this to AI Assistant in Hazel Big Board Apr 23, 2026
Member

@disconcision left a comment

Looks solid in general. I skimmed most of the agent-specific code, focussing on the interface with the rest of hazel; see code comments for detail.

Functional issues:

  1. Newly inserted code still seems to have leading spaces. Probably worth adding tests around this, as it seems to be a recurring issue.
  2. Pressing Enter to send a message in a long-ish conversation often lags. Probably worth identifying why, but regardless, it would be nice to get some immediate feedback so it's obvious something is happening (e.g. the message gets added to the log and disappears from the entry box), even if the actual processing is subject to the lag. Otherwise the user is left doubting what's going on.
  3. When inserting new let bindings into the body below the last let binding, it doesn't always seem to insert a linebreak or space, so you sometimes get stuff like `inlet empty_playlist : PlayList = ([], NoSongSelected) in`, which is problematic because it'll break if it's copy-pasted.

Aesthetic:

  1. Slash cards (`.slash-card`) are a bit dark; let's use `background: var(--T2)` instead.
  2. Tool calls are currently displayed as `tool_name def_name`. Despite using different colors, these kind of run into each other; let's stick some kind of separator between the two words. Also, it seems like `def_name` is slightly vertically elevated over `tool_name`; check the CSS.

Broader issues:

  • Not specific to this PR, but I think I'm running into issues stemming from how we only keep the current code view/map. I had the agent do one of the test tasks; he also wrote some tests. I then pasted our test suite over the agent's, and told him that I had done so and that a test was failing. The agent got confused, though, as all he saw were 'the tests he added'. I'm not totally clear on what was going on here, but I think the agent only ever sees the current version plus agent edits, so he gets confused when referring to 'updated'/'new' versions resulting from user edits, as those don't show up in the log. I copy/pasted the history here (https://gist.github.com/disconcision/cdf2950ee8b49820ce36f8540a169669) but it's not too readable. We should discuss this. I think it showcases the importance of having a way to see what the model saw at each stage of a job.

changed. Strips a projector (exposing underlying syntax) when
[[MakeTerm.for_projection]] or [[ProjectorInit.init]] fails, migrating
refractors to the underlying term id when possible. */
let sanitize_projectors_in_segment =
Member

what does sanitize mean? re-validate? what does it mean for maketerm/init to fail? what does this have to do with refractors, which aren't in syntax the same way projectors are?

I don't really get what this is doing and the call site doesn't clear things up for me either

Comment thread AGENTS.md
@@ -0,0 +1,170 @@
# AGENTS.md - Hazel Development Guide
Member

not sure if we should commit this or not; seems fine; discuss with @cyrus-

Comment thread src/web/app/Page.re
switch (target) {
| Some(el) =>
  let elId = Js.Opt.to_option(Js.Unsafe.coerce(el)##.id);
  if (is_input_field(elId)) {
Member

This hopefully won't be an issue when you're up to date with dev; @Negabinary has some keyboard handling changes that should obviate the need for these workarounds. lmk if that's not the case and we'll find a better way of doing this.

Comment thread src/util/StringUtil.re
let is_trailing_ws = (c: char): bool => c == ' ' || c == '\t' || c == '\r';
let trim_line = (line: string): string => {
  let chars = String.to_seq(line) |> List.of_seq;
  let rec drop_leading_spaces = (chars: list(char)): list(char) =>
Member

was this just mis-named before?

Comment thread src/util/JsUtil.re

Used to rescue native copy when a focused hidden element (e.g. the editor's
clipboard shim) would otherwise intercept Cmd/Ctrl+C. */
let try_copy_window_selection_in_classes =
Member

hopefully won't be necessary when current with dev... see comment below

api_key: option(string),
active_llm: option(OpenRouter.AvailableLLMs.Model.llm_info),
available_llms: OpenRouter.AvailableLLMs.Model.t,
[@yojson.default ""] [@sexp.default ""]
Member

why are these fields but not others defaulted? trying to make sure this isn't just papering over something

};

/** Like [[toggle_statics]], but for path-resolved ids (agent tools). */
let toggle_statics_at = toggle_statics;
Member

not sure this alias is necessary?

| SampleFocus(a) => Ok(SampleFocusPerform.go(z, a))
};
};

Member

I don't really get what this set of functions is doing. they seem to be for generic projector placement, but internally they call migrate_refractor, which only applies to probes/statics, which unlike projectors are not written into the syntax tree.
