feat(llm): wire observe_llm_call into honcho_llm_call
Wraps the body of honcho_llm_call (both the tool-less and tool-loop paths)
in observe_llm_call(...) so that every invocation produces exactly one set
of Prometheus samples and one logfmt log line.
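A minimal sketch of the wrapping pattern described above, assuming hypothetical stand-ins for the real Prometheus counters and logfmt logger (the names `LLM_CALLS`, `LOG_LINES`, and this `observe_llm_call` signature are illustrative, not the project's actual API):

```python
import time
from contextlib import contextmanager

# Illustrative stand-ins: a dict-backed counter and a captured log line,
# in place of the real prometheus_client counter and logfmt emitter.
LLM_CALLS: dict[tuple[str, str], int] = {}
LOG_LINES: list[str] = []


@contextmanager
def observe_llm_call(track_name: str, trace_name: str):
    """Wrap one LLM invocation: one counter sample + one logfmt line."""
    start = time.monotonic()
    outcome = "success"
    try:
        yield
    except Exception:
        outcome = "error"
        raise
    finally:
        # Every exit path (success or raise) records exactly once.
        key = (track_name, outcome)
        LLM_CALLS[key] = LLM_CALLS.get(key, 0) + 1
        LOG_LINES.append(
            f"track={track_name} trace={trace_name} "
            f"outcome={outcome} duration={time.monotonic() - start:.3f}"
        )


# Usage: the whole call body sits inside the context manager.
with observe_llm_call("deriver", "my_trace"):
    pass  # the real body would invoke the provider here
```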
Captures the AttemptPlan that produced the most recent (and, on success,
the winning) call via a `last_plan` cell updated inside _get_attempt_plan,
so the recorded provider/model is the one that actually answered:
primary on early attempts, backup on the final retry. This makes
backup-on-final-attempt behavior observable directly from llm_calls /
llm_tokens without parsing logs.
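The cell pattern above can be sketched as follows; the `AttemptPlan` shape, the retry loop, and `run_with_backup` are all assumed for illustration and simplified from whatever the real retry logic does:

```python
from dataclasses import dataclass


@dataclass
class AttemptPlan:  # illustrative shape, not the project's real class
    provider: str
    model: str


def run_with_backup(attempts: int = 3) -> AttemptPlan:
    # `last_plan` is a one-element cell so the enclosing scope always
    # sees the plan behind the most recent (and, on success, winning)
    # attempt, even though the plan is chosen inside a nested function.
    last_plan: list[AttemptPlan | None] = [None]

    def _get_attempt_plan(attempt: int) -> AttemptPlan:
        # Primary on early attempts, backup on the final retry.
        if attempt == attempts - 1:
            plan = AttemptPlan("backup", "backup-model")
        else:
            plan = AttemptPlan("primary", "primary-model")
        last_plan[0] = plan
        return plan

    for attempt in range(attempts):
        plan = _get_attempt_plan(attempt)
        if plan.provider == "backup":  # pretend only the backup succeeds
            break

    assert last_plan[0] is not None
    return last_plan[0]  # the plan that actually answered


winner = run_with_backup()
```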
Passes track_name and trace_name through to execute_tool_loop so its
per-tool counter (added in the previous commit) carries the same
feature label as the call-level metrics.
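The label plumbing can be shown with a toy version of both functions; the per-tool counter, the `tools` argument, and these function bodies are hypothetical, standing in for the real ones:

```python
# Illustrative: the same feature label flows from the call-level wrapper
# down into the tool loop, so both metric families agree on it.
TOOL_CALLS: dict[tuple[str, str], int] = {}


def execute_tool_loop(track_name: str, trace_name: str, tools: list[str]) -> None:
    for tool in tools:
        # Per-tool counter labeled with the same track_name as the
        # call-level llm_calls / llm_tokens series.
        key = (track_name, tool)
        TOOL_CALLS[key] = TOOL_CALLS.get(key, 0) + 1


def honcho_llm_call(track_name: str, trace_name: str) -> None:
    # The call-level wrapper passes its labels straight through.
    execute_tool_loop(track_name, trace_name, tools=["search"])


honcho_llm_call("deriver", "my_trace")
```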
When the tool loop returns response.hit_max_iterations=True, the call's
outcome is overridden to error_max_iterations via mark_max_iterations,
so dashboards can split "model didn't converge" from clean success
without the tool loop having to know about outcome semantics.
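A sketch of the override, assuming hypothetical `CallRecord` and `LoopResponse` shapes (only the `hit_max_iterations` flag and `mark_max_iterations` name come from the text above):

```python
class CallRecord:
    """Illustrative per-call record owned by the observation layer."""

    def __init__(self) -> None:
        self.outcome = "success"

    def mark_max_iterations(self) -> None:
        # Outcome semantics live here, not in the tool loop.
        self.outcome = "error_max_iterations"


class LoopResponse:
    """Illustrative tool-loop result; it only reports a flag."""

    def __init__(self, hit_max_iterations: bool) -> None:
        self.hit_max_iterations = hit_max_iterations


record = CallRecord()
response = LoopResponse(hit_max_iterations=True)

# The caller inspects the flag and overrides the outcome; the loop
# itself never touches outcome labels.
if response.hit_max_iterations:
    record.mark_max_iterations()
```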
Streaming responses don't carry token counts at the entry point: the
recorded call still emits, but the token counters skip those rows
(record_llm_tokens silently no-ops on count <= 0). This is an acceptable
partial signal until the streaming refactor surfaces tokens earlier.
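The guard behavior can be sketched with a dict-backed counter; only the `record_llm_tokens` name and the count<=0 no-op come from the text, the rest is assumed:

```python
TOKENS: dict[str, int] = {}


def record_llm_tokens(direction: str, count: int) -> None:
    # Streaming paths report 0 at the entry point; silently skip them
    # rather than emitting misleading zero-valued samples.
    if count <= 0:
        return
    TOKENS[direction] = TOKENS.get(direction, 0) + count


record_llm_tokens("input", 128)
record_llm_tokens("output", 0)  # streaming: no tokens yet, so no row
```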
ruff + basedpyright clean. An end-to-end smoke test verified that all six
series fire correctly across the success, success_via_backup,
error_max_iterations, error_timeout, and tool-call paths.