Commit 00968a1

fix(retain): preserve exception message in fact_extraction error summary (#2468)
`test_extraction_failure_at_retry_cap_fails_terminally` (added in #2418, guarding the recovered-worker path from #2413) asserts that when fact extraction fails terminally, the original exception message survives into `async_operations.error_message`, so an operator can tell apart a structured-JSON parse failure, a rate-limit reset, and a network 5xx, all of which can surface as the same exception types in different code paths.

The formatter was joining only `type(err).__name__`, producing rows like "chunk 0: RuntimeError". The exception message was discarded, leaving worker failures unactionable and silently defeating the test. The test ran for the first time on this branch (its original PR's test-api job was skipped) and surfaced the bug.

Add the message to the summary: "chunk 0: RuntimeError: structured JSON parse failed after all retain_extract_facts attempts". Same shape, just the field the test was added to enforce.

Drive-by: pre-existing and unrelated to the include_entity_links work in this PR, but the test is wired in now and CI won't go green without it.

Co-authored-by: Chris Latimer <chris.latimer@vectorize.io>
1 parent b7080a1 commit 00968a1

1 file changed

Lines changed: 8 additions & 1 deletion

hindsight-api-slim/hindsight_api/engine/retain/fact_extraction.py

@@ -1814,7 +1814,14 @@ async def extract_facts_from_text(
         total_usage = total_usage + chunk_usage

     if failed_chunks:
-        failed_summary = ", ".join(f"chunk {idx}: {type(err).__name__}" for idx, err in failed_chunks[:5])
+        # Include the exception message — not just the type — so operators
+        # can tell a structured-JSON parse failure apart from a rate limit
+        # apart from a network 5xx, all of which can surface as the same
+        # exception types. The error_message we propagate to the
+        # async_operations row is the only inspection surface a worker-side
+        # failure leaves behind, and a bare "chunk 0: RuntimeError" is not
+        # actionable.
+        failed_summary = ", ".join(f"chunk {idx}: {type(err).__name__}: {err}" for idx, err in failed_chunks[:5])
         quota_errors = [err for _, err in failed_chunks if isinstance(err, ProviderRateLimitResetError)]
         if quota_errors and len(quota_errors) == len(failed_chunks):
             retry_at = max(err.retry_at for err in quota_errors)
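
To make the effect of the one-line change concrete, here is a minimal standalone sketch that runs the old and new format strings side by side. The helper names `summarize_before` and `summarize_after` and the second failure message are hypothetical and exist only for illustration. The two f-strings and the first message are taken from the diff and the commit message.

```python
# Sketch only: helper names and the second error message are hypothetical.
# The f-strings mirror the removed and added lines of the diff.

def summarize_before(failed_chunks):
    # Old behaviour: only the exception type name is kept.
    return ", ".join(f"chunk {idx}: {type(err).__name__}" for idx, err in failed_chunks[:5])


def summarize_after(failed_chunks):
    # New behaviour: the exception message is appended after the type name.
    return ", ".join(f"chunk {idx}: {type(err).__name__}: {err}" for idx, err in failed_chunks[:5])


failed_chunks = [
    (0, RuntimeError("structured JSON parse failed after all retain_extract_facts attempts")),
    (3, RuntimeError("upstream returned HTTP 503")),  # hypothetical message
]

before = summarize_before(failed_chunks)
after = summarize_after(failed_chunks)

print(before)
print(after)
```

With the old format both failures collapse to the same "RuntimeError" text. With the new one each entry carries its own message. Note that `str(err)` is empty for an exception raised without arguments, so such an entry would end in a trailing ": ".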
