Commit d755bee
feat: 0.21.0 — capture integrity for launch-grade benchmark runs (#36)

Closes the layer-1 gap a downstream consumer surfaced: better post-run
statistics don't help if the underlying data wasn't captured. 0.21 ships:

1. RawProviderSink — first-class HTTP-level capture
2. assertLlmRoute — fail-loud route guard
3. assertRunCaptured — run-completion integrity check
4. onRunComplete hooks + traceAnalystOnRunComplete — auto orchestration

Each piece is opt-in but composes cleanly: the matrix runner wires
FileSystemRawProviderSink, calls assertLlmRoute({ requireExplicitBaseUrl,
allowedBaseUrls }) at preflight, attaches traceAnalystOnRunComplete via
TraceEmitterOptions.onRunComplete, and asserts a clean RunIntegrityReport
before declaring the run complete. Result: a launch-decision-grade artifact
without out-of-band glue.

RawProviderSink notes:
- InMemoryRawProviderSink, FileSystemRawProviderSink (NDJSON, rolls at 32 MiB),
  NoopRawProviderSink ship in core
- Default redactor strips Authorization / X-Api-Key / Cookie headers and
  credential-shaped body fields (apiKey, bearer, password, secret, token)
- redactedFields array on every event records what was stripped
- Wired into callLlm: every retry attempt produces a request and either a
  response or an error event with attemptIndex
- Forensics-only: sink errors never crash the underlying LLM call
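
The redaction behaviour described above could look roughly like the sketch below. It is not the shipped redactor: the RawRequest and RedactedRequest shapes, the "[REDACTED]" placeholder, the path format in redactedFields, and the exact-name (case-insensitive) matching of body keys are all assumptions made for illustration.

```typescript
// Hypothetical event shapes; the real RawProviderSink types may differ.
interface RawRequest {
  headers: Record<string, string>;
  body: Record<string, unknown>;
}

interface RedactedRequest extends RawRequest {
  redactedFields: string[];
}

// Header and body-field names taken from the notes above, lowercased.
const SENSITIVE_HEADERS = new Set(["authorization", "x-api-key", "cookie"]);
const SENSITIVE_BODY_KEYS = new Set(["apikey", "bearer", "password", "secret", "token"]);

// Walk the body recursively, replacing sensitive values and recording their paths.
// Assumption: keys are matched by exact name, so "max_tokens" is left alone.
function redactValue(value: unknown, path: string, redacted: string[]): unknown {
  if (Array.isArray(value)) {
    return value.map((item, i) => redactValue(item, `${path}[${i}]`, redacted));
  }
  if (value !== null && typeof value === "object") {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      const childPath = `${path}.${key}`;
      if (SENSITIVE_BODY_KEYS.has(key.toLowerCase())) {
        out[key] = "[REDACTED]";
        redacted.push(childPath);
      } else {
        out[key] = redactValue(child, childPath, redacted);
      }
    }
    return out;
  }
  return value;
}

// Returns a redacted copy; the input request is not mutated.
function redactRequest(req: RawRequest): RedactedRequest {
  const redactedFields: string[] = [];
  const headers: Record<string, string> = {};
  for (const [name, value] of Object.entries(req.headers)) {
    if (SENSITIVE_HEADERS.has(name.toLowerCase())) {
      headers[name] = "[REDACTED]";
      redactedFields.push(`headers.${name}`);
    } else {
      headers[name] = value;
    }
  }
  const body = redactValue(req.body, "body", redactedFields) as Record<string, unknown>;
  return { headers, body, redactedFields };
}
```

Recording the stripped paths alongside the event keeps the capture auditable: a reader can tell a field was present and removed rather than never sent.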

Verifier route guard:
- assertLlmRoute(opts, req) is pure (no I/O); safe to call from constructors
  and CI gates
- Throws structured LlmRouteAssertionError with code field for programmatic
  handling (no_explicit_base_url, base_url_blocked, base_url_not_allowed,
  no_auth, wrong_provider)
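
A minimal sketch of a pure route guard in the spirit of the notes above. Only requireExplicitBaseUrl, allowedBaseUrls, and the five error codes come from this commit message; the RouteOptions and RouteRequest shapes, the blockedBaseUrls, requireAuth, and expectedProvider options, the check order, and the names checkRoute and RouteAssertionError are hypothetical stand-ins, not the library's API.

```typescript
type RouteErrorCode =
  | "no_explicit_base_url"
  | "base_url_blocked"
  | "base_url_not_allowed"
  | "no_auth"
  | "wrong_provider";

// Structured error: callers branch on `code` instead of parsing the message.
class RouteAssertionError extends Error {
  readonly code: RouteErrorCode;
  constructor(code: RouteErrorCode, message: string) {
    super(message);
    this.name = "RouteAssertionError";
    this.code = code;
  }
}

interface RouteOptions {
  requireExplicitBaseUrl?: boolean;
  allowedBaseUrls?: string[];
  blockedBaseUrls?: string[]; // hypothetical option
  requireAuth?: boolean; // hypothetical option
  expectedProvider?: string; // hypothetical option
}

interface RouteRequest {
  baseUrl?: string;
  apiKey?: string;
  provider?: string;
}

// Pure function: no network or filesystem access, so it is safe at preflight.
function checkRoute(opts: RouteOptions, req: RouteRequest): void {
  if (opts.requireExplicitBaseUrl && !req.baseUrl) {
    throw new RouteAssertionError("no_explicit_base_url", "base URL must be set explicitly");
  }
  if (req.baseUrl && opts.blockedBaseUrls?.includes(req.baseUrl)) {
    throw new RouteAssertionError("base_url_blocked", `blocked base URL: ${req.baseUrl}`);
  }
  if (req.baseUrl && opts.allowedBaseUrls && !opts.allowedBaseUrls.includes(req.baseUrl)) {
    throw new RouteAssertionError("base_url_not_allowed", `base URL not in allow-list: ${req.baseUrl}`);
  }
  if (opts.requireAuth && !req.apiKey) {
    throw new RouteAssertionError("no_auth", "request carries no credentials");
  }
  if (opts.expectedProvider && req.provider !== opts.expectedProvider) {
    throw new RouteAssertionError("wrong_provider", `expected provider ${opts.expectedProvider}`);
  }
}
```

A CI gate can then catch the error and switch on its code, for example failing hard on base_url_not_allowed while only warning on no_auth in a dry run.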

Integrity check:
- assertRunCaptured returns RunIntegrityReport with issue codes; caller
  decides throw vs mark-failed via throwIfRunIncomplete
- Pair with requireRawCoverageOfLlmSpans to catch the bug class where the
  structured span was emitted but raw HTTP capture was wired to a different
  sink
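
The raw-coverage idea above can be sketched as a join between structured LLM spans and raw HTTP events. Everything here is illustrative: the spanId link between the two, the LlmSpan, RawEvent, and IntegrityReport shapes, the issue codes missing_raw_request and missing_raw_outcome, and the function names are assumptions, not the shipped RunIntegrityReport.

```typescript
interface LlmSpan {
  spanId: string;
}

// Assumption: each raw event carries the id of the span that produced it.
interface RawEvent {
  spanId: string;
  kind: "request" | "response" | "error";
}

interface IntegrityIssue {
  code: string; // hypothetical issue codes
  spanId: string;
}

interface IntegrityReport {
  ok: boolean;
  issues: IntegrityIssue[];
}

// Report every span that lacks a raw request, or a raw response/error outcome.
function checkRawCoverage(spans: LlmSpan[], raw: RawEvent[]): IntegrityReport {
  const requested = new Set<string>();
  const settled = new Set<string>();
  for (const ev of raw) {
    if (ev.kind === "request") {
      requested.add(ev.spanId);
    } else {
      settled.add(ev.spanId);
    }
  }
  const issues: IntegrityIssue[] = [];
  for (const span of spans) {
    if (!requested.has(span.spanId)) {
      issues.push({ code: "missing_raw_request", spanId: span.spanId });
    } else if (!settled.has(span.spanId)) {
      issues.push({ code: "missing_raw_outcome", spanId: span.spanId });
    }
  }
  return { ok: issues.length === 0, issues };
}

// Separate helper so the caller chooses between throwing and marking the run failed.
function throwIfIncomplete(report: IntegrityReport): void {
  if (!report.ok) {
    const summary = report.issues.map((i) => `${i.code}:${i.spanId}`).join(", ");
    throw new Error(`run capture incomplete: ${summary}`);
  }
}
```

Returning a report rather than throwing directly lets a matrix runner persist the issue list with the run before deciding whether the run counts.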

Run-complete hooks:
- TraceEmitterOptions.onRunComplete + addRunCompleteHook
- Errors are swallowed by default (auto-orchestration must not crash the
  underlying flow) and logged as 'log' events; opt into propagation via
  hookErrors: 'throw'
- traceAnalystOnRunComplete is the drop-in factory for the analyst case
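
The swallow-by-default hook behaviour could be sketched as follows. HookRunner, completeRun, the 'swallow' mode name, and the LogEvent shape are hypothetical; only addRunCompleteHook, the 'log' event kind, and hookErrors: 'throw' are named in this commit message. Hooks are synchronous here to keep the sketch short, whereas real hooks may well be async.

```typescript
type RunCompleteHook = (runId: string) => void;
type HookErrorMode = "swallow" | "throw"; // "swallow" is an assumed name for the default

interface LogEvent {
  kind: "log";
  message: string;
}

class HookRunner {
  readonly events: LogEvent[] = [];
  private readonly hooks: RunCompleteHook[] = [];
  private readonly hookErrors: HookErrorMode;

  constructor(hookErrors: HookErrorMode = "swallow") {
    this.hookErrors = hookErrors;
  }

  addRunCompleteHook(hook: RunCompleteHook): void {
    this.hooks.push(hook);
  }

  // Run every hook in registration order. In the default mode a failing hook
  // is recorded as a 'log' event and the remaining hooks still run.
  completeRun(runId: string): void {
    for (const hook of this.hooks) {
      try {
        hook(runId);
      } catch (err) {
        if (this.hookErrors === "throw") {
          throw err;
        }
        const message = err instanceof Error ? err.message : String(err);
        this.events.push({ kind: "log", message: `onRunComplete hook failed: ${message}` });
      }
    }
  }
}
```

Swallowing by default fits hooks that only add analysis on top of a finished run; the propagating mode suits CI, where a broken hook should fail the job.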

Version lockstep:
- npm @tangle-network/agent-eval 0.21.0
- pypi tangle-agent-eval 0.21.0

867/867 tests passing (+30 new across 5 files: raw sink, route assertion,
run integrity, hook lifecycle, llm raw capture).

1 parent c8f03bd commit d755bee
16 files changed
Lines changed: 1463 additions & 33 deletions