Commit 9c66420
chore: scope package and align tooling with @reaatech/* standards (#12)
* chore: scope package and align tooling with @reaatech/* standards
Renames the npm package to @reaatech/agent-eval-harness and brings the repo
in line with the house standard set by ../a2a-reference-ts.
Naming
- npm package: agent-eval-harness -> @reaatech/agent-eval-harness
- Added publishConfig.access=public for scoped publishing
- Bin command, OTel service names, MCP server name, Docker tag, repo URL
all kept unscoped (identity strings, not package references)
- Updated install/import examples in README, CLAUDE.md, AGENTS.md, all
skills/*/skill.md, and runtime getLibraryInfo()
- Added top-level LICENSE (was missing despite "license": "MIT")
Tooling
- Replaced ESLint + Prettier with Biome 1.9.4 (single tool, faster)
- Removed eslint.config.mjs, .prettierrc, all eslint/prettier devDeps
- Tightened tsconfig: strictFunctionTypes, strictBindCallApply,
strictPropertyInitialization, alwaysStrict, isolatedModules,
verbatimModuleSyntax (full reference parity)
- Removed invalid ignoreDeprecations="6.0" that was breaking builds on TS 5.x
- Fixed bogus version specs (TS ^6.0.3, vitest ^4.1.5, and eslint ^10.x did not
exist on the registry); aligned TS and vitest to the reference's ^5.8.3 / ^3.2.4
Package manager
- Switched from npm to pnpm@10.22.0
- Added .npmrc with strict-peer-dependencies=true
- Generated pnpm-lock.yaml, removed package-lock.json
- Surfaced and fixed real peer-dep conflict (otel api 1.9 vs sdk-* needing
<1.9): pinned api to ~1.8.0
- Added pnpm.overrides to force uuid >=14.0.0 (resolves moderate audit finding)
- Dropped --legacy-peer-deps workaround from Dockerfile
CI/Release
- Rewrote ci.yml to mirror reference shape: install -> {audit, format, lint,
typecheck} -> build (uploads artifact) -> {test (matrix node 20+22),
coverage, docker-build, docker-compose, eval} -> all-checks final gate
- Updated eval.yml to use pnpm
- release.yml: kept tag-trigger pattern (no changesets, single-package),
switched to pnpm, added npm provenance, GitHub Packages mirror
Code cleanup
- Eliminated all 46 noNonNullAssertion sites for full reference parity
(rule now at error level, matching reference)
- Production: refactored Levenshtein to use ?? fallbacks; added explicit
null guards in pairwise loop iteration; replaced filter().map(t => t.opt!)
with explicit for-loop in eval.command
- Tests: replaced expect(x.opt!).toBeLessThan(y.opt!) with `as number`
casts; replaced find()!.prop with optional chaining + toBeDefined()
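The production-side changes above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the `Scored` interface, `collectOpts`, and this particular `levenshtein` body are hypothetical, and the `?? 0` fallbacks assume indexed reads are typed as possibly undefined (for example under `noUncheckedIndexedAccess`).

```typescript
// Hypothetical shape: an item whose `opt` field may be absent.
interface Scored {
  id: string;
  opt?: number;
}

// Before (flagged by noNonNullAssertion):
//   items.filter((t) => t.opt !== undefined).map((t) => t.opt!)
// After: an explicit loop lets TypeScript narrow `opt` without `!`.
function collectOpts(items: readonly Scored[]): number[] {
  const out: number[] = [];
  for (const item of items) {
    if (item.opt !== undefined) {
      out.push(item.opt);
    }
  }
  return out;
}

// Two-row Levenshtein distance using `??` fallbacks instead of `!`
// on array reads. The fallbacks are never hit for in-range indices.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      const deletion = (prev[j] ?? 0) + 1;
      const insertion = (curr[j - 1] ?? 0) + 1;
      const substitution = (prev[j - 1] ?? 0) + cost;
      curr.push(Math.min(deletion, insertion, substitution));
    }
    prev = curr;
  }
  return prev[b.length] ?? 0;
}
```

For instance, `levenshtein("kitten", "sitting")` returns 3, and `collectOpts` keeps a legitimate `opt` of 0 because it checks for `undefined` rather than truthiness.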
Documentation
- Added SCOPED_REMEDIATION.md: phased checklist for applying these same
standards to other @reaatech/* repos, including pre-flight inventory,
decision points, common pitfalls (git stash hazards, biome --unsafe
breakage, audit overrides), and a final verification matrix
Local pipeline now all green:
- pnpm typecheck: clean
- pnpm lint: 0 errors, 0 warnings
- pnpm test: 735 passing
- pnpm build: clean
- pnpm audit --audit-level moderate: no vulnerabilities
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(ci): call CLI via node, grant pull-requests permission
Two issues surfaced by the first CI run of PR #12:
1. `pnpm exec agent-eval-harness` failed with "Command not found".
Unlike npm, pnpm does not link a package's own bin into
node_modules/.bin/, so the CLI was unreachable. Switched to
`node dist/cli.js` which works under both managers and avoids
the magic-bin-linking quirk entirely.
2. The PR-comment step hit a 403 "Resource not accessible by
integration" because the workflow lacked pull-requests:write.
This was a pre-existing gap that only surfaced when issue (1)
caused the comment step to actually run with an error body.
Added explicit permissions block.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* fix(ci): grant pull-requests permission at eval reusable-workflow call site
A reusable workflow's permissions block can only be a subset of what the
caller grants. When eval.yml started declaring pull-requests:write, ci.yml's
eval: job invocation needed to grant it explicitly — without that, the
workflow fails at startup before any job runs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
* docs: capture two CI pitfalls discovered on PR #12's first run
Adds two new entries to SCOPED_REMEDIATION.md based on real failures:
1. pnpm does not link the package's own bin into node_modules/.bin/
the way npm does — `pnpm exec <own-bin>` fails. Use `node dist/cli.js`
in CI scripts and Dockerfiles.
2. Reusable workflow `permissions:` blocks must be mirrored at every
`uses:` call site in the parent workflow. Without that, the run
fails with `startup_failure` before any job executes — and there are
no job logs to inspect, only a workflow-level error.
Both were caught by PR #12's CI run; both are documented now so a
future remediation against another @reaatech/* repo doesn't need to
rediscover them.
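As an illustration of the first pitfall, a script can derive the `node <path>` invocation from the package manifest instead of relying on a linked bin. This is only a sketch: `PackageManifest`, `resolveOwnBin`, and `nodeCommand` are hypothetical helpers, and the manifest used in the usage note is an assumed shape, not a copy of the repository's package.json.

```typescript
// Hypothetical subset of package.json relevant to CLI resolution.
interface PackageManifest {
  name: string;
  bin?: string | Record<string, string>;
}

// Returns the script path declared for `binName`, or undefined if the
// manifest does not declare that command.
function resolveOwnBin(pkg: PackageManifest, binName: string): string | undefined {
  if (typeof pkg.bin === "string") {
    // String form: the command is named after the package, minus any scope.
    const unscoped = pkg.name.startsWith("@") ? pkg.name.split("/")[1] : pkg.name;
    return unscoped === binName ? pkg.bin : undefined;
  }
  // Map form (or no bin at all): look the command up directly.
  return pkg.bin?.[binName];
}

// Builds an argv array that runs the CLI through node, which does not
// depend on anything being linked into node_modules/.bin/.
function nodeCommand(pkg: PackageManifest, binName: string): string[] {
  const script = resolveOwnBin(pkg, binName);
  if (script === undefined) {
    throw new Error(`package ${pkg.name} does not declare bin "${binName}"`);
  }
  return ["node", script];
}
```

With an assumed manifest of `{ name: "@reaatech/agent-eval-harness", bin: { "agent-eval-harness": "dist/cli.js" } }`, calling `nodeCommand(manifest, "agent-eval-harness")` yields `["node", "dist/cli.js"]`, matching the invocation the CI fix switched to.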
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
1 parent 5677fb7 commit 9c66420
74 files changed
Lines changed: 6618 additions & 9333 deletions
File tree
- .github/workflows
- skills
  - cost-tracking
  - eval-gating
  - faithfulness-scoring
  - golden-trajectories
  - latency-budgets
  - llm-judge-calibrated
  - regression-suites
  - relevance-scoring
  - tool-use-validation
  - trajectory-eval
- src
  - cli
    - commands
  - cost
  - gate
  - golden
  - judge
  - latency
  - mcp-server
    - tools
      - gate
      - judge
      - suite
  - observability
  - suite
  - tool-use
  - trajectory
- tests
  - integration
  - unit