Commit 9c66420

reaatech and claude authored
chore: scope package and align tooling with @reaatech/* standards (#12)
* chore: scope package and align tooling with @reaatech/* standards

Renames the npm package to @reaatech/agent-eval-harness and brings the repo in line with the house standard set by ../a2a-reference-ts.

Naming
- npm package: agent-eval-harness -> @reaatech/agent-eval-harness
- Added publishConfig.access=public for scoped publishing
- Bin command, OTel service names, MCP server name, Docker tag, repo URL all kept unscoped (identity strings, not package references)
- Updated install/import examples in README, CLAUDE.md, AGENTS.md, all skills/*/skill.md, and runtime getLibraryInfo()
- Added top-level LICENSE (was missing despite "license": "MIT")

Tooling
- Replaced ESLint + Prettier with Biome 1.9.4 (single tool, faster)
- Removed eslint.config.mjs, .prettierrc, all eslint/prettier devDeps
- Tightened tsconfig: strictFunctionTypes, strictBindCallApply, strictPropertyInitialization, alwaysStrict, isolatedModules, verbatimModuleSyntax (full reference parity)
- Removed invalid ignoreDeprecations="6.0" that was breaking builds on TS 5.x
- Fixed bogus version specs (TS ^6.0.3, vitest ^4.1.5, eslint ^10.x didn't exist on registry); aligned to reference's ^5.8.3 / ^3.2.4

Package manager
- Switched from npm to pnpm@10.22.0
- Added .npmrc with strict-peer-dependencies=true
- Generated pnpm-lock.yaml, removed package-lock.json
- Surfaced and fixed real peer-dep conflict (otel api 1.9 vs sdk-* needing <1.9): pinned api to ~1.8.0
- Added pnpm.overrides to force uuid >=14.0.0 (resolves moderate audit finding)
- Dropped --legacy-peer-deps workaround from Dockerfile

CI/Release
- Rewrote ci.yml to mirror reference shape: install -> {audit, format, lint, typecheck} -> build (uploads artifact) -> {test (matrix node 20+22), coverage, docker-build, docker-compose, eval} -> all-checks final gate
- Updated eval.yml to use pnpm
- release.yml: kept tag-trigger pattern (no changesets, single-package), switched to pnpm, added npm provenance, GitHub Packages mirror

Code cleanup
- Eliminated all 46 noNonNullAssertion sites for full reference parity (rule now at error level, matching reference)
- Production: refactored Levenshtein to use ?? fallbacks; added explicit null guards in pairwise loop iteration; replaced filter().map(t => t.opt!) with explicit for-loop in eval.command
- Tests: replaced expect(x.opt!).toBeLessThan(y.opt!) with `as number` casts; replaced find()!.prop with optional chaining + toBeDefined()

Documentation
- Added SCOPED_REMEDIATION.md: phased checklist for applying these same standards to other @reaatech/* repos, including pre-flight inventory, decision points, common pitfalls (git stash hazards, biome --unsafe breakage, audit overrides), and a final verification matrix

Local pipeline now all green:
- pnpm typecheck: clean
- pnpm lint: 0 errors, 0 warnings
- pnpm test: 735 passing
- pnpm build: clean
- pnpm audit --audit-level moderate: no vulnerabilities

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): call CLI via node, grant pull-requests permission

Two issues surfaced by PR #12 first CI run:

1. `pnpm exec agent-eval-harness` failed with "Command not found". Unlike npm, pnpm does not link a package's own bin into node_modules/.bin/, so the CLI was unreachable. Switched to `node dist/cli.js` which works under both managers and avoids the magic-bin-linking quirk entirely.
2. The PR-comment step hit a 403 "Resource not accessible by integration" because the workflow lacked pull-requests:write. This was a pre-existing gap that only surfaced when issue (1) caused the comment step to actually run with an error body. Added explicit permissions block.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* fix(ci): grant pull-requests permission at eval reusable-workflow call site

A reusable workflow's permissions block can only be a subset of what the caller grants. When eval.yml started declaring pull-requests:write, ci.yml's eval: job invocation needed to grant it explicitly; without that, the workflow fails at startup before any job runs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

* docs: capture two CI pitfalls discovered on PR #12 first run

Adds two new entries to SCOPED_REMEDIATION.md based on real failures:

1. pnpm does not link the package's own bin into node_modules/.bin/ the way npm does, so `pnpm exec <own-bin>` fails. Use `node dist/cli.js` in CI scripts and Dockerfiles.
2. Reusable workflow `permissions:` blocks must be mirrored at every `uses:` call site in the parent workflow. Without that, the run fails with `startup_failure` before any job executes, and there are no job logs to inspect, only a workflow-level error.

Both were caught by PR #12's CI run; both are documented now so a future remediation against another @reaatech/* repo doesn't need to rediscover them.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
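The source files behind the "Code cleanup" items are not rendered on this page, so the following is only a minimal sketch of the two production patterns the message names: `??` fallbacks in a Levenshtein routine, and an explicit for-loop in place of `filter().map(t => t.opt!)`. The function names, the `Turn` type, and the `latencyMs` field are hypothetical and do not come from the repository. The sketch also assumes indexed access may be typed as possibly undefined (for example under `noUncheckedIndexedAccess`), which the commit does not confirm.

```typescript
// Hypothetical names throughout; this is not the repository's code.

/** Levenshtein distance written with `??` fallbacks instead of `!` assertions. */
function levenshtein(a: string, b: string): number {
  // prev[j] = distance between the first (i - 1) chars of `a` and the first j chars of `b`.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      // Each lookup falls back to +Infinity rather than asserting non-null.
      const deletion = (prev[j] ?? Number.POSITIVE_INFINITY) + 1;
      const insertion = (curr[j - 1] ?? Number.POSITIVE_INFINITY) + 1;
      const substitution = (prev[j - 1] ?? Number.POSITIVE_INFINITY) + cost;
      curr.push(Math.min(deletion, insertion, substitution));
    }
    prev = curr;
  }
  return prev[b.length] ?? 0;
}

/** Hypothetical record with an optional numeric field. */
interface Turn {
  id: string;
  latencyMs?: number;
}

/** Explicit loop replacing `turns.filter(t => t.latencyMs).map(t => t.latencyMs!)`. */
function collectLatencies(turns: readonly Turn[]): number[] {
  const out: number[] = [];
  for (const t of turns) {
    // The guard narrows the type, so no assertion is needed.
    if (t.latencyMs !== undefined) {
      out.push(t.latencyMs);
    }
  }
  return out;
}
```

In this sketch the loop compares against `undefined` rather than relying on truthiness, so a legitimate value of `0` is kept. Whether the original `filter()` callback had that behaviour is not visible in the rendered diff.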
1 parent 5677fb7 commit 9c66420

74 files changed

Lines changed: 6618 additions & 9333 deletions

.github/workflows/ci.yml

Lines changed: 322 additions & 44 deletions
Large diffs are not rendered by default.

.github/workflows/eval.yml

Lines changed: 13 additions & 7 deletions
@@ -9,21 +9,27 @@ on:
 jobs:
   evaluate:
     runs-on: ubuntu-latest
+    permissions:
+      contents: read
+      pull-requests: write
+      issues: write
     steps:
       - uses: actions/checkout@v4
 
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+
       - name: Setup Node.js
         uses: actions/setup-node@v4
         with:
           node-version: '22'
-          cache: 'npm'
-          cache-dependency-path: package-lock.json
+          cache: 'pnpm'
 
       - name: Install dependencies
-        run: npm ci --legacy-peer-deps
+        run: pnpm install --frozen-lockfile
 
       - name: Build
-        run: npm run build
+        run: pnpm build
 
       - name: Download baseline results
         if: github.event_name == 'pull_request'
@@ -40,23 +46,23 @@ jobs:
           ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
         run: |
           mkdir -p results
-          npx agent-eval-harness eval \
+          node dist/cli.js eval \
             trajectories/examples/*.jsonl \
             --config trajectories/examples/config.yaml \
             --output results/
 
       - name: Run regression gates
         if: github.event_name == 'pull_request' && hashFiles('baseline/') != ''
         run: |
-          npx agent-eval-harness compare \
+          node dist/cli.js compare \
             baseline/results.json \
             results/results.json \
             --format markdown \
             --output results/comparison.md
 
       - name: Check gates
         run: |
-          npx agent-eval-harness gate \
+          node dist/cli.js gate \
             results/results.json \
             --preset standard \
             --exit-code

.github/workflows/release.yml

Lines changed: 52 additions & 25 deletions
@@ -4,75 +4,102 @@ on:
   push:
     tags:
       - 'v*'
+  workflow_dispatch:
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: false
+
+env:
+  NODE_VERSION: 22
 
 jobs:
   release:
+    name: Release
     runs-on: ubuntu-latest
     permissions:
       contents: write
       packages: write
+      id-token: write
     steps:
-      - uses: actions/checkout@v4
-
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Setup pnpm
+        uses: pnpm/action-setup@v4
+
       - name: Setup Node.js
         uses: actions/setup-node@v4
         with:
-          node-version: '22'
-          cache: 'npm'
-          cache-dependency-path: package-lock.json
+          node-version: ${{ env.NODE_VERSION }}
+          cache: 'pnpm'
           registry-url: 'https://registry.npmjs.org'
-
+
       - name: Install dependencies
-        run: npm ci
-
+        run: pnpm install --frozen-lockfile
+
       - name: Run tests
-        run: npm test
-
+        run: pnpm test
+
       - name: Build
-        run: npm run build
-
+        run: pnpm build
+
       - name: Publish to npm
-        run: npm publish --access public
+        run: pnpm publish --access public --no-git-checks
         env:
           NODE_AUTH_TOKEN: ${{ secrets.NPM_TOKEN }}
-
+          NPM_CONFIG_PROVENANCE: 'true'
+
+      - name: Mirror to GitHub Packages
+        env:
+          NODE_AUTH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        run: |
+          cat > .npmrc <<EOF
+          @reaatech:registry=https://npm.pkg.github.com
+          //npm.pkg.github.com/:_authToken=${NODE_AUTH_TOKEN}
+          EOF
+          pnpm publish --registry=https://npm.pkg.github.com --no-git-checks
+
       - name: Set up Docker Buildx
         uses: docker/setup-buildx-action@v3
-
+
       - name: Login to Docker Hub
         uses: docker/login-action@v3
         with:
           username: ${{ secrets.DOCKER_USERNAME }}
           password: ${{ secrets.DOCKER_PASSWORD }}
-
+
       - name: Build and push Docker image
-        uses: docker/build-push-action@v5
+        uses: docker/build-push-action@v6
         with:
           context: .
           push: true
           tags: |
             ${{ github.repository }}:${{ github.ref_name }}
             ${{ github.repository }}:latest
-          cache-from: type=registry,ref=${{ github.repository }}:buildcache
-          cache-to: type=inline
-
+          cache-from: type=gha
+          cache-to: type=gha,mode=max
+
       - name: Create GitHub Release
-        uses: softprops/action-gh-release@v1
+        uses: softprops/action-gh-release@v2
         with:
           generate_release_notes: true
           files: |
             dist/*.js
           body: |
             ## Changes
             See the [CHANGELOG](https://github.com/${{ github.repository }}/blob/main/CHANGELOG.md) for details.
-
+
             ## Installation
-
+
             ### npm
             ```bash
-            npm install agent-eval-harness
+            npm install @reaatech/agent-eval-harness
             ```
-
+
             ### Docker
             ```bash
             docker pull ${{ github.repository }}:${{ github.ref_name }}
+            ```

.lintstagedrc.json

Lines changed: 2 additions & 11 deletions
@@ -1,13 +1,4 @@
 {
-  "src/**/*.{ts,js}": [
-    "eslint --fix",
-    "prettier --write"
-  ],
-  "tests/**/*.{ts,js}": [
-    "eslint --fix",
-    "prettier --write"
-  ],
-  "*.{json,md,yaml,yml}": [
-    "prettier --write"
-  ]
+  "*.{ts,js,json,jsonc}": ["biome check --write --no-errors-on-unmatched"],
+  "*.{md,yaml,yml}": ["biome format --write --no-errors-on-unmatched"]
 }

.npmrc

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+shamefully-hoist=false
+strict-peer-dependencies=true

.prettierrc

Lines changed: 0 additions & 11 deletions
This file was deleted.

AGENTS.md

Lines changed: 4 additions & 4 deletions
@@ -196,7 +196,7 @@ Golden trajectories serve as reference implementations for regression testing.
 ### Comparing Against Golden
 
 ```typescript
-import { compareAgainstGolden } from 'agent-eval-harness';
+import { compareAgainstGolden } from '@reaatech/agent-eval-harness';
 
 const result = compareAgainstGolden(trajectory, goldenTrajectory, {
   similarityThreshold: 0.85,
@@ -251,7 +251,7 @@ judge:
 4. **Apply calibration** to future judge scores
 
 ```typescript
-import { calibrate, applyCalibration } from 'agent-eval-harness';
+import { calibrate, applyCalibration } from '@reaatech/agent-eval-harness';
 
 await calibrate({
   humanLabelsPath: 'calibration/human-labels.jsonl',
@@ -363,7 +363,7 @@ latency:
 ### Latency Monitoring
 
 ```typescript
-import { monitorLatency } from 'agent-eval-harness';
+import { monitorLatency } from '@reaatech/agent-eval-harness';
 
 const budget = {
   per_turn_p99: 5000,
@@ -405,7 +405,7 @@ tool_validation:
 ### Validation Example
 
 ```typescript
-import { validateTrajectory, validateSchema } from 'agent-eval-harness';
+import { validateTrajectory, validateSchema } from '@reaatech/agent-eval-harness';
 
 const toolSchemas = {
   send_reset_email: {

CLAUDE.md

Lines changed: 1 addition & 1 deletion
@@ -187,7 +187,7 @@ describe('MyEvaluator', () => {
 
 ```typescript
 import { describe, it, expect } from 'vitest';
-import { loadFromFile, evaluate } from 'agent-eval-harness';
+import { loadFromFile, evaluate } from '@reaatech/agent-eval-harness';
 
 describe('Integration: Load and Evaluate', () => {
   it('should load and evaluate trajectory', () => {

Dockerfile

Lines changed: 9 additions & 10 deletions
@@ -1,28 +1,27 @@
 # Stage 1: Build
 FROM node:22-alpine AS builder
 
-WORKDIR /app
+RUN npm install -g pnpm@10
 
-# Copy package files
-COPY package.json package-lock.json ./
+WORKDIR /app
 
-# Install dependencies (full install for build)
-RUN npm ci --legacy-peer-deps && npm cache clean --force
+COPY package.json pnpm-lock.yaml ./
+RUN pnpm install --frozen-lockfile
 
-# Copy source
 COPY tsconfig.json ./
 COPY src ./src
 
-# Build
-RUN npm run build
+RUN pnpm build
 
 # Stage 2: Install production deps only
 FROM node:22-alpine AS prod-deps
 
+RUN npm install -g pnpm@10
+
 WORKDIR /app
 
-COPY package.json package-lock.json ./
-RUN npm ci --legacy-peer-deps --only=production --ignore-scripts && npm cache clean --force
+COPY package.json pnpm-lock.yaml ./
+RUN pnpm install --prod --frozen-lockfile --ignore-scripts
 
 # Stage 3: Runtime
 FROM node:22-alpine AS runtime

README.md

Lines changed: 1 addition & 1 deletion
@@ -22,7 +22,7 @@ End-to-end agent evaluation harness for full agent runs. Supports trajectory eva
 
 ```bash
 # npm
-npm install agent-eval-harness
+npm install @reaatech/agent-eval-harness
 
 # Or use without installing
 npx agent-eval-harness eval trajectories/*.jsonl
