tangle-network
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 30 additions & 5 deletions
diff --git a/LICENSE b/LICENSE
Lines changed: 21 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 9 additions & 6 deletions
diff --git a/clients/python/README.md b/clients/python/README.md
Lines changed: 3 additions & 2 deletions
diff --git a/clients/python/pyproject.toml b/clients/python/pyproject.toml
Lines changed: 1 addition & 1 deletion
diff --git a/clients/python/src/tangle_agent_eval/__init__.py b/clients/python/src/tangle_agent_eval/__init__.py
Lines changed: 1 addition & 1 deletion
diff --git a/docs/concepts.md b/docs/concepts.md
Lines changed: 4 additions & 4 deletions
diff --git a/docs/knowledge-readiness.md b/docs/knowledge-readiness.md
Lines changed: 2 additions & 2 deletions
diff --git a/docs/wire-protocol.md b/docs/wire-protocol.md
Lines changed: 3 additions & 3 deletions
diff --git a/examples/benchmarks/README.md b/examples/benchmarks/README.md
Lines changed: 4 additions & 11 deletions
@@ -1,5 +1,31 @@
 # Changelog
 
+## 0.20.9 — release hygiene and runtime failure fixes
+
+### Fixed
+
+- Initial `runAgentControlLoop` observe/validate failures now report the
+  actual observe/validate error even when trace start/end emission also fails.
+- Knowledge readiness recommended actions now honor non-blocking gap
+  acquisition modes such as `ask_user`, `search_web`, `query_connector`, and
+  `inspect_repo`.
+- Npm builds now generate `dist/openapi.json`, and the package exports
+  `@tangle-network/agent-eval/openapi.json`.
+- Npm and Python client versions are locked at `0.20.9`.
+
+### Added
+
+- `CallbackResearcher`, a concrete callback-backed implementation of the
+  stable `Researcher` interface for scripts, tests, and small integrations.
+- Public `@tangle-network/agent-eval/benchmarks` subpath for the supported
+  routing benchmark surface.
+- Root MIT `LICENSE`.
+
+### Changed
+
+- Raw TypeScript examples are no longer included in the npm package; they remain
+  repository examples to read, copy, and adapt.
+
 ## 0.20.2 — freshness-aware knowledge readiness
 
 ### Added
@@ -107,9 +133,9 @@
 - `runProposeReviewAsControlLoop` accepts a caller-provided verifier failure
   mapper for domain-specific failure classes.
 
-## 0.17.0 — surface cleanup + SKILL pitfalls
+## 0.17.0 — surface cleanup + usage-guidance pitfalls
 
-This release tightens the public benchmark surface and lands the SKILL.md guidance that the v0.15 dispatch couldn't write.
+This release tightens the public benchmark surface and lands internal usage guidance that the v0.15 dispatch couldn't write.
 
 ### Moved
 
@@ -123,7 +149,7 @@ These are reference implementations of `BenchmarkAdapter`, not core surface. Con
 ### Added
 
 - `examples/benchmarks/README.md` documents how to use, copy, and extend the example wrappers.
-- `.claude/skills/agent-eval/SKILL.md` gains a "Production-rigor primitives (v0.16+)" section and a "Pitfalls" section with 13 footgun directives covering the v0.16 primitives. (Couldn't be written in v0.15 due to harness sandbox; landed in v0.17.)
+- Internal agent-eval usage guidance gains production-rigor and pitfalls sections covering the v0.16 primitives.
 
 ### Migration
 
@@ -218,8 +244,7 @@ optimization with held-out promotion gates.
   are additive.
 - All new public symbols carry JSDoc.
 - 87 new tests across 7 new test files. 571 total tests pass.
-- See `.claude/skills/agent-eval/SKILL.md` for usage directives and
-  pitfalls; `## Pitfalls` section added in this release.
+- See the package docs for usage directives and pitfalls.
 
 ## 0.11.0
 
 
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Tangle Network
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
@@ -55,9 +55,9 @@ Package responsibilities:
   optimization, reporting.
 - Product app: domain state, tools, credentials, UI, storage, deployment, model
   gateway.
-- `agent-runtime`: production agent-loop/session runtime.
-- `agent-knowledge`: evidence stores, claim/page synthesis, retrieval, knowledge
-  readiness implementation.
+- `@tangle-network/agent-runtime`: production agent-loop/session runtime.
+- `@tangle-network/agent-knowledge`: evidence stores, claim/page synthesis,
+  retrieval, knowledge readiness implementation.
 
 ## Install
 
@@ -72,10 +72,12 @@ npm i -g @tangle-network/agent-eval
 agent-eval serve --port 5005
 ```
 
-Python client:
+Python client source lives in `clients/python`. Until the PyPI package is
+published, install it from the repo:
 
 ```sh
-pip install tangle-agent-eval
+cd clients/python
+pip install -e .
 ```
 
 ## Core Primitives
@@ -98,7 +100,8 @@ pip install tangle-agent-eval
 
 ## Examples
 
-Runnable examples live in [`examples/`](./examples):
+Runnable examples live in the repository's [`examples/`](./examples)
+directory. They are not part of the published npm package.
 
 - [`examples/same-sandbox-harness`](./examples/same-sandbox-harness) - run
   multiple eval passes against the same workspace.
 
@@ -27,7 +27,8 @@ That's the entire surface for content judging.
 ## Install
 
 ```sh
-pip install tangle-agent-eval
+cd clients/python
+pip install -e .
 ```
 
 To use it, **one of**:
@@ -140,7 +141,7 @@ All errors carry `.code` and `.details` (the structured payload from the server)
 
 ## Versioning
 
-This package is **version-locked** to the npm package. `tangle-agent-eval==0.19.0` ↔ `@tangle-network/agent-eval@0.19.0`. The two ship from the same git tag in the same CI workflow; if either fails to publish, neither does. Mismatched versions are a build-time error.
+This package is **version-locked** to the npm package. `tangle-agent-eval==0.20.9` ↔ `@tangle-network/agent-eval@0.20.9`. The two ship from the same git tag in the same CI workflow; if either fails to publish, neither does. Mismatched versions are a build-time error.
 
 `wire_version` is separate. It bumps only on breaking schema changes. Package versions can differ across releases as long as `wire_version` is the same.
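
The version lock described above can be illustrated with a small check. This is only a sketch under stated assumptions: `versions_locked` is a hypothetical helper, not part of the package or its CI workflow, and it assumes the Python version appears as a top-level `version = "..."` line in `pyproject.toml`.

```python
import re


def versions_locked(pyproject_text: str, npm_version: str) -> bool:
    """Return True when the pyproject version equals the npm version.

    Hypothetical helper for illustration; the real build-time check
    may be implemented differently.
    """
    # Find the first line of the form: version = "X.Y.Z"
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    if match is None:
        raise ValueError("no version field found in pyproject text")
    return match.group(1) == npm_version
```

A build script could call this with the contents of `clients/python/pyproject.toml` and the `version` field read from `package.json`, and fail the build when it returns `False`.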
 
 
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"
 
 [project]
 name = "tangle-agent-eval"
-version = "0.19.0"
+version = "0.20.9"
 description = "Python client for @tangle-network/agent-eval — judge content against rubrics over HTTP or stdio RPC."
 readme = "README.md"
 requires-python = ">=3.10"
 
@@ -39,7 +39,7 @@
     VersionResponse,
 )
 
-__version__ = "0.19.0"
+__version__ = "0.20.9"
 
 __all__ = [
     "Client",
 
@@ -43,7 +43,7 @@ that can seed memory, replay scenarios, and optimization.
 | **Trace store** | The append-only log of every span/event during a run. Replay = read this back. |
 | **Composite score** | A 0..1 number combining all dimensions. The single number you gate on. |
 | **Rubric version** | A stable hash of the rubric. Scores from different rubric versions are not comparable. |
-| **Muffled gate** | A check that should fail loud but silently passes (e.g. `command || true`). The most expensive bug class in this codebase — see SKILL.md. |
+| **Muffled gate** | A check that should fail loud but silently passes (e.g. `command || true`). The most expensive bug class in this codebase. |
 
 ## The feedback trajectory loop
 
@@ -119,7 +119,7 @@ report.blendedScore   // 0..1 — weighted aggregate
 report.layers         // per-layer status, findings, duration
 ```
 
-Two rules that will save you bugs (paid for in real incidents — see SKILL.md):
+Two rules that will save you bugs:
 
 1. **Run both gates.** Build gates catch code that doesn't compile; structural assertions catch missing files. Run both unconditionally — they catch orthogonal failures.
 
@@ -150,6 +150,6 @@ You don't need to build the trace tree by hand. `BuilderSession` does it for you
 - **Just want to score a string against a rubric?** → [wire-protocol.md](./wire-protocol.md) — HTTP/RPC interface, pluggable from any language.
 - **Need a reusable driver/worker/evaluator loop?** → [control-runtime.md](./control-runtime.md) — generic runtime plus coding, browser, computer-use, and research integration patterns.
 - **Want review feedback to become eval/optimization data?** → [feedback-trajectories.md](./feedback-trajectories.md) — turn feedback into datasets, optimizer rows, and preference memory.
-- **Building a code-generator eval?** → SKILL.md §Minimal working path — the `BuilderSession` recipe.
-- **Multi-layer verifier?** → SKILL.md §Verification pipeline.
+- **Building a code-generator eval?** → Start with `BuilderSession`, `SandboxHarness`, and `MultiLayerVerifier`.
+- **Multi-layer verifier?** → Use [control-runtime.md](./control-runtime.md) and `MultiLayerVerifier` for ordered gates with dependencies.
 - **Adding a new judge or rubric?** → `src/wire/rubrics.ts` for the cross-language path; `src/anti-slop.ts` and `src/judges.ts` for the in-process path.
@@ -2,8 +2,8 @@
 
 `agent-eval` owns the contract for deciding whether an agent had enough
 task-world context to run. It does not own web crawling, connector storage, wiki
-pages, credentials, or product policy. Those live in `agent-knowledge` and
-product repos.
+pages, credentials, or product policy. Those live in
+`@tangle-network/agent-knowledge` and product repos.
 
 The core loop is:
 
 
@@ -96,13 +96,13 @@ GET /v1/version
 ```json
 {
   "package": "@tangle-network/agent-eval",
-  "version": "0.19.0",
+  "version": "0.20.9",
   "wireVersion": "1.0.0",
   "apiSurface": ["judge", "listRubrics", "version"]
 }
 ```
 
-`version` matches the npm/PyPI package version. `wireVersion` bumps independently — only on breaking request/response schema changes. Package versions can differ across releases as long as `wireVersion` matches.
+`version` matches the package version. `wireVersion` bumps independently — only on breaking request/response schema changes. Package versions can differ across releases as long as `wireVersion` matches.
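
As a rough illustration of the rule above, a client might compare only `wireVersion` and ignore the package `version`. This is a sketch, not the shipped client logic: `wire_compatible` is a hypothetical name, and it assumes the response body has the shape shown in the `GET /v1/version` example.

```python
import json


def wire_compatible(version_response: str, expected_wire_version: str) -> bool:
    """Return True when the server's wireVersion matches the expected one.

    Hypothetical helper for illustration; the package version is
    deliberately ignored, since only wireVersion gates compatibility.
    """
    payload = json.loads(version_response)
    return payload["wireVersion"] == expected_wire_version
```

With the example response above, `wire_compatible(body, "1.0.0")` holds regardless of whether the server reports package version `0.19.0` or `0.20.9`.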
 
 ### `GET /healthz` — liveness
 
@@ -176,7 +176,7 @@ Each invocation is one process — Node startup adds ~500 ms. For more than a fe
 
 ## Clients
 
-- **Python**: [`tangle-agent-eval`](../clients/python/README.md) on PyPI. Auto-detects HTTP, falls back to subprocess. Version-locked to npm.
+- **Python**: source lives in [`clients/python`](../clients/python/README.md). Auto-detects HTTP, falls back to subprocess. Version-locked to npm.
 - **TypeScript**: import directly from `@tangle-network/agent-eval` (no wire round-trip needed in-process).
 - **Rust / Go / Other**: generate from `dist/openapi.json`. PRs welcome to add an officially-maintained client.
 
 
@@ -11,17 +11,10 @@ The novel benchmark we ship and own — the synthetic routing task — lives in
 
 ## Using these wrappers
 
-Two paths.
-
-**Option A — read and inline.** Copy the wrapper file into your project. Replace the import paths from `../../../src/benchmarks/types` and `../../../src/run-record` with `@tangle-network/agent-eval`. Done.
-
-**Option B — import from agent-eval source.** If your project sits in this monorepo (or you've cloned the repo), import directly:
-
-```ts
-import * as gsm8k from '@tangle-network/agent-eval/examples/benchmarks/gsm8k'
-```
-
-This requires adding `examples/**/*.ts` to your TypeScript paths. Easier to just copy.
+Read and inline them. Copy the wrapper file into your project, then replace
+imports such as `../../../src/benchmarks/types` and `../../../src/run-record`
+with `@tangle-network/agent-eval`. These examples are repository source, not
+published npm subpaths.
 
 ## What every BenchmarkAdapter exports