docs: align eval guidance with non-pinned Tessl default

martinfrancois · martinfrancois · commit e5cf5448ae85 · 2026-06-24T03:53:02.000+02:00
diff --git a/docs/agents/evals.md b/docs/agents/evals.md
@@ -126,6 +126,11 @@ benchmark claims, or scoring rules.
   keys, replay logs, and one-off run details.
 - Keep hosted eval usage minimal while preserving confidence:
   - Use `scripts/run_eval_suite.sh` so variants match suite purpose and runs use the plugin context.
+  - Use Tessl's default solver unless the account has model-selection entitlements and you
+    specifically want a representative frontier check. By policy, the hosted default is intentionally
+    left unpinned. See Tessl's model-selection notes, if available:
+    - https://docs.tessl.io/changelog
+    - https://tessl.io/blog/why-were-changing-our-default-eval-model/
   - Main and reference scenarios run with both variants.
   - Regression scenarios run with context only by default. Run regression without-context only when
     intentionally checking whether a scenario should move back to reference.
diff --git a/docs/agents/workflow.md b/docs/agents/workflow.md
@@ -31,9 +31,16 @@ release-readiness.
   patch-bump dry-run for PR safety. For skill, eval, package, or release changes that should publish
   a new version, let Release Please bump the version before the exact-version publish check.
 
-- For skill behavior or eval changes, run hosted evals with Sonnet 4.6, but start with the smallest
-  useful set to conserve Tessl daily rate-limit budget. Use `scripts/run_eval_suite.sh` so the run
-  uses plugin context and the right variant policy.
+- For skill behavior or eval changes, run hosted evals with Tessl's default solver, but start with the
+  smallest useful set to conserve Tessl daily rate-limit budget. Use `scripts/run_eval_suite.sh` so the
+  run uses plugin context and the right variant policy.
+
+  Model-selection note:
+  Do not pin Sonnet in default commands; the script runs with the current Tessl default solver.
+  If model selection is available, Sonnet 4.6 or better is a good representative check. See Tessl's
+  model-selection and default-model discussions:
+  - https://docs.tessl.io/changelog
+  - https://tessl.io/blog/why-were-changing-our-default-eval-model/
 
   If any eval scenario's `task.md`, `criteria.json`, or `capability.txt` changed, run that exact
   scenario before finishing the PR. A pure move between `evals/`, `evals-reference/`, and