chore: add Evalite for optional LLM and task evals #1538

ieduardogf wants to merge 2 commits into master from
Add `evalite@beta`, `vitest`, yarn scripts, sample eval under `evals/`, and README notes.

Made-with: Cursor
Deploy preview for mistica-web ready!
Deployed with vercel-action

Accessibility report ℹ️ You can run this locally by executing
The `evals/` directory imports from Evalite subpath exports (e.g. `evalite/scorers/deterministic`), which require a modern `moduleResolution`. Excluding `evals/` from `tsconfig.production.json` prevents `gen-ts-defs` from trying to compile dev-only eval files during the library build.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Screenshot tests report ✔️ All passing
```ts
import {evalite} from 'evalite';
import {exactMatch} from 'evalite/scorers/deterministic';

evalite('My Eval', {
```
I thought we were going to test this lib against its competitors before pushing an installation and configuration to the repo. I mean, there are other options that could potentially be better; perhaps we should test them before committing?
Yes, I think we should test different tools.
Pull request overview
Adds Evalite-based evaluation tooling (plus Vitest dependency) to the repo so developers can author and run local/CI-friendly scored eval suites under evals/, with minimal initial scaffolding.
Changes:
- Add `evalite` (beta) and `vitest` dev dependencies plus `yarn eval` / `yarn eval:dev` scripts.
- Introduce a minimal example eval file under `evals/` and document the workflow in the README.
- Exclude `evals/` from the production TS declaration build config and vendor new Yarn cache artifacts.
Reviewed changes
Copilot reviewed 4 out of 123 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| tsconfig.production.json | Excludes evals/ from the production declaration emit config. |
| package.json | Adds Evalite/Vitest dev deps and new eval scripts. |
| evals/my-eval.eval.ts | Adds a minimal example Evalite suite using deterministic scoring. |
| README.md | Documents how to run/author evals and notes scorer import guidance. |
| .yarn/cache/why-is-node-running-npm-2.3.0-011cf61a18-58ebbf406e.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/token-types-npm-6.1.2-1f6e70d865-ddade9c99f.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/tinyrainbow-npm-3.1.0-35ba47f8ae-dbb16b4aa5.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/stream-shift-npm-1.0.3-c1c29210c7-a24c0a3f66.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/stackback-npm-0.0.2-73273dc92e-2d4dc4e64e.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/split2-npm-4.2.0-16aa3883ba-05d5410254.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/siginfo-npm-2.0.0-9bbac931f8-8aa5a98640.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/set-cookie-parser-npm-2.7.2-e1a4d1221b-9e1b09e718.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/real-require-npm-0.2.0-7f69dbc7b6-fa060f19f2.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/quick-format-unescaped-npm-4.0.4-7e22c9b7dc-7bc32b9935.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/on-exit-leak-free-npm-2.1.2-0d0c5ad67d-6ce7acdc7b.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/lodash.truncate-npm-4.4.2-bc50fe1663-b463d8a382.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/js-levenshtein-npm-1.1.6-ab883e61a3-409f052a7f.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/is-stream-npm-4.0.1-328fd196cc-cbea3f1fc2.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/get-port-npm-7.2.0-7f76d3f2ea-f8785ccdcc.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/fast-querystring-npm-1.1.2-81dfb4019b-7149f82ee9.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/fast-decode-uri-component-npm-1.0.1-578ba9fecf-427a48fe09.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/duplexify-npm-4.1.3-f0053971e9-9636a02734.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/atomic-sleep-npm-1.0.0-17d8a762a3-b95275afb2.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/abstract-logging-npm-2.0.1-b805b8edfa-6967d15e5a.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/@standard-schema-spec-npm-1.1.0-d3e5ccd2e2-6245ebef5e.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/@lukeed-ms-npm-2.0.2-5e69b6e173-6ae47ed3eb.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/@fastify-forwarded-npm-3.0.1-03d48a4e5e-2cde644dc3.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/@fastify-accept-negotiator-npm-2.0.1-d797505dde-7a2db0bb9f.zip | Yarn offline cache update (transitive dependency). |
| .yarn/cache/@borewit-text-codec-npm-0.2.2-11871252cc-7b4852c38b.zip | Yarn offline cache update (transitive dependency). |
```md
- `yarn eval:dev` / `yarn eval`: [Evalite](https://v1.evalite.dev/guides/quickstart/) for task/LLM evals in
  `evals/*.eval.ts` (use `evalite/scorers/deterministic` for scorers like `exactMatch` unless you add the `ai`
  package for LLM-judge scorers from `evalite/scorers`)
```
README says eval files live in `evals/*.eval.ts`, but the PR description and intended workflow mention `evals/**/*.eval.ts` (and follow-up suites likely want nested folders). Suggest updating the README glob/path wording to match the intended recursive pattern (or adjust the tooling expectation if it's actually non-recursive).
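To make the flat vs. recursive distinction concrete, here is a minimal TypeScript sketch. `globToRegExp` is a hypothetical helper that understands only `*` and `**/`; it is not how Evalite or Vitest actually discover files, and the nested path `evals/forms/button.eval.ts` is a made-up example.

```typescript
// Hypothetical helper: converts a simple glob (only `**/` and `*` supported)
// into a RegExp. NOT Evalite's or Vitest's matcher; it only illustrates
// the difference between a flat and a recursive pattern.
function globToRegExp(glob: string): RegExp {
  let source = '';
  for (let i = 0; i < glob.length; i++) {
    const ch = glob[i];
    if (glob.startsWith('**/', i)) {
      // `**/` matches zero or more whole directory segments.
      source += '(?:[^/]+/)*';
      i += 2; // skip the remaining `*/`; the loop's i++ moves past the `/`
    } else if (ch === '*') {
      // `*` matches anything within a single path segment (no `/`).
      source += '[^/]*';
    } else {
      // Escape regex metacharacters such as `.`.
      source += ch.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
    }
  }
  return new RegExp(`^${source}$`);
}

const flat = globToRegExp('evals/*.eval.ts');
const recursive = globToRegExp('evals/**/*.eval.ts');

const files = ['evals/my-eval.eval.ts', 'evals/forms/button.eval.ts', 'evals/my-eval.ts'];

for (const file of files) {
  console.log(file, 'flat:', flat.test(file), 'recursive:', recursive.test(file));
}
```

Under this sketch, the top-level `evals/my-eval.eval.ts` matches both patterns, the nested file matches only the recursive one, and a file using a `-eval.ts` suffix (`evals/my-eval.ts`) matches neither.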
Pull request overview
Copilot reviewed 4 out of 123 changed files in this pull request and generated no new comments.
```diff
 "vite-plugin-entry-shaking": "^0.5.1",
-"vite-plugin-no-bundle": "^4.0.0"
+"vite-plugin-no-bundle": "^4.0.0",
+"vitest": "^4.1.5"
```
We may not need Vitest; can't we run Evalite with Jest?

Maybe we should follow a file naming convention more consistent with the one we use for tests/stories (`-eval.ts` instead of `.eval.ts`).
```diff
+"eval:dev": "evalite watch",
+"eval": "evalite"
```
I think we only need one eval script. If you want to run it in watch mode, you can just run `yarn eval watch`.
```diff
 "eslint": "^8.57.0",
 "eslint-plugin-mistica-local-rules": "workspace:*",
 "eslint-plugin-storybook": "^10.2.8",
+"evalite": "beta",
```
Isn't there a stable release yet?

I got confused by this too. They have a stable version, but it's kind of stale, and they are actively working on their 1.0 beta. Anyway, the beta is quite old too (2 months).
```diff
-"examples"
+"examples",
+"evals"
 ]
```
You should probably exclude the `evals` folder from the published mistica code too.
Summary
Adds Evalite (`evalite@beta`) and Vitest as dev dependencies so we can run scored evaluations (tasks, prompts, or LLM outputs) with a local UI and trace storage.

This PR introduces the plumbing only (dependencies, scripts, sample eval, README). Follow-up PRs are expected to add new eval suites by dropping additional `*.eval.ts` files under `evals/` (or extending existing ones), without further tooling changes.

How to use
Watch mode (development): `yarn eval:dev`

Runs Evalite in watch mode, re-runs when eval files change, and serves the results UI (default http://localhost:3006 per the Evalite docs).
Single run (e.g. CI or quick check): `yarn eval`

Runs all evals once and exits with a non-zero status if scores are below the threshold.
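As a rough mental model of the single-run behavior, here is a hypothetical TypeScript sketch: each case is scored with an exact-match scorer (1 or 0), the scores are averaged as a percentage, and the run reports a non-zero exit code when that average falls below the threshold. The names `runEvals` and `exactMatchScore`, the averaging rule, and the percentage scale are assumptions for illustration; this is not Evalite's API or implementation.

```typescript
// Hypothetical sketch of a single eval run; NOT Evalite's API or implementation.

interface EvalCase {
  input: string;
  expected: string;
}

// Hypothetical exact-match scorer: 1 when the output equals the expected value, else 0.
function exactMatchScore(output: string, expected: string): number {
  return output === expected ? 1 : 0;
}

// Scores every case, averages the scores as a percentage, and maps an average
// below the threshold to a non-zero exit code (assumed semantics).
function runEvals(
  cases: EvalCase[],
  task: (input: string) => string,
  thresholdPercent: number,
): {averagePercent: number; exitCode: number} {
  const scores = cases.map((c) => exactMatchScore(task(c.input), c.expected));
  const total = scores.reduce((sum, score) => sum + score, 0);
  const averagePercent = scores.length === 0 ? 0 : (total / scores.length) * 100;
  return {averagePercent, exitCode: averagePercent < thresholdPercent ? 1 : 0};
}

const cases: EvalCase[] = [
  {input: 'Hello', expected: 'Hello World!'},
  {input: 'Bye', expected: 'Bye World!'},
  {input: 'Hi', expected: 'Hi there!'},
];

// Two of three cases match exactly, so the average is about 66.7%.
const result = runEvals(cases, (input) => `${input} World!`, 70);
console.log(result.averagePercent.toFixed(1), result.exitCode);
```

With a threshold of 70 this sample run fails with exit code 1, while a threshold of 50 lets the same cases pass.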
Authoring: Add or edit files matching `evals/**/*.eval.ts`. See `evals/my-eval.eval.ts` for a minimal example using `evalite()` and deterministic scorers.

Scorers: For string-only scorers (e.g. `exactMatch`), import from `evalite/scorers/deterministic`. The main `evalite/scorers` entry pulls in LLM-judge scorers that need the optional `ai` peer dependency.

Docs
Ref: N/A