chore: add Evalite for optional LLM and task evals by ieduardogf · Pull Request #1538 · Telefonica/mistica-web

ieduardogf · 2026-04-21T16:26:18Z

Summary

Adds Evalite (evalite@beta) and Vitest as dev dependencies so we can run scored evaluations (tasks, prompts, or LLM outputs) with a local UI and trace storage.

This PR introduces the plumbing only (dependencies, scripts, sample eval, README). Follow-up PRs are expected to add new eval suites by dropping additional *.eval.ts files under evals/ (or extending existing ones), without further tooling changes.

How to use

Watch mode (development) — yarn eval:dev
Runs Evalite in watch mode, re-runs when eval files change, and serves the results UI (default http://localhost:3006 per Evalite docs).
Single run (e.g. CI or quick check) — yarn eval
Runs all evals once and exits with a non-zero status if scores are below the threshold.
Authoring — Add or edit files matching evals/**/*.eval.ts. See evals/my-eval.eval.ts for a minimal example using evalite() and deterministic scorers.
Scorers — For string-only scorers (e.g. exactMatch), import from evalite/scorers/deterministic. The main evalite/scorers entry pulls in LLM-judge scorers that need the optional ai peer dependency.
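As a rough illustration of what a deterministic scorer and a score threshold amount to, the sketch below computes an exact-match score per case, averages it, and maps the result to an exit code. It is a self-contained, hypothetical example: `exactMatchScore`, `averageScore`, `exitCodeFor`, and the fractional threshold are names and conventions assumed here for illustration, not Evalite's API or implementation.

```typescript
// Hypothetical sketch: these names are illustrative and are not part of Evalite's API.
type EvalCase = {input: string; expected: string};

// Deterministic exact-match scorer: 1 when the strings are identical, 0 otherwise.
function exactMatchScore(actual: string, expected: string): number {
    return actual === expected ? 1 : 0;
}

// Runs the task on every case and returns the mean score in [0, 1].
function averageScore(cases: EvalCase[], task: (input: string) => string): number {
    if (cases.length === 0) return 0;
    const total = cases.reduce(
        (sum, c) => sum + exactMatchScore(task(c.input), c.expected),
        0
    );
    return total / cases.length;
}

// Maps the mean score to an exit code: 0 on pass, 1 when below the threshold.
// The threshold is assumed to be a fraction in [0, 1] for this sketch.
function exitCodeFor(score: number, threshold: number): number {
    return score >= threshold ? 0 : 1;
}

const cases: EvalCase[] = [
    {input: 'Hello', expected: 'Hello World!'},
    {input: 'Bye', expected: 'Bye World!'},
];
const score = averageScore(cases, (input) => `${input} World!`);
const exitCode = exitCodeFor(score, 1);
```

Because the scorer only compares strings, it needs no model call, which is why string-only scorers can live apart from the LLM-judge ones.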

Docs

README Development section lists the new scripts and links the Evalite quickstart.

Ref: N/A

Add evalite@beta, vitest, yarn scripts, sample eval under evals/, and README notes. Made-with: Cursor

github-actions · 2026-04-21T16:29:17Z

Size stats

	master	this branch	diff
Total JS	16.1 MB	16.1 MB	0 B
JS without icons	2.01 MB	2.01 MB	0 B
Lib overhead	92.5 kB	92.5 kB	0 B
Lib overhead (gzip)	19.9 kB	19.9 kB	0 B

github-actions · 2026-04-21T16:32:03Z

Deploy preview for mistica-web ready!

Project:	`mistica-web`
Status:	✅ Deploy successful!
Preview URL:	https://mistica-ewew51csv-flows-projects-65bb050e.vercel.app
Latest Commit:	`cb9e417`
Inspect:	View deployment

Deployed with vercel-action

github-actions · 2026-04-21T16:35:28Z

Accessibility report
✔️ No issues found

ℹ️ You can run this locally by executing yarn audit-accessibility.

The evals/ directory imports from evalite subpath exports (e.g. evalite/scorers/deterministic) which require a modern moduleResolution. Excluding evals/ from tsconfig.production.json prevents gen-ts-defs from trying to compile dev-only eval files during the library build. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

github-actions · 2026-04-21T20:13:40Z

Screenshot tests report

✔️ All passing

Marcosld · 2026-04-22T07:13:41Z

+import {evalite} from 'evalite';
+import {exactMatch} from 'evalite/scorers/deterministic';
+
+evalite('My Eval', {


I thought we were going to test this lib against its competitors before pushing an installation+configuration to the repo. I mean, there are other options that could potentially be better; perhaps we should test them before committing (?)

Yes I think we should test different tools

Copilot

Pull request overview

Adds Evalite-based evaluation tooling (plus Vitest dependency) to the repo so developers can author and run local/CI-friendly scored eval suites under evals/, with minimal initial scaffolding.

Changes:

Add evalite (beta) and vitest dev dependencies plus yarn eval / yarn eval:dev scripts.
Introduce a minimal example eval file under evals/ and document the workflow in the README.
Exclude evals/ from the production TS declaration build config and vendor new Yarn cache artifacts.

Reviewed changes

Copilot reviewed 4 out of 123 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
tsconfig.production.json	Excludes `evals/` from the production declaration emit config.
package.json	Adds Evalite/Vitest dev deps and new `eval` scripts.
evals/my-eval.eval.ts	Adds a minimal example Evalite suite using deterministic scoring.
README.md	Documents how to run/author evals and notes scorer import guidance.
.yarn/cache/why-is-node-running-npm-2.3.0-011cf61a18-58ebbf406e.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/token-types-npm-6.1.2-1f6e70d865-ddade9c99f.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/tinyrainbow-npm-3.1.0-35ba47f8ae-dbb16b4aa5.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/stream-shift-npm-1.0.3-c1c29210c7-a24c0a3f66.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/stackback-npm-0.0.2-73273dc92e-2d4dc4e64e.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/split2-npm-4.2.0-16aa3883ba-05d5410254.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/siginfo-npm-2.0.0-9bbac931f8-8aa5a98640.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/set-cookie-parser-npm-2.7.2-e1a4d1221b-9e1b09e718.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/real-require-npm-0.2.0-7f69dbc7b6-fa060f19f2.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/quick-format-unescaped-npm-4.0.4-7e22c9b7dc-7bc32b9935.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/on-exit-leak-free-npm-2.1.2-0d0c5ad67d-6ce7acdc7b.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/lodash.truncate-npm-4.4.2-bc50fe1663-b463d8a382.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/js-levenshtein-npm-1.1.6-ab883e61a3-409f052a7f.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/is-stream-npm-4.0.1-328fd196cc-cbea3f1fc2.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/get-port-npm-7.2.0-7f76d3f2ea-f8785ccdcc.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/fast-querystring-npm-1.1.2-81dfb4019b-7149f82ee9.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/fast-decode-uri-component-npm-1.0.1-578ba9fecf-427a48fe09.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/duplexify-npm-4.1.3-f0053971e9-9636a02734.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/atomic-sleep-npm-1.0.0-17d8a762a3-b95275afb2.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/abstract-logging-npm-2.0.1-b805b8edfa-6967d15e5a.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/@standard-schema-spec-npm-1.1.0-d3e5ccd2e2-6245ebef5e.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/@lukeed-ms-npm-2.0.2-5e69b6e173-6ae47ed3eb.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/@fastify-forwarded-npm-3.0.1-03d48a4e5e-2cde644dc3.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/@fastify-accept-negotiator-npm-2.0.1-d797505dde-7a2db0bb9f.zip	Yarn offline cache update (transitive dependency).
.yarn/cache/@borewit-text-codec-npm-0.2.2-11871252cc-7b4852c38b.zip	Yarn offline cache update (transitive dependency).


Copilot · 2026-04-22T11:39:15Z

+- `yarn eval:dev` / `yarn eval`: [Evalite](https://v1.evalite.dev/guides/quickstart/) for task/LLM evals in
+  `evals/*.eval.ts` (use `evalite/scorers/deterministic` for scorers like `exactMatch` unless you add the `ai`
+  package for LLM-judge scorers from `evalite/scorers`)


README says eval files live in evals/*.eval.ts, but the PR description and intended workflow mention evals/**/*.eval.ts (and follow-up suites likely want nested folders). Suggest updating the README glob/path wording to match the intended recursive pattern (or adjust the tooling expectation if it’s actually non-recursive).
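The difference between the two patterns can be sketched with hand-rolled regular expressions. This is only an approximation of glob semantics for illustration; the nested path `evals/buttons/labels.eval.ts` is a hypothetical example, and how Evalite actually discovers files is not shown here.

```typescript
// Approximate regex equivalents of the two glob patterns (illustrative only).
const flatPattern = /^evals\/[^/]+\.eval\.ts$/; // evals/*.eval.ts
const recursivePattern = /^evals\/(?:[^/]+\/)*[^/]+\.eval\.ts$/; // evals/**/*.eval.ts

// True only for eval files placed directly under evals/.
function matchesFlat(path: string): boolean {
    return flatPattern.test(path);
}

// True for eval files under evals/ at any depth.
function matchesRecursive(path: string): boolean {
    return recursivePattern.test(path);
}

// The second path is a hypothetical nested suite.
const paths = ['evals/my-eval.eval.ts', 'evals/buttons/labels.eval.ts'];
const flatMatches = paths.filter(matchesFlat);
const recursiveMatches = paths.filter(matchesRecursive);
```

With the flat pattern only the top-level file is picked up, so a suite placed in a subfolder would be silently skipped if the tooling were non-recursive.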

Copilot

Pull request overview

Copilot reviewed 4 out of 123 changed files in this pull request and generated no new comments.


atabel · 2026-04-24T06:56:16Z

        "vite-plugin-entry-shaking": "^0.5.1",
-        "vite-plugin-no-bundle": "^4.0.0"
+        "vite-plugin-no-bundle": "^4.0.0",
+        "vitest": "^4.1.5"


We may not need vitest. Can't we run evalite with jest?

atabel · 2026-04-24T06:58:46Z

maybe we should follow a file convention more consistent with the one we use for tests/stories (-eval.ts instead of .eval.ts)

atabel · 2026-04-24T06:59:24Z

+import {evalite} from 'evalite';
+import {exactMatch} from 'evalite/scorers/deterministic';
+
+evalite('My Eval', {


Yes I think we should test different tools

atabel · 2026-04-24T07:00:40Z

+        "eval:dev": "evalite watch",
+        "eval": "evalite"


I think we only need one eval script. If you want to run it in watch mode, you can just run yarn eval watch.

atabel · 2026-04-24T07:01:03Z

        "eslint": "^8.57.0",
        "eslint-plugin-mistica-local-rules": "workspace:*",
        "eslint-plugin-storybook": "^10.2.8",
+        "evalite": "beta",


Isn't there a stable release yet?

I got confused by this too. They have a stable version, but it is kinda stale and they are actively working on their beta 1.0 version. Anyway, the beta is quite old too (2 months).

Marcosld · 2026-04-23T07:21:08Z

-        "examples"
+        "examples",
+        "evals"
    ]


You should probably exclude the evals folder from the published mistica code too.

Marcosld · 2026-04-24T12:49:08Z

        "eslint": "^8.57.0",
        "eslint-plugin-mistica-local-rules": "workspace:*",
        "eslint-plugin-storybook": "^10.2.8",
+        "evalite": "beta",


I got confused by this too. They have a stable version, but it is kinda stale and they are actively working on their beta 1.0 version. Anyway, the beta is quite old too (2 months).

chore: add Evalite for optional LLM and task evals

ac45995

ieduardogf added the AI AI Generated label Apr 21, 2026

ieduardogf requested review from Marcosld, atabel, aweell and yceballost and removed request for aweell April 21, 2026 16:26

ieduardogf added the Mistica Intelligence label Apr 21, 2026

Marcosld reviewed Apr 22, 2026

ieduardogf requested review from Copilot April 22, 2026 11:34

Copilot started reviewing on behalf of ieduardogf April 22, 2026 11:35

Copilot AI reviewed Apr 22, 2026

ieduardogf marked this pull request as draft April 23, 2026 08:17

atabel reviewed Apr 24, 2026

Marcosld reviewed Apr 24, 2026

4 participants