- **Visual rendering**: Check `painters/dom/src/features/feature-registry.ts` to find the feature module, then modify it. If no module exists yet, create one (see layout-engine CLAUDE.md). Feed data via `pm-adapter/`
- **Style resolution**: Modify `style-engine/` (called by pm-adapter during conversion)
@@ -111,13 +112,31 @@ Many packages use `.js` files with JSDoc `@typedef` for type definitions (e.g.,
- `pnpm dev` - Start dev server (from examples/)
- `pnpm run generate:all` - Generate all derived artifacts (schemas, SDK clients, tool catalogs, reference docs)

+## AI Eval Suite
+
+The `evals/` directory contains a Promptfoo-based evaluation suite for validating AI tool call quality.
+
+| Command | What it does | Cost |
+|---------|-------------|------|
+| `pnpm --filter @superdoc-testing/evals run eval` | Run deterministic evals (reading + argument tests) | ~$0.30 |
+| `pnpm --filter @superdoc-testing/evals run eval:reading` | Run reading tool tests only | ~$0.15 |
+| `pnpm --filter @superdoc-testing/evals run eval:gdpval` | Run GDPval benchmark (Model+SuperDoc vs Model-Only) | ~$1-2 |
+| `pnpm --filter @superdoc-testing/evals run eval:view` | Open Promptfoo web UI with results | Free |
+| `pnpm --filter @superdoc-testing/evals run baseline:save <label>` | Save versioned results snapshot | Free |
+
+Tool definitions are extracted from `packages/sdk/tools/` via `evals/tools/extract.mjs`. Run `pnpm run generate:all` first if SDK artifacts are missing.
+
+Test files are YAML in `evals/tests/`. Each test has a `vars.task` prompt and JavaScript assertions that check tool call structure (Level 1: tool selection + argument accuracy, not execution).
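
For reviewers unfamiliar with this kind of check, here is a minimal sketch of a Level 1 assertion: it inspects only the structure of the tool call (which tool was selected, and with which arguments) and never executes the tool. The `ToolCall` shape (OpenAI-style, with JSON-encoded `arguments`), the `checkToolCall` helper, and the `example_tool` name are all assumptions for illustration; they are not taken from the files in `evals/tests/`.

```typescript
// Hypothetical shape of one tool call in the model output (OpenAI-style).
// The real shape depends on the provider configured for the suite.
interface ToolCall {
  function: { name: string; arguments: string };
}

// Result object with the pass/score/reason fields an assertion can return.
interface GradingResult {
  pass: boolean;
  score: number;
  reason: string;
}

function fail(reason: string): GradingResult {
  return { pass: false, score: 0, reason };
}

// Accepts either a parsed array of tool calls or its JSON string form.
// Anything else (for example a plain-text answer) yields an empty list.
function parseCalls(output: unknown): ToolCall[] {
  let value: unknown = output;
  if (typeof value === "string") {
    try {
      value = JSON.parse(value);
    } catch {
      return [];
    }
  }
  return Array.isArray(value) ? (value as ToolCall[]) : [];
}

// Level 1 check: tool selection plus argument accuracy on the first call.
function checkToolCall(
  output: unknown,
  expectedTool: string,
  expectedArgs: Record<string, unknown> = {},
): GradingResult {
  const call = parseCalls(output)[0];
  if (!call || !call.function) return fail("no tool call in output");
  if (call.function.name !== expectedTool) {
    return fail(`expected tool ${expectedTool}, got ${call.function.name}`);
  }
  let args: unknown;
  try {
    args = JSON.parse(call.function.arguments);
  } catch {
    return fail("tool arguments are not valid JSON");
  }
  if (typeof args !== "object" || args === null) {
    return fail("tool arguments are not a JSON object");
  }
  const actual = args as Record<string, unknown>;
  for (const [key, expected] of Object.entries(expectedArgs)) {
    // Compare by JSON serialization so nested values are covered too.
    if (JSON.stringify(actual[key]) !== JSON.stringify(expected)) {
      return fail(`argument "${key}" does not match`);
    }
  }
  return { pass: true, score: 1, reason: "tool and arguments match" };
}
```

Only the arguments listed in `expectedArgs` are compared, so a test can pin down the arguments it cares about without failing on optional extras.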
+
+The system prompt at `evals/prompts/agent.txt` is a copy of the proven prompt from `examples/eval-demo/lib/agent.ts`. Update both when changing the prompt.
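
Because the two prompt copies are kept in sync by hand, a small drift check may be worth considering. The sketch below is one possible approach, not something this PR adds: `promptsInSync` is a hypothetical helper, and it assumes the prompt is embedded in `agent.ts` as a plain string or template literal without escapes or interpolation, which may not hold for the actual file.

```typescript
// Collapse runs of whitespace so indentation and line-wrapping
// differences between the two files are ignored.
function normalize(text: string): string {
  return text.replace(/\s+/g, " ").trim();
}

// Returns true when the prompt text appears (modulo whitespace) inside the
// agent source. An empty prompt is treated as out of sync.
// Assumption: the prompt is embedded verbatim in the source file.
function promptsInSync(promptText: string, agentSource: string): boolean {
  const prompt = normalize(promptText);
  return prompt.length > 0 && normalize(agentSource).includes(prompt);
}
```

In practice the two inputs would be the contents of `evals/prompts/agent.txt` and `examples/eval-demo/lib/agent.ts`, read with `readFileSync` from `node:fs`, and a `false` result would fail a CI step.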
+
## Generated Artifacts

These directories are produced by `pnpm run generate:all`:

| Directory | In git? | What it contains |
|-----------|---------|-----------------|
-| `packages/document-api/generated/` | No (gitignored) | Agent tool schemas, JSON schemas, manifest |
+| `packages/document-api/generated/` | No (gitignored) | Agent artifacts, JSON schemas |
| `apps/cli/generated/` | No (gitignored) | SDK contract JSON exported from CLI metadata |
| `packages/sdk/langs/node/src/generated/` | No (gitignored) | Node SDK generated client code |
| `packages/sdk/langs/python/superdoc/generated/` | No (gitignored) | Python SDK generated client code |