Eval: xlsx evals use soft markers that could match incidentally

## Problem

The xlsx eval tasks check for generic terms as expected output markers:

| Task | Marker | Risk |
|------|--------|------|
| xlsx-openpyxl | `"openpyxl"` | Common Python library name |
| xlsx-formulas | `"formula"` | Generic Excel term |
| xlsx-financial | `"blue"` | Common color word |
| xlsx-verify | `"recalc.py"` | Slightly more specific but still a tool name |

Compare with synthetic skills that use unambiguous markers:
- `SKILLJACK_GREETING_SUCCESS`
- `SKILLJACK_CODE_FORMATTED`
- `SKILLJACK_TEMPLATE_LOADED`

A model could mention "openpyxl", "formula", or "blue" in a response about Excel without having actually followed the xlsx skill instructions. The word "blue" in particular is highly likely to appear incidentally.
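To make the risk concrete, here is a minimal sketch of how a soft marker can pass on an unrelated response. It assumes markers are matched as case-insensitive substrings; `containsMarker` and the sample responses are hypothetical and are not taken from `evals/lib/eval-checker.ts`, whose actual matching logic may differ.

```typescript
// Hypothetical helper: assumes a marker passes when it appears anywhere
// in the response (case-insensitive). The real checker may behave differently.
function containsMarker(response: string, marker: string): boolean {
  return response.toLowerCase().includes(marker.toLowerCase());
}

// Illustrative response that never touched the xlsx skill.
const unrelated = "Click the blue Share button to send the workbook.";

// Illustrative response that plausibly did follow the skill.
const followedSkill = "Hard-coded inputs are formatted as blue text.";

// Both responses satisfy the soft marker, so the eval cannot tell them apart.
const falsePositive = containsMarker(unrelated, "blue");
const truePositive = containsMarker(followedSkill, "blue");

// A synthetic marker does not occur by accident in ordinary prose.
const syntheticMatch = containsMarker(unrelated, "SKILLJACK_GREETING_SUCCESS");

console.log({ falsePositive, truePositive, syntheticMatch });
```

Under that assumption, the first two checks both pass while the synthetic marker does not, which is the gap in assertion strength described above.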

## Suggestion

Since the xlsx skill is a real production skill (not synthetic), injecting artificial markers isn't ideal. Options:
1. **Use more specific compound markers**: e.g., require both `"blue text"` and `"openpyxl"` to appear in the same response
2. **Use regex patterns**: the `EvalConfig` already supports `RegExp`
3. **Accept the trade-off** and document that xlsx evals have weaker assertion guarantees than synthetic skill evals
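Options 1 and 2 could be sketched roughly as follows. The `Marker` type and the `matchesMarker` and `matchesAll` helpers are hypothetical names used for illustration only; the actual `EvalConfig` shape and the way `eval-checker.ts` evaluates `expectedOutput` are not shown in this issue, so treat this as a sketch under those assumptions.

```typescript
// Hypothetical marker type: a plain substring or a regular expression.
type Marker = string | RegExp;

// Matches one marker. Patterns should avoid the `g` flag, because
// RegExp.prototype.test is stateful on global regexes.
function matchesMarker(response: string, marker: Marker): boolean {
  return typeof marker === "string"
    ? response.includes(marker)
    : marker.test(response);
}

// Option 1: a compound marker passes only if every part matches.
function matchesAll(response: string, markers: Marker[]): boolean {
  return markers.every((m) => matchesMarker(response, m));
}

// Option 2: a regex that requires "blue" to describe text or a font.
const blueTextPattern = /\bblue\s+(?:text|font)\b/i;

// Illustrative responses, not real eval output.
const skillResponse =
  "Inputs are shown as blue text; the file was written with openpyxl.";
const incidental =
  "Click the blue Share button, or script it with openpyxl.";

const compoundOnSkill = matchesAll(skillResponse, ["blue text", "openpyxl"]);
const compoundOnIncidental = matchesAll(incidental, ["blue text", "openpyxl"]);
const regexOnSkill = matchesMarker(skillResponse, blueTextPattern);
const regexOnIncidental = matchesMarker(incidental, blueTextPattern);

console.log({ compoundOnSkill, compoundOnIncidental, regexOnSkill, regexOnIncidental });
```

Both approaches reject the incidental response here, but they only narrow the match. A response can still contain the exact phrase without following the skill, so option 3's documentation may be worth doing either way.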

## Files

- `evals/tasks/xlsx-*.json` (4 files)
- `evals/lib/eval-checker.ts` (already supports `RegExp` in `expectedOutput`)

Eval: xlsx evals use soft markers that could match incidentally #55
