Eval: add batch runner and aggregate reporting #52

## Problem

There's no way to run all 7 eval tasks at once. Each must be run individually, and there's no aggregate pass/fail summary across tasks.

Currently available:
- `npm run eval:greeting` (individual)
- `npm run eval:code-style` (individual)
- etc.

Missing:
- `npm run eval:all` — run all tasks, produce a summary table
- Aggregate result file combining all task outcomes

## Suggestion

Add a batch runner that:
1. Discovers all task configs from `evals/tasks/*.json`
2. Runs each sequentially (or with configurable parallelism)
3. Produces a summary table (task, mode, pass/fail, cost, duration)
4. Saves an aggregate result JSON
5. Exits with a non-zero code if any task fails

Could be a new `evals/eval-all.ts` or a `--all` flag on the existing harness.
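
A rough sketch of what the five steps above could look like, to make the proposal concrete. Everything here is an assumption for illustration: the `TaskResult` shape, the `RunTask` callback (standing in for however `evals/eval.ts` runs a single task), and all function names are hypothetical and not taken from the existing harness. It runs tasks sequentially only; parallelism is left out.

```typescript
import { readdirSync, writeFileSync } from "node:fs";
import { join } from "node:path";

// Hypothetical result shape; the real harness may report different fields.
interface TaskResult {
  task: string;
  mode: string;
  passed: boolean;
  costUsd: number;
  durationMs: number;
}

// Hypothetical callback standing in for whatever runs one task config.
type RunTask = (configPath: string) => Promise<TaskResult>;

// Step 1: discover task configs (every .json file in the given directory).
function discoverTasks(dir: string): string[] {
  return readdirSync(dir)
    .filter((name) => name.endsWith(".json"))
    .sort()
    .map((name) => join(dir, name));
}

// Step 2: run tasks one at a time; a thrown error is recorded as a failure
// so that one broken task does not abort the whole batch.
async function runAll(configs: string[], runTask: RunTask): Promise<TaskResult[]> {
  const results: TaskResult[] = [];
  for (const config of configs) {
    const started = Date.now();
    try {
      results.push(await runTask(config));
    } catch {
      results.push({
        task: config,
        mode: "unknown",
        passed: false,
        costUsd: 0,
        durationMs: Date.now() - started,
      });
    }
  }
  return results;
}

// Step 3: render a plain-text summary table (task, mode, pass/fail, cost, duration).
function formatSummary(results: TaskResult[]): string {
  const header = "task | mode | result | cost | duration";
  const rows = results.map(
    (r) =>
      `${r.task} | ${r.mode} | ${r.passed ? "PASS" : "FAIL"} | $${r.costUsd.toFixed(4)} | ${r.durationMs}ms`
  );
  return [header, ...rows].join("\n");
}

// Step 4: build the aggregate object and write it out as JSON.
function aggregate(results: TaskResult[]) {
  const failed = results.filter((r) => !r.passed).length;
  return { total: results.length, passed: results.length - failed, failed, results };
}

function saveAggregate(path: string, results: TaskResult[]): void {
  writeFileSync(path, JSON.stringify(aggregate(results), null, 2));
}

// Step 5: non-zero exit code if any task failed.
function exitCodeFor(results: TaskResult[]): number {
  return results.some((r) => !r.passed) ? 1 : 0;
}
```

An entry point would chain these together: `discoverTasks("evals/tasks")`, then `runAll`, then print `formatSummary`, call `saveAggregate`, and set `process.exitCode = exitCodeFor(results)`. Keeping the steps as separate functions means the same code could back either a standalone `evals/eval-all.ts` or a `--all` flag.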

## Files

- `evals/eval.ts`
- `package.json` (new script needed)
