feat(skillet): Skill generated by skillet's new agent-orchestrator pipeline #143

Draft
gricha wants to merge 1 commit into main from gricha/skillet-skill-from-new-pipeline


gricha commented May 5, 2026

This PR adds the skillet skill — the meta-skill that routes a
user to the right skillet CLI subcommand — produced clean-room
by skillet's new bundled-agent pipeline.

This PR is just the skill artifact, isolated for review. The
pipeline that produced it (the rewrite that replaces skillet's
multi-phase TypeScript pipeline with a small set of bundled
authoring agents) lives in getsentry/skillet#2.

Generation stats

  • Input: spec.yaml (12 behaviors/must_nots, hand-curated)
  • Pipeline: skill-writer + eval-writer in parallel, then
    skill-validator + evals-validator in parallel
  • Wall-clock: 175 seconds end-to-end
  • Tool calls: 6 (skill-writer) + 21 (eval-writer)
  • Validators: both returned ok=true with 0 findings on the
    first pass, so no re-passes were needed
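The orchestration described in these stats (writers in parallel, then validators in parallel, with a re-pass only if a validator reports findings) can be sketched roughly as follows. This is a hypothetical illustration, not the code in getsentry/skillet#2: the `PipelineAgents` shape, the idea that writers receive findings on a re-pass, and the `maxPasses` limit are all assumptions.

```typescript
// Hypothetical sketch: the agent signatures below are assumptions, not the
// real interfaces from getsentry/skillet#2.
interface ValidatorResult {
  ok: boolean;
  findings: string[];
}

interface PipelineAgents {
  // Writers receive the previous pass's findings (empty on the first pass).
  skillWriter: (findings: string[]) => Promise<void>;
  evalWriter: (findings: string[]) => Promise<void>;
  skillValidator: () => Promise<ValidatorResult>;
  evalsValidator: () => Promise<ValidatorResult>;
}

const isClean = (r: ValidatorResult): boolean => r.ok && r.findings.length === 0;

// Runs both writers concurrently, then both validators concurrently, and
// repeats with the validators' findings until both are clean. Returns the
// number of passes used; maxPasses is an assumed safety limit.
async function runPipeline(agents: PipelineAgents, maxPasses = 3): Promise<number> {
  let skillFindings: string[] = [];
  let evalFindings: string[] = [];
  for (let pass = 1; pass <= maxPasses; pass++) {
    await Promise.all([agents.skillWriter(skillFindings), agents.evalWriter(evalFindings)]);
    const [skill, evals] = await Promise.all([
      agents.skillValidator(),
      agents.evalsValidator(),
    ]);
    if (isClean(skill) && isClean(evals)) return pass;
    skillFindings = skill.findings;
    evalFindings = evals.findings;
  }
  throw new Error(`validators still reported findings after ${maxPasses} passes`);
}

// Usage with stub agents that are clean on the first pass, mirroring the
// "0 findings on first pass" result reported above.
const cleanValidator = async (): Promise<ValidatorResult> => ({ ok: true, findings: [] });
runPipeline({
  skillWriter: async () => {},
  evalWriter: async () => {},
  skillValidator: cleanValidator,
  evalsValidator: cleanValidator,
}).then((passes) => console.log(`clean after ${passes} pass(es)`));
```

Running the validators only after both writers finish keeps each validator looking at a complete artifact; the parallelism is within each stage, not across them.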

Files

  • SKILL.md (118 lines) — router for skillet CLI commands;
    imperative voice, decision table for command selection,
    Don't section for must_nots
  • spec.yaml — source of truth (skillet-managed; this is what
    the agents read)
  • evals/_judges.ts — 13 canonical named judges
  • evals/<id>.eval.ts — one file per spec entry, 14 cases
    total. Uses upstream vitest-evals + skillet's harness via
    @sentry/skillet/evals
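SKILL.md itself is prose for an agent, but the idea of a "decision table for command selection" can be illustrated in code. The sketch below is hypothetical: only the `eval` and `improve` subcommands are confirmed by this PR, and the `Route` shape, the `route` function, and the keyword triggers are invented for illustration and not taken from the real SKILL.md. Returning `null` on an ambiguous intent stands in for asking a clarifying question rather than guessing.

```typescript
// Hypothetical encoding of a command-selection decision table. Only the
// `eval` and `improve` subcommands appear in this PR; the keyword triggers
// are illustrative assumptions.
interface Route {
  triggers: string[];
  command: (skillDir: string) => string;
}

const ROUTES: Route[] = [
  { triggers: ["eval", "test", "score"], command: (dir) => `skillet eval ${dir}` },
  { triggers: ["improve", "regenerate", "rewrite"], command: (dir) => `skillet improve ${dir}` },
];

// Returns the command when exactly one row matches. Returns null when zero
// or several rows match, so the caller can ask a clarifying question.
function route(intent: string, skillDir: string): string | null {
  const text = intent.toLowerCase();
  const matches = ROUTES.filter((r) => r.triggers.some((t) => text.includes(t)));
  return matches.length === 1 ? matches[0].command(skillDir) : null;
}

console.log(route("please run the evals on my skill", "skills/skillet"));
```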

Eval results against the produced SKILL.md

./dist/cli.js eval skills/skillet:

  • First run: 14/14 passed
  • Second run: 13/14, due to judge variance. UsesScopedPackageJudge
    scored a clarifying-questions response 0.0 because the scoped
    package was mentioned at the end of the response rather than
    the start. The agent's response was correct; the judge was
    overstrict.
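The judge failure above comes down to position sensitivity. The real UsesScopedPackageJudge is not shown in this PR and may well be model-graded; the string-matching stand-ins below are hypothetical and only illustrate why a judge that requires the mention up front scores a correct response 0 while a position-agnostic one scores it 1. The `Judge` type, the judge names, and the assumption that the scoped package is `@sentry/skillet` are all illustrative.

```typescript
// Hypothetical judge shape: score a response between 0 and 1.
type Judge = (response: string) => number;

// Assumed scoped package name; the PR only shows @sentry/skillet/evals.
const SCOPED_PACKAGE = "@sentry/skillet";

// Overstrict: credits the mention only if the response opens with it.
const strictScopedPackageJudge: Judge = (response) =>
  response.trimStart().startsWith(SCOPED_PACKAGE) ? 1 : 0;

// Position-agnostic: credits the mention wherever it appears.
const lenientScopedPackageJudge: Judge = (response) =>
  response.includes(SCOPED_PACKAGE) ? 1 : 0;

// A clarifying-questions reply that names the package only at the end.
const reply =
  "Which skill directory should I work in, and does it already have a spec.yaml? " +
  "Once you confirm, I'll route you to the right @sentry/skillet command.";

console.log(strictScopedPackageJudge(reply)); // 0
console.log(lenientScopedPackageJudge(reply)); // 1
```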

Known differences from the legacy pipeline's output

This branch's diff vs. the legacy-produced version (on
getsentry/skillet main) is ~263 insertions / ~286 deletions.
Tighter prose, same describeEval ids, same case shapes.

Two cases dropped their workspace fixtures in favor of
judge-only assertions. This is a slight regression, to be
addressed by tightening agents/eval-writer/references/eval-contract.md
in the skillet repo.

Reviewing this PR

You're looking at the skill artifact end-state. To replicate
the generation:

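```sh
git clone https://github.com/getsentry/skillet
cd skillet
git checkout experimental/agent-orchestration
npm install && npm run build
# Delete the generated artifacts; spec.yaml stays in place.
rm -rf skills/skillet/{SKILL.md,evals}
# Regenerate SKILL.md and evals/ from spec.yaml.
./dist/cli.js improve skills/skillet
```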

That regenerates the same files (modulo agent variance).

This is the `skillet` skill — the meta-skill that routes a user's
intent to the right `skillet` CLI subcommand — produced clean-room
by skillet's new bundled-agent pipeline (skill-writer +
eval-writer + skill-validator + evals-validator).

Generated from `spec.yaml` in 175 seconds. No re-passes needed:
both validators returned ok=true with 0 findings on first pass.
14 eval cases across 12 spec entries; the regenerated evals pass
13 or 14 of the 14 cases against the produced SKILL.md, depending
on judge variance.

This PR is *just the skill artifact* — the pipeline that
produced it lives in getsentry/skillet#2.

Files:
- SKILL.md (118 lines) — router for skillet CLI commands
- spec.yaml — 12 behaviors/must_nots, source of truth
- evals/_judges.ts — 13 canonical judges
- evals/<id>.eval.ts — one per spec entry, all using the
  vitest-evals + @sentry/skillet/evals harness shape

Co-Authored-By: Opus 4.7 <noreply@anthropic.com>