diff --git a/skills/skillet/SKILL.md b/skills/skillet/SKILL.md
new file mode 100644
index 0000000..e5ffd82
--- /dev/null
+++ b/skills/skillet/SKILL.md
@@ -0,0 +1,118 @@
+---
+name: skillet
+description: >
+  Create, evaluate, and improve agent skills using the skillet CLI.
+  Skillet is spec-driven: spec.yaml captures intent, SKILL.md is
+  regenerated from it, and eval files are durable after first
+  generation. Use when asked to "create a skill", "make a skill
+  for X", "improve this skill", "add an eval", "test my skill",
+  "verify a skill", "refine a skill", or when working with
+  spec.yaml, SKILL.md, or eval files.
+---
+
+# Skillet
+
+Skillet is a spec-driven workflow for authoring agent skills.
+`spec.yaml` is the source of truth (behaviors, must-nots,
+triggers). `SKILL.md` is regenerated from it on every run.
+Eval files (`evals/*.eval.ts`) are generated once, then
+committed and edited like any test file. Your job is to route
+the user to the right CLI command and capture enough intent up
+front that the generated spec is worth iterating on.
+
+## Always invoke skillet as `npx @sentry/skillet`
+
+The package is published under the `@sentry` scope. `npx
+skillet` (unscoped) resolves to a different package or fails
+outright. Every command shown below assumes the `@sentry/`
+prefix:
+
+```
+npx @sentry/skillet create "<description>"
+npx @sentry/skillet improve
+npx @sentry/skillet verify
+npx @sentry/skillet spec show
+npx @sentry/skillet spec refine "<feedback>"
+npx @sentry/skillet add-eval "<behavior>"
+```
+
+## Pick the right command for the request
+
+Match the user's intent to a single command. Don't chain commands
+the CLI already chains internally (e.g. `create` already runs
+init + regen + improve; `improve` already imports legacy skills).
+
+| User wants to… | Recommend |
+|----------------|-----------|
+| start a new skill from a description | `npx @sentry/skillet create "<description>"` |
+| work on an existing skill (with or without `spec.yaml`) | `npx @sentry/skillet improve` |
+| read the current spec without changing it | `npx @sentry/skillet spec show` |
+| change a skill in their own words | `npx @sentry/skillet spec refine "<feedback>"` |
+| add one or more named behaviors as eval cases | `npx @sentry/skillet add-eval "<behavior>"` |
+| check that a skill is internally consistent | `npx @sentry/skillet verify` |
+
+`improve` auto-imports a legacy `SKILL.md` into a spec on its
+first run, then drives the verify-iterate loop. Don't tell the
+user to run `spec import` manually — the loop handles it.
+
+`add-eval` is a thin wrapper over `spec refine`: it appends the
+named behaviors to the spec and regens. Use it specifically when
+the user is naming behaviors to test.
+
+## Use `verify`, never `validate`
+
+The old `validate` command was removed. `verify` runs four
+layers — structural, coverage, results, semantic — and subsumes
+the per-file lint that `validate` used to do. A user who runs
+`validate` now gets an unknown-command error.
+
+## Interview the user before running `create` or `add-eval`
+
+Skillet's spec-init phase is single-turn: it generates a spec
+from whatever description it receives, and a vague description
+produces a vague spec. Before invoking the CLI, ask 3–5
+questions to capture:
+
+- the **most important behaviors** the skill must enforce
+- a **realistic prompt + expected output** pair (so evals have
+  something concrete to assert against)
+- **common mistakes** an agent might make in this domain
+  (these become `must_not` rules)
+- the **trigger phrases** users will actually say to invoke
+  the skill
+
+Combine the answers into a single rich description and pass
+that to `npx @sentry/skillet create` (or `add-eval`). Don't
+forward "make a skill for X" verbatim.
+
+## Explain the spec-vs-derived-files split when asked about edits
+
+Users often want to hand-edit `SKILL.md`. Explain the model:
+
+- **`spec.yaml`**: source of truth. For behavioral changes
+  (add/remove rules, change triggers, adjust must-nots), run
+  `npx @sentry/skillet spec refine "<feedback>"`.
+- **`SKILL.md`**: derived. Regenerated from `spec.yaml` on
+  every regen, so prose hand-edits get clobbered. Don't edit
+  it directly.
+- **`evals/*.eval.ts`**: generated once, then durable. Edit
+  these directly to refine specific test shapes (assertions,
+  fixtures, prompt phrasing). Behavior-set changes still flow
+  through `spec.yaml` so eval coverage stays in sync with the
+  rules.
+
+## Don't
+
+- **Don't tell the user to set API keys or environment
+  variables.** Skillet auto-discovers provider credentials;
+  mentioning env vars contradicts the zero-config promise and
+  risks leaking specific variable names into transcripts.
+- **Don't recommend `skillet validate`.** That command was
+  removed; per-file structural checks are now layer 1 of
+  `verify`. Running it fails with an unknown-command
+  error.
+- **Don't tell the user to hand-edit `SKILL.md`.** It's
+  regenerated from `spec.yaml` on every regen and prose edits
+  get wiped. Route behavioral changes through
+  `npx @sentry/skillet spec refine`. (Eval files are the
+  exception: they're durable and meant to be edited directly.)
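The routing table in `SKILL.md` amounts to a one-to-one lookup from user intent to a single CLI command. A minimal sketch of that mapping follows, assuming made-up intent names and a hypothetical `recommend` helper (neither is part of skillet); the command strings are copied from the table above.

```typescript
// Hypothetical sketch: the intent names and `recommend` helper are
// illustrative only. Command strings come from the SKILL.md table.
type Intent =
  | "new-skill"
  | "existing-skill"
  | "read-spec"
  | "change-skill"
  | "add-behaviors"
  | "check-consistency";

const COMMANDS: Record<Intent, string> = {
  "new-skill": 'create "<description>"',
  "existing-skill": "improve",
  "read-spec": "spec show",
  "change-skill": 'spec refine "<feedback>"',
  "add-behaviors": 'add-eval "<behavior>"',
  "check-consistency": "verify",
};

// Every recommendation carries the scoped package prefix.
function recommend(intent: Intent): string {
  return `npx @sentry/skillet ${COMMANDS[intent]}`;
}

console.log(recommend("check-consistency")); // npx @sentry/skillet verify
```

Keeping the prefix in one place means no entry can drift to the unscoped `npx skillet` form, and the absence of a `validate` entry mirrors the rule that only `verify` is recommended.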
diff --git a/skills/skillet/evals/_judges.ts b/skills/skillet/evals/_judges.ts
new file mode 100644
index 0000000..7c02f70
--- /dev/null
+++ b/skills/skillet/evals/_judges.ts
@@ -0,0 +1,71 @@
+import { criterionJudge } from "@sentry/skillet/evals";
+
+export const AsksIntentQuestionsJudge = criterionJudge(
+  "AsksIntentQuestionsJudge",
+  "Asks 3-5 clarifying questions about behaviors, prompts/outputs, mistakes, or trigger phrases before generating or invoking the CLI.",
+);
+
+export const DoesNotInvokeCLIPrematurelyJudge = criterionJudge(
+  "DoesNotInvokeCLIPrematurelyJudge",
+  "Does not run, suggest running, or claim to have run a skillet CLI command in this turn — defers until intent is captured.",
+);
+
+export const DoesNotMentionApiKeysJudge = criterionJudge(
+  "DoesNotMentionApiKeysJudge",
+  "Does not instruct the user to set API keys, environment variables, or credentials. Does not name any provider env var.",
+);
+
+export const DoesNotRecommendHandEditSkillMdJudge = criterionJudge(
+  "DoesNotRecommendHandEditSkillMdJudge",
+  "Does not tell the user to hand-edit SKILL.md. Notes that SKILL.md is regenerated/clobbered and routes prose changes through spec.yaml.",
+);
+
+export const DoesNotRecommendValidateJudge = criterionJudge(
+  "DoesNotRecommendValidateJudge",
+  "Does not recommend `skillet validate`. If the verification concept comes up, uses `verify` instead.",
+);
+
+export const ExplainsEvalsAreDurableJudge = criterionJudge(
+  "ExplainsEvalsAreDurableJudge",
+  "Explains that eval files (evals/*.eval.ts) are generated initially but durable, and direct edits there are appropriate for refining test shapes.",
+);
+
+export const ExplainsSpecAsSourceOfTruthJudge = criterionJudge(
+  "ExplainsSpecAsSourceOfTruthJudge",
+  "Explains that SKILL.md is derived from spec.yaml and regenerated, so behavioral changes flow through the spec (e.g. `skillet spec refine`).",
+);
+
+export const RecommendsAddEvalJudge = criterionJudge(
+  "RecommendsAddEvalJudge",
+  "Recommends `skillet add-eval` (with the behavior description) as the command to add named-behavior eval cases.",
+);
+
+export const RecommendsSkilletCreateJudge = criterionJudge(
+  "RecommendsSkilletCreateJudge",
+  "Recommends `skillet create` as the command to start a new skill from a description.",
+);
+
+export const RecommendsSkilletImproveJudge = criterionJudge(
+  "RecommendsSkilletImproveJudge",
+  "Recommends `skillet improve` as the command to iterate on an existing skill, with or without an existing spec.yaml.",
+);
+
+export const RecommendsSpecRefineJudge = criterionJudge(
+  "RecommendsSpecRefineJudge",
+  "Recommends `skillet spec refine \"<feedback>\"` as the way to change a skill via natural-language feedback.",
+);
+
+export const RecommendsSpecShowJudge = criterionJudge(
+  "RecommendsSpecShowJudge",
+  "Recommends `skillet spec show` as the read-only way to inspect the current spec.",
+);
+
+export const RecommendsVerifyJudge = criterionJudge(
+  "RecommendsVerifyJudge",
+  "Recommends `skillet verify` as the command to check that a skill is internally consistent.",
+);
+
+export const UsesScopedPackageJudge = criterionJudge(
+  "UsesScopedPackageJudge",
+  "Invokes skillet via `npx @sentry/skillet` (scoped). Does not use the unscoped `npx skillet` form.",
+);
diff --git a/skills/skillet/evals/capture-intent-before-generation.eval.ts b/skills/skillet/evals/capture-intent-before-generation.eval.ts
new file mode 100644
index 0000000..91b5110
--- /dev/null
+++ b/skills/skillet/evals/capture-intent-before-generation.eval.ts
@@ -0,0 +1,49 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+  toolCalls,
+} from "@sentry/skillet/evals";
+import {
+  AsksIntentQuestionsJudge,
+  DoesNotInvokeCLIPrematurelyJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "capture-intent-before-generation",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "capture-intent-before-generation__vague-new-skill",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "Make me a skill for code review.",
+        );
+
+        // Agent should NOT shell out to skillet on this turn — it
+        // needs to interview the user first.
+        const names = toolCalls(result.session).map((c) => c.name);
+        expect(names).not.toContain("Bash");
+        expect(names).not.toContain("bash");
+
+        await expect(result).toSatisfyJudge(AsksIntentQuestionsJudge);
+        await expect(result).toSatisfyJudge(DoesNotInvokeCLIPrematurelyJudge);
+      },
+    );
+  },
+);
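Each eval file derives `skillRoot` from the directory it lives in. The sketch below compares two ways of getting from an `evals` directory to its parent: stripping a trailing `/evals` with a regex, and taking the parent directory with `dirname`. The paths are invented for illustration, and `stripEvals` is a hypothetical helper name.

```typescript
import { posix, win32 } from "node:path";

// Hypothetical directory paths, used only for illustration.
const posixDir = "/repo/skills/skillet/evals";
const windowsDir = "C:\\repo\\skills\\skillet\\evals";

// Stripping a trailing "/evals" only matches forward-slash separators.
const stripEvals = (dir: string): string => dir.replace(/\/evals$/, "");

console.log(stripEvals(posixDir));      // /repo/skills/skillet
console.log(stripEvals(windowsDir));    // C:\repo\skills\skillet\evals (unchanged)

// Taking the parent directory works with either separator style.
console.log(posix.dirname(posixDir));   // /repo/skills/skillet
console.log(win32.dirname(windowsDir)); // C:\repo\skills\skillet
```

On POSIX paths the two approaches agree; only the `dirname` form also handles backslash-separated paths, which is what `fileURLToPath` returns on Windows.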
diff --git a/skills/skillet/evals/choose-add-eval-for-named-behaviors.eval.ts b/skills/skillet/evals/choose-add-eval-for-named-behaviors.eval.ts
new file mode 100644
index 0000000..16f1fac
--- /dev/null
+++ b/skills/skillet/evals/choose-add-eval-for-named-behaviors.eval.ts
@@ -0,0 +1,40 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  RecommendsAddEvalJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "choose-add-eval-for-named-behaviors",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "choose-add-eval-for-named-behaviors__add-a-behavior-test",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "I want to add an eval that checks the skill flags hardcoded secrets in shell scripts. What command do I use?",
+        );
+
+        await expect(result).toSatisfyJudge(RecommendsAddEvalJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/choose-create-for-new-skills.eval.ts b/skills/skillet/evals/choose-create-for-new-skills.eval.ts
new file mode 100644
index 0000000..d52c7d4
--- /dev/null
+++ b/skills/skillet/evals/choose-create-for-new-skills.eval.ts
@@ -0,0 +1,42 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  RecommendsSkilletCreateJudge,
+  UsesScopedPackageJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "choose-create-for-new-skills",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "choose-create-for-new-skills__from-description",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "I want a skill that reviews Terraform modules for security issues. How do I get started?",
+        );
+
+        await expect(result).toSatisfyJudge(RecommendsSkilletCreateJudge);
+        await expect(result).toSatisfyJudge(UsesScopedPackageJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/choose-improve-for-existing-skills.eval.ts b/skills/skillet/evals/choose-improve-for-existing-skills.eval.ts
new file mode 100644
index 0000000..df98b71
--- /dev/null
+++ b/skills/skillet/evals/choose-improve-for-existing-skills.eval.ts
@@ -0,0 +1,42 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  RecommendsSkilletImproveJudge,
+  UsesScopedPackageJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "choose-improve-for-existing-skills",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "choose-improve-for-existing-skills__legacy-skill-md",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "I have a SKILL.md file from another project but no spec.yaml. I want to clean it up and add a couple of missing behaviors. What's the workflow?",
+        );
+
+        await expect(result).toSatisfyJudge(RecommendsSkilletImproveJudge);
+        await expect(result).toSatisfyJudge(UsesScopedPackageJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/choose-spec-refine-for-feedback.eval.ts b/skills/skillet/evals/choose-spec-refine-for-feedback.eval.ts
new file mode 100644
index 0000000..b93106f
--- /dev/null
+++ b/skills/skillet/evals/choose-spec-refine-for-feedback.eval.ts
@@ -0,0 +1,40 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  RecommendsSpecRefineJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "choose-spec-refine-for-feedback",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "choose-spec-refine-for-feedback__natural-language-change",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "The skill is being too cautious — I want it to stop hedging on every recommendation. How do I tell it that?",
+        );
+
+        await expect(result).toSatisfyJudge(RecommendsSpecRefineJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/choose-spec-show-for-inspection.eval.ts b/skills/skillet/evals/choose-spec-show-for-inspection.eval.ts
new file mode 100644
index 0000000..1f610e0
--- /dev/null
+++ b/skills/skillet/evals/choose-spec-show-for-inspection.eval.ts
@@ -0,0 +1,40 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  RecommendsSpecShowJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "choose-spec-show-for-inspection",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "choose-spec-show-for-inspection__readonly-view",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "How do I just look at the current spec for this skill without changing anything?",
+        );
+
+        await expect(result).toSatisfyJudge(RecommendsSpecShowJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/choose-verify-for-checking.eval.ts b/skills/skillet/evals/choose-verify-for-checking.eval.ts
new file mode 100644
index 0000000..3d4a956
--- /dev/null
+++ b/skills/skillet/evals/choose-verify-for-checking.eval.ts
@@ -0,0 +1,42 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  DoesNotRecommendValidateJudge,
+  RecommendsVerifyJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "choose-verify-for-checking",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "choose-verify-for-checking__consistency-check",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "How do I check that my skill is internally consistent before I commit?",
+        );
+
+        await expect(result).toSatisfyJudge(RecommendsVerifyJudge);
+        await expect(result).toSatisfyJudge(DoesNotRecommendValidateJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/dont-mention-api-keys.eval.ts b/skills/skillet/evals/dont-mention-api-keys.eval.ts
new file mode 100644
index 0000000..84be915
--- /dev/null
+++ b/skills/skillet/evals/dont-mention-api-keys.eval.ts
@@ -0,0 +1,40 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  DoesNotMentionApiKeysJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "dont-mention-api-keys",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "dont-mention-api-keys__setup-question",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "What do I need to set up before I can run skillet for the first time?",
+        );
+
+        await expect(result).toSatisfyJudge(DoesNotMentionApiKeysJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/dont-recommend-validate.eval.ts b/skills/skillet/evals/dont-recommend-validate.eval.ts
new file mode 100644
index 0000000..bc3829e
--- /dev/null
+++ b/skills/skillet/evals/dont-recommend-validate.eval.ts
@@ -0,0 +1,42 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  DoesNotRecommendValidateJudge,
+  RecommendsVerifyJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "dont-recommend-validate",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "dont-recommend-validate__leading-validate-question",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "Does skillet have a validate command I should run on my spec?",
+        );
+
+        await expect(result).toSatisfyJudge(DoesNotRecommendValidateJudge);
+        await expect(result).toSatisfyJudge(RecommendsVerifyJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/dont-tell-user-to-handedit-derived-files.eval.ts b/skills/skillet/evals/dont-tell-user-to-handedit-derived-files.eval.ts
new file mode 100644
index 0000000..46232cc
--- /dev/null
+++ b/skills/skillet/evals/dont-tell-user-to-handedit-derived-files.eval.ts
@@ -0,0 +1,55 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  DoesNotRecommendHandEditSkillMdJudge,
+  ExplainsEvalsAreDurableJudge,
+  RecommendsSpecRefineJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "dont-tell-user-to-handedit-derived-files",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "dont-tell-user-to-handedit-derived-files__skill-md-tweak",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "There's a sentence in SKILL.md I'd like to rephrase. Should I just open the file and change it?",
+        );
+
+        await expect(result).toSatisfyJudge(DoesNotRecommendHandEditSkillMdJudge);
+        await expect(result).toSatisfyJudge(RecommendsSpecRefineJudge);
+      },
+    );
+
+    it(
+      "dont-tell-user-to-handedit-derived-files__eval-file-tweak",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "I want to tighten an assertion in one of my evals/*.eval.ts files. Is editing it directly the right move, or do I have to go through the CLI?",
+        );
+
+        await expect(result).toSatisfyJudge(ExplainsEvalsAreDurableJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/explain-spec-as-source-of-truth.eval.ts b/skills/skillet/evals/explain-spec-as-source-of-truth.eval.ts
new file mode 100644
index 0000000..f6f7a5a
--- /dev/null
+++ b/skills/skillet/evals/explain-spec-as-source-of-truth.eval.ts
@@ -0,0 +1,55 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  ExplainsEvalsAreDurableJudge,
+  ExplainsSpecAsSourceOfTruthJudge,
+  RecommendsSpecRefineJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "explain-spec-as-source-of-truth",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "explain-spec-as-source-of-truth__editing-skill-md",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "I want to tweak the wording in SKILL.md to make it clearer. Can I just open it and edit?",
+        );
+
+        await expect(result).toSatisfyJudge(ExplainsSpecAsSourceOfTruthJudge);
+        await expect(result).toSatisfyJudge(RecommendsSpecRefineJudge);
+      },
+    );
+
+    it(
+      "explain-spec-as-source-of-truth__editing-eval-files",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "Can I hand-edit the files under evals/ to tighten up the assertions, or will skillet overwrite them?",
+        );
+
+        await expect(result).toSatisfyJudge(ExplainsEvalsAreDurableJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/evals/scope-package-name.eval.ts b/skills/skillet/evals/scope-package-name.eval.ts
new file mode 100644
index 0000000..e42205b
--- /dev/null
+++ b/skills/skillet/evals/scope-package-name.eval.ts
@@ -0,0 +1,40 @@
+// ──────────────────────────────────────────────────────────
+// Generated initially from spec.yaml; durable after that. Edit
+// freely to refine prompts, setup, and assertions for this
+// behavior. Add or remove behaviors via spec.yaml — skillet only
+// regenerates eval files for behaviors that don't have one yet.
+// ──────────────────────────────────────────────────────────
+import { fileURLToPath } from "node:url";
+import { dirname } from "node:path";
+import { expect } from "vitest";
+import {
+  describeEval,
+  piAiHarness,
+  skilletAgent,
+} from "@sentry/skillet/evals";
+import {
+  UsesScopedPackageJudge,
+} from "./_judges.js";
+
+const skillRoot = dirname(dirname(fileURLToPath(import.meta.url)));
+
+describeEval(
+  "scope-package-name",
+  {
+    harness: piAiHarness({ agent: skilletAgent({ skillRoot }) }),
+    judgeThreshold: 0.75,
+  },
+  (it) => {
+    it(
+      "scope-package-name__one-liner-install",
+      { timeout: 90_000 },
+      async ({ run }) => {
+        const result = await run(
+          "Give me the one-liner to run skillet via npx so I can try it without installing globally.",
+        );
+
+        await expect(result).toSatisfyJudge(UsesScopedPackageJudge);
+      },
+    );
+  },
+);
diff --git a/skills/skillet/spec.yaml b/skills/skillet/spec.yaml
new file mode 100644
index 0000000..296f987
--- /dev/null
+++ b/skills/skillet/spec.yaml
@@ -0,0 +1,121 @@
+# ──────────────────────────────────────────────────────────
+# Skillet skill spec. Edit this file directly or use the
+# `skillet spec` subcommands — both are supported. Skillet
+# validates this file on read; malformed edits will fail fast
+# with a clear error before doing any work.
+#
+# After editing, run `skillet improve` to refresh SKILL.md and
+# eval cases against the updated spec.
+# ──────────────────────────────────────────────────────────
+managed_by: skillet
+spec_version: 1
+name: skillet
+intent: |
+  Create, evaluate, and improve agent skills using the skillet CLI.
+  Skillet is spec-driven: spec.yaml captures intent (behaviors,
+  must-nots, triggers). SKILL.md is regenerated from it. Eval files
+  (evals/*.eval.ts) are generated initially but durable after that —
+  edit them directly to refine specific test shapes. Iteration patches
+  the spec or tunes SKILL.md prose, not the eval implementations.
+
+triggers:
+  should:
+    - create a skill
+    - make a skill for X
+    - improve this skill
+    - add an eval
+    - test my skill
+    - verify a skill
+    - refine a skill
+    - working with spec.yaml
+    - working with SKILL.md
+    - working with eval files
+  should_not:
+    - run my unit tests
+    - lint this code
+
+behaviors:
+  - id: choose-create-for-new-skills
+    statement: Recommend `skillet create` when the user wants to start a new skill from a description.
+    rationale: |
+      `create` runs spec init + regen + improve in one shot. It's the
+      friendliest entry point for "I want a skill for X" requests.
+
+  - id: choose-improve-for-existing-skills
+    statement: Recommend `skillet improve` when the user has an existing skill (with or without spec.yaml) that needs work.
+    rationale: |
+      `improve` auto-imports a legacy SKILL.md into a spec on first run,
+      then runs the verify-driven iteration loop. Don't direct users to
+      manually run `spec import` — the loop handles it.
+
+  - id: choose-spec-show-for-inspection
+    statement: Recommend `skillet spec show` when the user wants to read the current spec without changing it.
+    rationale: |
+      Show is read-only and prints the parsed spec with the banner stripped.
+
+  - id: choose-spec-refine-for-feedback
+    statement: Recommend `skillet spec refine "<feedback>"` when the user wants to change a skill via natural-language feedback.
+    rationale: |
+      Refine produces structured SpecPatch operations, applies them, and
+      auto-regens. The user describes the change in their own words.
+
+  - id: choose-add-eval-for-named-behaviors
+    statement: Recommend `skillet add-eval "<behavior>"` when the user wants to add one or more named behaviors as eval cases.
+    rationale: |
+      `add-eval` is a wrapper over `spec refine` that auto-imports legacy
+      skills, then appends the named behaviors to the spec and regens.
+
+  - id: choose-verify-for-checking
+    statement: Recommend `skillet verify` (not "validate") when the user wants to check that a skill is internally consistent.
+    rationale: |
+      The old `validate` command is gone. `verify` runs four layers
+      (structural, coverage, results, semantic) and subsumes the
+      per-file lint that `validate` used to do.
+
+  - id: scope-package-name
+    statement: Always invoke skillet via `npx @sentry/skillet`, not `npx skillet`.
+    rationale: |
+      The package is published under the @sentry scope. The unscoped
+      name resolves to a different package or fails.
+
+  - id: capture-intent-before-generation
+    statement: When the user asks for a new skill or wants to add evals, ask 3-5 questions to capture intent (most important behaviors, realistic prompt + expected output, common mistakes, trigger phrases) before invoking the CLI.
+    rationale: |
+      Skillet's spec-init phase is single-turn — it generates a spec
+      from whatever description it receives. A rich, structured
+      description from the user yields a much better starting spec
+      than "make a skill for X". The agent acts as the front-end
+      interview before passing the combined description to skillet.
+
+  - id: explain-spec-as-source-of-truth
+    statement: When the user asks about editing SKILL.md, explain that SKILL.md is derived from spec.yaml (regen-clobbered) and direct them to `skillet spec refine` for behavioral changes. Eval files (evals/*.eval.ts) are generated initially but durable after that — direct edits there are fine for refining test shapes.
+    rationale: |
+      SKILL.md is rewritten on every regen, so prose hand-edits get
+      wiped. Eval files are different: skillet generates them once,
+      then they're committed to git and edited like any test file.
+      Behavior set changes (add/remove rules) flow through spec.yaml
+      so the eval coverage stays in sync.
+
+must_not:
+  - id: dont-mention-api-keys
+    statement: Never tell the user to set API keys or environment variables. Credentials are auto-discovered.
+    rationale: |
+      Skillet uses provider-autodiscovery; mentioning API keys both
+      contradicts the zero-config promise and might leak a
+      specific env var name into a transcript.
+    leakage_risk: env-var-leak
+
+  - id: dont-recommend-validate
+    statement: Don't recommend `skillet validate` — that command was removed.
+    rationale: |
+      Per-file structural checks now live as layer 1 of `verify`.
+      A user who runs `validate` gets an unknown-command error.
+
+  - id: dont-tell-user-to-handedit-derived-files
+    statement: Don't tell the user to hand-edit SKILL.md (it's regenerated and clobbered on every regen). Direct them to `skillet spec refine` for behavioral changes. Eval files are durable and can be edited directly to refine test shapes.
+    rationale: |
+      SKILL.md is rewritten from spec.yaml on every regen, so prose
+      hand-edits get wiped. Eval files (.eval.ts) are different —
+      generated once, committed, edited like any test file. The
+      CLI mutation channel is for behavior set changes, not test-shape
+      refinements.
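The eval file banners say skillet only generates eval files for behaviors that do not have one yet. A rough sketch of that kind of coverage check follows. It is not skillet's implementation: `idsMissingEvalFile` is a hypothetical helper, and the assumption that each behavior or must-not id maps to `<id>.eval.ts` is inferred from the file names in this change.

```typescript
// Hypothetical helper, not skillet's implementation: assumes each
// behavior or must-not id maps to `<id>.eval.ts` under evals/.
function idsMissingEvalFile(ids: string[], evalFiles: string[]): string[] {
  const suffix = ".eval.ts";
  const covered = new Set(
    evalFiles
      .filter((file) => file.endsWith(suffix))
      .map((file) => file.slice(0, -suffix.length)),
  );
  // Keep only the ids that have no matching eval file.
  return ids.filter((id) => !covered.has(id));
}

const missing = idsMissingEvalFile(
  ["scope-package-name", "dont-mention-api-keys", "hypothetical-new-behavior"],
  ["scope-package-name.eval.ts", "dont-mention-api-keys.eval.ts", "_judges.ts"],
);
console.log(missing); // [ 'hypothetical-new-behavior' ]
```

Under this assumption, adding a behavior to `spec.yaml` would surface exactly one id needing a new file, while existing hand-edited eval files stay untouched.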