Commit 2fa2896

Merge pull request #3 from moonrunnerkc/v1-phase3d-release-prep
skillcheck v1.0.1: post-1.0.0 implementation, docs corrections, guide-parity flags
2 parents 07788b9 + 129cb0a commit 2fa2896

29 files changed

Lines changed: 1611 additions & 48 deletions

.gitignore

Lines changed: 10 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -11,3 +11,13 @@ CLAUDE.md
1111
.copilot-instructions.md
1212
# Field test artifacts (kept locally, not in git)
1313
runs/
14+
15+
# Launch post drafts (kept locally, never push)
16+
launch-post/
17+
18+
# Verification venvs (created by end-to-end verification runs)
19+
.venv-verify/
20+
21+
# Agent scratchpads (kept locally; never push)
22+
.codex
23+
review.md

CHANGELOG.md

Lines changed: 28 additions & 1 deletion
@@ -8,7 +8,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
88
## [Unreleased]
99

1010
### Added
11-
- Release prep artifacts for v1.0.0: `RELEASE_NOTES_v1.0.0.md`, `LAUNCH_POST_v1.0.md`, `LAUNCH_CHECKLIST.md`.
11+
- `scripts/summarize_batch.py` and `tests/test_batch15_summarize.py`: maintainer-facing tool that consumes a directory of skillcheck batch-run artifacts (one directory per repo, one subdirectory per skill, paired `*.json` / `*.txt` reports per phase) and writes `summary.csv` plus `findings.md`. Invoked as `python scripts/summarize_batch.py <batch_dir>`. Not exposed as a console script, not wired into the GitHub Action; the action runs skillcheck against one path, this consumes outputs across many. Documented under Maintainer Notes in the README.
12+
- `tests/test_readme_test_count_claim.py`: parses the README's "N tests cover ..." sentence and asserts it matches `pytest --collect-only`. The next time the suite grows without bumping the README number, CI fails. Closes the recurring drift pattern that v1.0.1 had to correct twice.
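
The drift guard described above could look roughly like the following. This is a minimal sketch, not the contents of `tests/test_readme_test_count_claim.py`: the regex, the function names, and the choice to count node IDs from `pytest --collect-only -q` output are all assumptions made for illustration.

```python
import re

# Assumed shape of the README sentence: a line starting with "N tests cover".
_CLAIM_RE = re.compile(r"^(\d+) tests cover\b", re.MULTILINE)


def claimed_test_count(readme_text: str) -> int:
    """Return N from the README's "N tests cover ..." sentence."""
    match = _CLAIM_RE.search(readme_text)
    if match is None:
        raise ValueError("README has no 'N tests cover ...' sentence")
    return int(match.group(1))


def collected_test_count(collect_output: str) -> int:
    """Count test node IDs in `pytest --collect-only -q` output.

    Assumes the quiet format, where every collected test is printed on its
    own line as a node ID containing "::"; the summary line has no "::".
    """
    return sum(1 for line in collect_output.splitlines() if "::" in line)


def check_readme_claim(readme_text: str, collect_output: str) -> None:
    """Fail loudly when the README number and the collected count disagree."""
    claimed = claimed_test_count(readme_text)
    actual = collected_test_count(collect_output)
    assert claimed == actual, (
        f"README claims {claimed} tests, pytest collected {actual}"
    )
```

In a real test, `collect_output` would come from running pytest in a subprocess; the pure functions are kept separate here so the parsing can be checked without invoking pytest.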
13+
14+
### Changed
15+
- README test count bumped from 663 to 664 to include the new drift-guard test.
16+
17+
## [1.0.1] - 2026-04-28
18+
19+
End-to-end verification against `anthropics/skills` surfaced documentation drift in the published v1.0.0 README and a batch of post-tag implementation work that had not been committed. v1.0.1 commits that work, ships the docs corrections, and adds guide-parity flags. Behavior change: warning-only runs now return exit code 2 (was 0).
20+
21+
### Changed
22+
- Warning-only CLI reports now return exit code 2. Exit code 1 remains errors; exit code 3 remains semantic drift. README Exit Codes table row 0 updated to "no errors and no warnings".
23+
- README test count corrected from 653 to 663.
24+
- README JSON-stability promise updated from "0.x series" to "v1.x series".
25+
- README field-test numbers reframed as April 2026 snapshots against `anthropics/skills`, with a note that they will drift as upstream evolves.
26+
- `action.yml` `format` input description clarified: accepted but ignored at runtime; the action always invokes skillcheck with `--format json`.
27+
- Development extras now include `ruff>=0.6`, `mypy>=1.10`, and `types-PyYAML>=6.0`.
28+
29+
### Added
30+
- `--semantic`: guide-compatible shortcut that enables semantic-adjacent validation. In standalone mode it runs heuristic graph analysis; with ingested agent responses it merges those diagnostics.
31+
- `--agent-reason`: guide-compatible agent-workflow shortcut. Emits a combined critique and graph prompt packet so the calling agent can run both reasoning steps and feed JSON back through `--ingest-critique` and `--ingest-graph`.
32+
- `--format md` and `--format agent`: Markdown report output and agent-oriented next-action output.
33+
- `skillcheck.toml` config loading: top-level defaults for format, thresholds, target agent, strict VS Code mode, skip flags, ignored rule prefixes, graph analysis, semantic mode, history, and agent variants.
34+
- Experimental `--activation-hypotheses`: generates likely natural-language routing triggers plus a discoverability entropy score. Routing caveat included in every report.
35+
- Machine-readable diagnostic metadata: JSON diagnostics now include `source` and `confidence` fields.
36+
- GitHub Action inputs for the v1.0 modes: `semantic`, `analyze-graph`, `ingest-critique`, `critique-agent`, `ingest-graph`, `graph-agent`, `history`, `activation-hypotheses`. The action still always emits JSON internally for PR annotations.
37+
- `tests/test_v1_completion.py`: covers `--format md`, `--format agent`, `--agent-reason`, `--semantic` graph enabling, `--activation-hypotheses` JSON, `skillcheck.toml` loading, and source/confidence in JSON output.
1238

1339
## [1.0.0] - 2026-04-25
1440

@@ -17,6 +43,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1743
- Added `docs/case-study-v1-real-world-runs.md`: full breakdown of the pre-3B field test covering 18 Anthropic skills (symbolic), `mcp-builder` through the full v1.0 pipeline (symbolic + heuristic graph + agent critique + agent graph), and 5 uxuiprinciples skills (strict VS Code mode). Documents three `semantic.contradiction.detected` errors on a skill that passed all symbolic checks, five `graph.capability.orphaned` patterns, and the recurring unknown-field pattern (`license`, `homepage`, `env`) across official catalogs.
1844

1945
### Added
46+
- Release prep artifacts: `RELEASE_NOTES_v1.0.0.md`, `LAUNCH_POST_v1.0.md`, `LAUNCH_CHECKLIST.md`.
2047
- `skills/skillcheck/SKILL.md`: skillcheck's own SKILL.md, validating the tool against itself. Passes symbolic, graph, critique, and history validation with zero errors and zero warnings. Serves as the worked example for the Rules table in the README.
2148
- Self-host integration test suite (`tests/test_self_host.py`): confirms the bundled SKILL.md passes symbolic validation, all five graph analyzers, critique ingestion, agent graph ingestion with divergence analysis, full CLI pipeline, history round-trip, and description scoring threshold.
2249
- `scripts/regen_self_host_fixtures.py`: regenerates `tests/fixtures/self_host/graph_clean.json` from the live heuristic graph after skill edits.

README.md

Lines changed: 29 additions & 8 deletions
@@ -49,6 +49,12 @@ skillcheck path/to/SKILL.md --analyze-graph
4949
skillcheck path/to/SKILL.md --emit-critique-prompt > prompt.txt
5050
# Run prompt.txt through your agent. Agent returns JSON. Then:
5151
skillcheck path/to/SKILL.md --ingest-critique response.json
52+
53+
# Agent shortcut: emit critique and graph prompts in one packet
54+
skillcheck path/to/SKILL.md --agent-reason --format agent
55+
56+
# Experimental activation estimates
57+
skillcheck path/to/SKILL.md --activation-hypotheses --format json
5258
```
5359

5460
## Modes
@@ -63,7 +69,7 @@ skillcheck skills/ # recursive scan; finds every file named SKILL.md
6369
skillcheck SKILL.md --format json
6470
```
6571

66-
From the field test on Anthropic's official skills repository (18 skills, `runs/anthropics-corpus/01-symbolic-all.txt`): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection.
72+
From the field test on Anthropic's official skills repository (18 skills, `runs/anthropics-corpus/01-symbolic-all.txt`, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection.
6773

6874
### Heuristic Graph
6975

@@ -86,7 +92,7 @@ From the field test on `mcp-builder/SKILL.md` (`runs/anthropics-mcp-builder/02-g
8692
has no declared inputs or outputs.
8793
```
8894

89-
Thirteen of fourteen capability headings in that skill had no declared I/O. That is a signal the skill relies entirely on implicit context rather than declared contracts.
95+
Thirteen of fourteen capability headings in that skill had no declared I/O at the time of the field test. That is a signal the skill relies entirely on implicit context rather than declared contracts. Numbers reflect a snapshot of `anthropics/skills` from April 2026 and will drift as upstream evolves; rerun against the current repo to see fresh counts.
9096

9197
### Agent Critique
9298

@@ -98,6 +104,7 @@ skillcheck SKILL.md --emit-critique-prompt > prompt.txt
98104
skillcheck SKILL.md --ingest-critique response.json
99105
skillcheck SKILL.md --ingest-critique - # read from stdin
100106
skillcheck SKILL.md --emit-critique-prompt --critique-agent codex > prompt.txt
107+
skillcheck SKILL.md --agent-reason --format agent # critique + graph prompt packet
101108
```
102109

103110
`--critique-agent` selects a framing variant tuned for each platform (claude, codex, cursor). The schema and exit codes are identical across all variants.
@@ -218,13 +225,16 @@ JSON output (`--format json`):
218225
}
219226
```
220227

221-
The JSON schema is stable. It will not change in a backward-incompatible way within the 0.x series.
228+
Each diagnostic includes `source` and `confidence` fields in JSON output. `source` is one of `spec`, `advisory`, `heuristic`, `agent`, or `history`; `confidence` is `high`, `medium`, or `low`.
229+
230+
The JSON schema is stable. It will not change in a backward-incompatible way within the v1.x series.
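
A consumer of the JSON output might use the `source` and `confidence` fields to filter diagnostics before acting on them. The sketch below is illustrative only: the diff shows the two field names and their allowed values, but the top-level `diagnostics` key and the function name are assumptions, since the full schema is not visible in this hunk.

```python
import json
from collections import Counter

# Allowed values as listed in the README text above.
SOURCES = {"spec", "advisory", "heuristic", "agent", "history"}
CONFIDENCE_RANK = {"low": 0, "medium": 1, "high": 2}


def tally_by_source(report_json: str, min_confidence: str = "low") -> Counter:
    """Count diagnostics per source, keeping those at or above min_confidence."""
    report = json.loads(report_json)
    threshold = CONFIDENCE_RANK[min_confidence]
    counts: Counter = Counter()
    # ASSUMPTION: diagnostics live under a top-level "diagnostics" list.
    for diag in report.get("diagnostics", []):
        source = diag["source"]
        confidence = diag["confidence"]
        if source not in SOURCES or confidence not in CONFIDENCE_RANK:
            raise ValueError(f"unexpected source/confidence: {source}/{confidence}")
        if CONFIDENCE_RANK[confidence] >= threshold:
            counts[source] += 1
    return counts
```

Raising on unknown values is a deliberate choice for the sketch: it makes a schema change visible instead of silently dropping diagnostics.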
222231

223232
## Options
224233

225234
| Flag | Default | Description |
226235
|---|---|---|
227-
| `--format {text,json}` | `text` | Output format |
236+
| `--format {text,json,md,agent}` | `text` | Output format |
237+
| `--config PATH` | nearest `skillcheck.toml` | Load config defaults from TOML |
228238
| `--max-lines N` | `500` | Override the line-count threshold |
229239
| `--max-tokens N` | `8000` | Override the token-count threshold |
230240
| `--ignore PREFIX` | | Suppress rules matching this prefix; can be repeated |
@@ -235,6 +245,8 @@ The JSON schema is stable. It will not change in a backward-incompatible way wit
235245
| `--min-desc-score N` | | Minimum description quality score (0-100); below this triggers a warning |
236246
| `--target-agent {claude,vscode,all}` | `all` | Scope compatibility checks to a specific agent |
237247
| `--strict-vscode` | `false` | Promote VS Code compatibility issues to errors |
248+
| `--semantic` | `false` | Enable semantic-adjacent validation; standalone mode runs heuristic graph analysis |
249+
| `--agent-reason` | `false` | Emit a combined critique + graph prompt packet for the calling agent |
238250
| `--emit-critique-prompt` | `false` | Print agent self-critique prompt to stdout and exit 0 |
239251
| `--ingest-critique PATH` | | Read agent critique JSON from PATH or `-` for stdin; merge with symbolic results |
240252
| `--critique-agent NAME` | `claude` | Prompt variant: `claude`, `codex`, or `cursor`. Requires `--emit-critique-prompt` or `--ingest-critique` |
@@ -245,15 +257,16 @@ The JSON schema is stable. It will not change in a backward-incompatible way wit
245257
| `--graph-agent NAME` | `claude` | Prompt variant for graph extraction: `claude`, `codex`, or `cursor`. Requires `--emit-graph-prompt` or `--ingest-graph` |
246258
| `--history` | `false` | Append a validation record to `.skillcheck-history.json` next to the skill |
247259
| `--show-history` | `false` | Print the validation ledger and exit 0 |
260+
| `--activation-hypotheses` | `false` | Experimental emit mode for likely natural-language activation triggers |
248261
| `--version` | | Show version and exit |
249262

250263
## Exit Codes
251264

252265
| Code | Meaning | Example invocation |
253266
|---|---|---|
254-
| `0` | No errors; warnings and info are allowed | `skillcheck skills/skillcheck/SKILL.md` |
267+
| `0` | No errors and no warnings | `skillcheck skills/skillcheck/SKILL.md` |
255268
| `1` | One or more errors found | `skillcheck SKILL.md` when the name is invalid |
256-
| `2` | Input error: missing file or empty directory | `skillcheck path/that/does/not/exist` |
269+
| `2` | Warning-only report or input error | `skillcheck SKILL.md --max-lines 1` |
257270
| `3` | Symbolic passed but ingested critique found semantic errors | `skillcheck SKILL.md --ingest-critique response.json` when the agent reported contradictions |
258271

259272
Exit code 1 takes priority over 3 when symbolic errors also exist.
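
For CI wrappers, the table above can be turned into a small policy function. This is a sketch under stated assumptions: the enum and function names are hypothetical, and the pass/fail policy is one possible choice, not something skillcheck prescribes. Because code 2 now covers both warning-only reports and input errors, the exit code alone cannot tell the two apart.

```python
from enum import IntEnum


class SkillcheckExit(IntEnum):
    """Exit codes as documented in the README table (names are hypothetical)."""

    CLEAN = 0                    # no errors and no warnings
    ERRORS = 1                   # one or more errors found
    WARNINGS_OR_INPUT_ERROR = 2  # warning-only report, or an input error
    SEMANTIC_ERRORS = 3          # symbolic passed, ingested critique found errors


def should_fail_build(returncode: int, allow_warnings: bool = False) -> bool:
    """Decide whether a CI step should fail for a given skillcheck exit code."""
    code = SkillcheckExit(returncode)  # ValueError for undocumented codes
    if code is SkillcheckExit.CLEAN:
        return False
    if code is SkillcheckExit.WARNINGS_OR_INPUT_ERROR:
        # Code 2 is ambiguous; a caller that sets allow_warnings should
        # confirm the input path exists before trusting a pass.
        return not allow_warnings
    return True
```

A pipeline that previously relied on warning-only runs exiting 0 could pass `allow_warnings=True` to approximate the old behavior while it migrates.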
@@ -307,7 +320,7 @@ Source tags: `spec` rules derive from the agentskills.io specification or agent-
307320

308321
## Case Study
309322

310-
We ran skillcheck against three corpora: Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. Full run artifacts: `runs/anthropics-corpus/`, `runs/anthropics-mcp-builder/`, `runs/uxuiprinciples-corpus/`.
323+
We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. Full run artifacts: `runs/anthropics-corpus/`, `runs/anthropics-mcp-builder/`, `runs/uxuiprinciples-corpus/`.
311324

312325
The symbolic run of the Anthropic corpus returned four failures from eighteen files (exit 1). All four files look correct on review: two had second-person voice in the description, one used "claude" as part of the name (reserved word per spec), and the template skill had a name/directory mismatch. The deeper finding came from running `mcp-builder` through the critique pipeline: the symbolic run passed (exit 0), but the ingested agent critique returned exit 3 with three `semantic.contradiction.detected` errors. The skill's frontmatter offers Python and TypeScript as equal options; its body unconditionally recommends TypeScript in Phase 1.3. That inconsistency means any agent following the Python path hits an unresolved decision point. No static linter catches it. See [docs/case-study-v1-real-world-runs.md](docs/case-study-v1-real-world-runs.md) for the full breakdown.
313326

@@ -334,7 +347,7 @@ pip install -e ".[dev]"
334347
python3 -m pytest tests/ -q
335348
```
336349

337-
653 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case.
350+
664 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.
338351

339352
## Maintainer Notes
340353

@@ -346,6 +359,14 @@ make regen-self-host-fixtures
346359

347360
This runs `scripts/regen_self_host_fixtures.py`, which extracts a fresh heuristic graph and writes it to `tests/fixtures/self_host/graph_clean.json`.
348361

362+
To summarize a batch of skillcheck JSON outputs across many repos (the layout the field-test runs use, with one directory per repo, one subdirectory per skill, and `01-symbolic.json` / `02-strict-vscode.json` / `03-graph-analyze.json` / `04-graph-extracted.json` / `08-critique-report.json` / `09-graph-agent-report.json` / `10-full-pipeline.json` per skill), run:
363+
364+
```bash
365+
python scripts/summarize_batch.py path/to/batch-dir
366+
```
367+
368+
It writes `summary.csv` and `findings.md` next to the batch directory. The script is intended for benchmark and field-test workflows; it is not part of the CLI surface and is not exposed as a console script.
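
The directory layout described above (one directory per repo, one subdirectory per skill, numbered JSON reports per phase) can be indexed with a short walk. The following is a hedged sketch of that traversal only; it is not the code in `scripts/summarize_batch.py`, and the function name and row keys are made up for illustration.

```python
from pathlib import Path


def index_batch(batch_dir: Path) -> list[dict[str, str]]:
    """List every JSON phase report as a (repo, skill, phase) row.

    Expects batch_dir/<repo>/<skill>/<phase>.json, for example
    batch_dir/some-repo/some-skill/01-symbolic.json.
    """
    rows: list[dict[str, str]] = []
    for repo_dir in sorted(p for p in batch_dir.iterdir() if p.is_dir()):
        for skill_dir in sorted(p for p in repo_dir.iterdir() if p.is_dir()):
            # The paired *.txt reports are skipped; only JSON is indexed.
            for report in sorted(skill_dir.glob("*.json")):
                rows.append(
                    {
                        "repo": repo_dir.name,
                        "skill": skill_dir.name,
                        "phase": report.stem,
                    }
                )
    return rows
```

Sorting at each level keeps the row order deterministic, which matters if the rows are later written to a CSV that gets diffed between runs.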
369+
349370
To add a new rule: implement `def check_something(skill: ParsedSkill) -> list[Diagnostic]` in the appropriate module under `src/skillcheck/rules/`, register it in `src/skillcheck/rules/__init__.py`, add at least one positive and one negative fixture, and add a row to the Rules table above. Full conventions are in [`.github/CLAUDE.md`](.github/CLAUDE.md).
350371

351372
## License

RELEASE_NOTES_v1.0.1.md

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
1+
# skillcheck 1.0.1
2+
3+
skillcheck v1.0.1 commits a batch of post-v1.0.0 implementation work that had been sitting uncommitted, ships the docs corrections an end-to-end verification surfaced, and aligns the README, CHANGELOG, and CLI surface so they describe the same release.
4+
5+
There is one behavior change relative to v1.0.0: warning-only runs now return exit code 2. Errors return 1; semantic drift returns 3. CI consumers that previously relied on warning-only exiting 0 must update.
6+
7+
## Changed
8+
9+
- Warning-only CLI reports now return exit code 2. Exit code 1 remains errors; exit code 3 remains semantic drift.
10+
- README Exit Codes table row 0 now reads "no errors and no warnings".
11+
- README test count corrected from 653 to 663.
12+
- README JSON-stability promise updated from "0.x series" to "v1.x series".
13+
- README field-test numbers reframed as April 2026 snapshots against `anthropics/skills`, with a note that they will drift as upstream evolves.
14+
- `action.yml` `format` input description clarified: accepted but ignored at runtime; the action always invokes skillcheck with `--format json` so it can parse diagnostics for PR annotations and the step summary.
15+
- Development extras now include `ruff>=0.6`, `mypy>=1.10`, and `types-PyYAML>=6.0`.
16+
17+
## Added
18+
19+
- `--semantic`: guide-compatible shortcut that enables semantic-adjacent validation. In standalone mode it runs heuristic graph analysis; with ingested agent responses it merges those diagnostics.
20+
- `--agent-reason`: guide-compatible agent-workflow shortcut. Emits a combined critique and graph prompt packet so the calling agent can run both reasoning steps and feed JSON back through `--ingest-critique` and `--ingest-graph`.
21+
- `--format md` and `--format agent`: Markdown report output and agent-oriented next-action output.
22+
- `skillcheck.toml` config loading: top-level defaults for format, thresholds, target agent, strict VS Code mode, skip flags, ignored rule prefixes, graph analysis, semantic mode, history, and agent variants. CLI flags always win; the loader fills unset values.
23+
- Experimental `--activation-hypotheses`: generates likely natural-language routing triggers plus a discoverability entropy score. Routing caveat included in every report.
24+
- Machine-readable diagnostic metadata: JSON diagnostics now include `source` and `confidence` fields.
25+
- GitHub Action inputs for the v1.0 modes: `semantic`, `analyze-graph`, `ingest-critique`, `critique-agent`, `ingest-graph`, `graph-agent`, `history`, `activation-hypotheses`. The action still always emits JSON internally for PR annotations.
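
The precedence rule stated for `skillcheck.toml` ("CLI flags always win; the loader fills unset values") can be sketched as a three-layer merge. This is an illustration of the rule, not skillcheck's loader: the key names and the use of `None` to mean "flag not passed" are assumptions, while the default values are taken from the README Options table.

```python
from typing import Any, Optional

# Defaults from the README Options table; key names are hypothetical.
DEFAULTS: dict[str, Any] = {"format": "text", "max_lines": 500, "max_tokens": 8000}


def resolve_options(
    cli: dict[str, Optional[Any]], file_config: dict[str, Any]
) -> dict[str, Any]:
    """Merge built-in defaults, config-file values, and CLI flags.

    Later layers override earlier ones. A CLI value of None is treated as
    "not passed", so it never overrides the config file.
    """
    resolved = dict(DEFAULTS)
    resolved.update(file_config)
    resolved.update({key: value for key, value in cli.items() if value is not None})
    return resolved
```

For example, with `format = "json"` in the config file and `--max-lines 200` on the command line, the resolved options would use JSON output, a 200-line threshold, and the default token threshold.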
26+
27+
## Why this is a patch and not a minor
28+
29+
Every addition above either documents existing behavior, refines a flag, or is gated behind a new opt-in flag. There is one breaking-ish change: warning-only runs now exit 2 instead of 0. Strict semver would call that a minor bump. The judgment call here: v1.0.0 shipped with documentation that already implied the v2-style exit codes (and the v1.0.1 README makes it explicit), the prior "warnings exit 0" behavior was undocumented in the released README, and the change matches what users running this in CI would expect. If your CI pipeline depended on the old behavior, pin to `@v1.0.0` rather than `@v1` until you can update.
30+
31+
## Verification
32+
33+
After installing `skillcheck==1.0.1`:
34+
35+
```bash
36+
skillcheck --version
37+
# skillcheck 1.0.1
38+
39+
skillcheck skills/skillcheck/SKILL.md --analyze-graph
40+
# exit 0 with no errors and no warnings (only INFO diagnostics)
41+
```
42+
43+
End-to-end verification was run against `anthropics/skills` at commit `5128e186` (18 SKILL.md files). All 26 documented flags exercised; all four exit codes (0, 1, 2, 3) reproduced; the action entrypoint produced byte-identical JSON to the CLI. Full report: see the v1.0.1 verification artifacts.
44+
45+
## Links
46+
47+
- PyPI: https://pypi.org/project/skillcheck/1.0.1/ (available after publish)
48+
- GitHub Release: https://github.com/moonrunnerkc/skillcheck/releases/tag/v1.0.1
49+
- agentskills.io specification: https://agentskills.io/specification
50+
- README: https://github.com/moonrunnerkc/skillcheck/blob/main/README.md
