CHANGELOG.md (28 additions, 1 deletion)
@@ -8,7 +8,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
## [Unreleased]
### Added
-
- Release prep artifacts for v1.0.0: `RELEASE_NOTES_v1.0.0.md`, `LAUNCH_POST_v1.0.md`, `LAUNCH_CHECKLIST.md`.
+
- `scripts/summarize_batch.py` and `tests/test_batch15_summarize.py`: maintainer-facing tool that consumes a directory of skillcheck batch-run artifacts (one directory per repo, one subdirectory per skill, paired `*.json` / `*.txt` reports per phase) and writes `summary.csv` plus `findings.md`. Invoked as `python scripts/summarize_batch.py <batch_dir>`. Not exposed as a console script, not wired into the GitHub Action; the action runs skillcheck against one path, this consumes outputs across many. Documented under Maintainer Notes in the README.
+
- `tests/test_readme_test_count_claim.py`: parses the README's "N tests cover ..." sentence and asserts it matches `pytest --collect-only`. The next time the suite grows without bumping the README number, CI fails. Closes the recurring drift pattern that v1.0.1 had to correct twice.
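A minimal sketch of how a drift guard of this kind could work, assuming the README sentence has the form `N tests cover ...` and that `pytest --collect-only -q` prints a summary line such as `664 tests collected in 0.52s`. The helper names `claimed_count`, `collected_count`, and `readme_count_matches_suite` are hypothetical and are not taken from `tests/test_readme_test_count_claim.py`.

```python
import re
import subprocess
import sys
from pathlib import Path


def claimed_count(readme_text: str) -> int:
    """Return N from the first 'N tests cover' sentence in the README text."""
    match = re.search(r"\b(\d+) tests cover\b", readme_text)
    if match is None:
        raise ValueError("README has no 'N tests cover ...' sentence")
    return int(match.group(1))


def collected_count(collect_output: str) -> int:
    """Return N from pytest's 'N tests collected' summary line."""
    match = re.search(r"\b(\d+) tests? collected\b", collect_output)
    if match is None:
        raise ValueError("no 'N tests collected' line in pytest output")
    return int(match.group(1))


def readme_count_matches_suite(repo_root: Path) -> bool:
    """Compare the README claim with what pytest collects under tests/."""
    # Assumption: README.md and tests/ both live directly under repo_root.
    result = subprocess.run(
        [sys.executable, "-m", "pytest", "--collect-only", "-q", "tests/"],
        cwd=repo_root,
        capture_output=True,
        text=True,
        check=False,
    )
    readme = (repo_root / "README.md").read_text(encoding="utf-8")
    return claimed_count(readme) == collected_count(result.stdout)
```

Wrapping `readme_count_matches_suite` in a single assertion would turn it into a test; keeping the two parsers separate makes each one checkable against a plain string.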
+
+
### Changed
+
- README test count bumped from 663 to 664 to include the new drift-guard test.
+
+
## [1.0.1] - 2026-04-28
+
+
End-to-end verification against `anthropics/skills` surfaced documentation drift in the published v1.0.0 README and a batch of post-tag implementation work that had not been committed. v1.0.1 commits that work, ships the docs corrections, and adds guide-parity flags. Behavior change: warning-only runs now return exit code 2 (was 0).
+
+
### Changed
+
- Warning-only CLI reports now return exit code 2. Exit code 1 remains errors; exit code 3 remains semantic drift. README Exit Codes table row 0 updated to "no errors and no warnings".
+
- README test count corrected from 653 to 663.
+
- README JSON-stability promise updated from "0.x series" to "v1.x series".
+
- README field-test numbers reframed as April 2026 snapshots against `anthropics/skills`, with a note that they will drift as upstream evolves.
+
- `action.yml` `format` input description clarified: accepted but ignored at runtime; the action always invokes skillcheck with `--format json`.
+
- Development extras now include `ruff>=0.6`, `mypy>=1.10`, and `types-PyYAML>=6.0`.
+
+
### Added
+
- `--semantic`: guide-compatible shortcut that enables semantic-adjacent validation. In standalone mode it runs heuristic graph analysis; with ingested agent responses it merges those diagnostics.
+
- `--agent-reason`: guide-compatible agent-workflow shortcut. Emits a combined critique and graph prompt packet so the calling agent can run both reasoning steps and feed JSON back through `--ingest-critique` and `--ingest-graph`.
+
- `--format md` and `--format agent`: Markdown report output and agent-oriented next-action output.
+
- `skillcheck.toml` config loading: top-level defaults for format, thresholds, target agent, strict VS Code mode, skip flags, ignored rule prefixes, graph analysis, semantic mode, history, and agent variants.
+
- Experimental `--activation-hypotheses`: generates likely natural-language routing triggers plus a discoverability entropy score. Routing caveat included in every report.
+
- Machine-readable diagnostic metadata: JSON diagnostics now include `source` and `confidence` fields.
+
- GitHub Action inputs for the v1.0 modes: `semantic`, `analyze-graph`, `ingest-critique`, `critique-agent`, `ingest-graph`, `graph-agent`, `history`, `activation-hypotheses`. The action still always emits JSON internally for PR annotations.
+
- `tests/test_v1_completion.py`: covers `--format md`, `--format agent`, `--agent-reason`, `--semantic` graph enabling, `--activation-hypotheses` JSON, `skillcheck.toml` loading, and source/confidence in JSON output.
## [1.0.0] - 2026-04-25
@@ -17,6 +43,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added `docs/case-study-v1-real-world-runs.md`: full breakdown of the pre-3B field test covering 18 Anthropic skills (symbolic), `mcp-builder` through the full v1.0 pipeline (symbolic + heuristic graph + agent critique + agent graph), and 5 uxuiprinciples skills (strict VS Code mode). Documents three `semantic.contradiction.detected` errors on a skill that passed all symbolic checks, five `graph.capability.orphaned` patterns, and the recurring unknown-field pattern (`license`, `homepage`, `env`) across official catalogs.
- `skills/skillcheck/SKILL.md`: skillcheck's own SKILL.md, validating the tool against itself. Passes symbolic, graph, critique, and history validation with zero errors and zero warnings. Serves as the worked example for the Rules table in the README.
- Self-host integration test suite (`tests/test_self_host.py`): confirms the bundled SKILL.md passes symbolic validation, all five graph analyzers, critique ingestion, agent graph ingestion with divergence analysis, full CLI pipeline, history round-trip, and description scoring threshold.
- `scripts/regen_self_host_fixtures.py`: regenerates `tests/fixtures/self_host/graph_clean.json` from the live heuristic graph after skill edits.
@@ -63,7 +69,7 @@ skillcheck skills/ # recursive scan; finds every file named SKILL.md
skillcheck SKILL.md --format json
```
-
From the field test on Anthropic's official skills repository (18 skills, `runs/anthropics-corpus/01-symbolic-all.txt`): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection.
+
From the field test on Anthropic's official skills repository (18 skills, `runs/anthropics-corpus/01-symbolic-all.txt`, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection.
### Heuristic Graph
@@ -86,7 +92,7 @@ From the field test on `mcp-builder/SKILL.md` (`runs/anthropics-mcp-builder/02-g
has no declared inputs or outputs.
```
-
Thirteen of fourteen capability headings in that skill had no declared I/O. That is a signal the skill relies entirely on implicit context rather than declared contracts.
+
Thirteen of fourteen capability headings in that skill had no declared I/O at the time of the field test. That is a signal the skill relies entirely on implicit context rather than declared contracts. Numbers reflect a snapshot of `anthropics/skills` from April 2026 and will drift as upstream evolves; rerun against the current repo to see fresh counts.
`--critique-agent` selects a framing variant tuned for each platform (claude, codex, cursor). The schema and exit codes are identical across all variants.
-
The JSON schema is stable. It will not change in a backward-incompatible way within the 0.x series.
+
Each diagnostic includes `source` and `confidence` fields in JSON output. `source` is one of `spec`, `advisory`, `heuristic`, `agent`, or `history`; `confidence` is `high`, `medium`, or `low`.
+
+
The JSON schema is stable. It will not change in a backward-incompatible way within the v1.x series.
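A consumer of the JSON output could check these two fields before trusting them. The sketch below works on a single decoded diagnostic object and assumes nothing about the surrounding report structure, which this diff does not show. The `metadata_problems` helper and the `rule` key in the sample are hypothetical; only `source`, `confidence`, and their allowed values come from the text above.

```python
import json

ALLOWED_SOURCES = {"spec", "advisory", "heuristic", "agent", "history"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def metadata_problems(diagnostic: dict) -> list[str]:
    """Return a list of problems with a diagnostic's source/confidence fields."""
    problems = []
    if diagnostic.get("source") not in ALLOWED_SOURCES:
        problems.append(f"unexpected source: {diagnostic.get('source')!r}")
    if diagnostic.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append(f"unexpected confidence: {diagnostic.get('confidence')!r}")
    return problems


# Hypothetical diagnostic; the `rule` key is an assumption for illustration.
sample = json.loads(
    '{"rule": "frontmatter.name.reserved-word", "source": "spec", "confidence": "high"}'
)
print(metadata_problems(sample))  # prints []
```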
## Options
| Flag | Default | Description |
|---|---|---|
-
| `--format {text,json}` | `text` | Output format |
+
| `--format {text,json,md,agent}` | `text` | Output format |
| `3` | Symbolic passed but ingested critique found semantic errors | `skillcheck SKILL.md --ingest-critique response.json` when the agent reported contradictions |
Exit code 1 takes priority over 3 when symbolic errors also exist.
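The exit-code rules described in this change set can be sketched as a small decision function. This is an illustration, not skillcheck's implementation: the `exit_code` function and its parameter names are hypothetical, and ranking code 3 above code 2 is an assumption inferred from the phrase "warning-only"; the source only states that 1 takes priority over 3.

```python
def exit_code(errors: int, semantic_errors: int, warnings: int) -> int:
    """Map diagnostic counts to the documented exit codes."""
    if errors > 0:
        return 1  # symbolic errors take priority over semantic drift
    if semantic_errors > 0:
        return 3  # ingested critique found semantic errors
    if warnings > 0:
        return 2  # warning-only run (assumed to rank below 3)
    return 0  # no errors and no warnings


# A run with only warnings exited 0 in v1.0.0; under the v1.0.1 rules it exits 2.
print(exit_code(errors=0, semantic_errors=0, warnings=3))  # prints 2
```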
@@ -307,7 +320,7 @@ Source tags: `spec` rules derive from the agentskills.io specification or agent-
## Case Study
-
We ran skillcheck against three corpora: Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. Full run artifacts: `runs/anthropics-corpus/`, `runs/anthropics-mcp-builder/`, `runs/uxuiprinciples-corpus/`.
+
We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. Full run artifacts: `runs/anthropics-corpus/`, `runs/anthropics-mcp-builder/`, `runs/uxuiprinciples-corpus/`.
The symbolic run of the Anthropic corpus returned four failures from eighteen files (exit 1). All four files look correct on review: two had second-person voice in the description, one used "claude" as part of the name (reserved word per spec), and the template skill had a name/directory mismatch. The deeper finding came from running `mcp-builder` through the critique pipeline: the symbolic run passed (exit 0), but the ingested agent critique returned exit 3 with three `semantic.contradiction.detected` errors. The skill's frontmatter offers Python and TypeScript as equal options; its body unconditionally recommends TypeScript in Phase 1.3. That inconsistency means any agent following the Python path hits an unresolved decision point. No static linter catches it. See [docs/case-study-v1-real-world-runs.md](docs/case-study-v1-real-world-runs.md) for the full breakdown.
@@ -334,7 +347,7 @@ pip install -e ".[dev]"
python3 -m pytest tests/ -q
```
-
653 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case.
+
664 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.
## Maintainer Notes
@@ -346,6 +359,14 @@ make regen-self-host-fixtures
This runs `scripts/regen_self_host_fixtures.py`, which extracts a fresh heuristic graph and writes it to `tests/fixtures/self_host/graph_clean.json`.
+
To summarize a batch of skillcheck JSON outputs across many repos (the layout the field-test runs use, with one directory per repo, one subdirectory per skill, and `01-symbolic.json` / `02-strict-vscode.json` / `03-graph-analyze.json` / `04-graph-extracted.json` / `08-critique-report.json` / `09-graph-agent-report.json` / `10-full-pipeline.json` per skill), run:
It writes `summary.csv` and `findings.md` next to the batch directory. The script is intended for benchmark and field-test workflows; it is not part of the CLI surface and is not exposed as a console script.
+
To add a new rule: implement `def check_something(skill: ParsedSkill) -> list[Diagnostic]` in the appropriate module under `src/skillcheck/rules/`, register it in `src/skillcheck/rules/__init__.py`, add at least one positive and one negative fixture, and add a row to the Rules table above. Full conventions are in [`.github/CLAUDE.md`](.github/CLAUDE.md).
skillcheck v1.0.1 commits a batch of post-v1.0.0 implementation work that had been sitting uncommitted, ships the docs corrections an end-to-end verification surfaced, and aligns the README, CHANGELOG, and CLI surface so they describe the same release.
+
+
There is one behavior change relative to v1.0.0: warning-only runs now return exit code 2. Errors return 1; semantic drift returns 3. CI consumers that previously relied on warning-only exiting 0 must update.
- README Exit Codes table row 0 now reads "no errors and no warnings".
+
- README test count corrected from 653 to 663.
+
- README JSON-stability promise updated from "0.x series" to "v1.x series".
+
- README field-test numbers reframed as April 2026 snapshots against `anthropics/skills`, with a note that they will drift as upstream evolves.
+
- `action.yml` `format` input description clarified: accepted but ignored at runtime; the action always invokes skillcheck with `--format json` so it can parse diagnostics for PR annotations and the step summary.
+
- Development extras now include `ruff>=0.6`, `mypy>=1.10`, and `types-PyYAML>=6.0`.
+
+
## Added
+
+
- `--semantic`: guide-compatible shortcut that enables semantic-adjacent validation. In standalone mode it runs heuristic graph analysis; with ingested agent responses it merges those diagnostics.
+
- `--agent-reason`: guide-compatible agent-workflow shortcut. Emits a combined critique and graph prompt packet so the calling agent can run both reasoning steps and feed JSON back through `--ingest-critique` and `--ingest-graph`.
+
- `--format md` and `--format agent`: Markdown report output and agent-oriented next-action output.
+
- `skillcheck.toml` config loading: top-level defaults for format, thresholds, target agent, strict VS Code mode, skip flags, ignored rule prefixes, graph analysis, semantic mode, history, and agent variants. CLI flags always win; the loader fills unset values.
+
- Experimental `--activation-hypotheses`: generates likely natural-language routing triggers plus a discoverability entropy score. Routing caveat included in every report.
+
- Machine-readable diagnostic metadata: JSON diagnostics now include `source` and `confidence` fields.
+
- GitHub Action inputs for the v1.0 modes: `semantic`, `analyze-graph`, `ingest-critique`, `critique-agent`, `ingest-graph`, `graph-agent`, `history`, `activation-hypotheses`. The action still always emits JSON internally for PR annotations.
+
+
## Why this is a patch and not a minor
+
+
Every addition above either documents existing behavior, refines a flag, or is gated behind a new opt-in flag. There is one breaking-ish change: warning-only runs now exit 2 instead of 0. Strict semver would call that a minor bump. The judgment call here: v1.0.0 shipped with documentation that already implied the v2-style exit codes (and the v1.0.1 README makes it explicit), the prior "warnings exit 0" behavior was undocumented in the released README, and the change matches what users running this in CI would expect. If your CI pipeline depended on the old behavior, pin to `@v1.0.0` rather than `@v1` until you can update.
# exit 0 with no errors and no warnings (only INFO diagnostics)
+
```
+
+
End-to-end verification was run against `anthropics/skills` at commit `5128e186` (18 SKILL.md files). All 26 documented flags exercised; all four exit codes (0, 1, 2, 3) reproduced; the action entrypoint produced byte-identical JSON to the CLI. Full report: see the v1.0.1 verification artifacts.
+
+
## Links
+
+
- PyPI: https://pypi.org/project/skillcheck/1.0.1/ (available after publish)