You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: CHANGELOG.md
+23-2Lines changed: 23 additions & 2 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -7,12 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
7
7
8
8
## [Unreleased]
9
9
10
+
## [1.1.0] - 2026-04-28
11
+
12
+
External audit against v1.0.1 surfaced eight repo defects ranging from documentation drift to a CI-confusing exit-code conflation. v1.1.0 ships fixes for all eight, reverses one v1.0.1 behavior change that turned out wrong, and tightens the description scorer's vague-word rubric. The minor bump is driven by the exit-code semantics change (now distinguishes warning-only from input error) and the new `--warnings-as-errors` flag.
13
+
14
+
### Behavior change
15
+
16
+
- Warning-only CLI reports now return exit code 0 by default, reversing v1.0.1's "warnings exit 2" decision. Exit code 2 is now reserved for tool-misuse / input errors (missing path, conflicting flags, empty directory) so CI consumers can distinguish them. Pass `--warnings-as-errors` to escalate warning-only runs to exit code 1 for stricter gates. Errors remain 1; semantic drift remains 3.
17
+
10
18
### Added
19
+
20
+
-`--warnings-as-errors` flag: escalate warning-only runs to exit 1 for CI configurations that want warnings to block.
11
21
-`scripts/summarize_batch.py` and `tests/test_batch15_summarize.py`: maintainer-facing tool that consumes a directory of skillcheck batch-run artifacts (one directory per repo, one subdirectory per skill, paired `*.json` / `*.txt` reports per phase) and writes `summary.csv` plus `findings.md`. Invoked as `python scripts/summarize_batch.py <batch_dir>`. Not exposed as a console script, not wired into the GitHub Action; the action runs skillcheck against one path, this consumes outputs across many. Documented under Maintainer Notes in the README.
12
22
-`tests/test_readme_test_count_claim.py`: parses the README's "N tests cover ..." sentence and asserts it matches `pytest --collect-only`. The next time the suite grows without bumping the README number, CI fails. Closes the recurring drift pattern that v1.0.1 had to correct twice.
13
23
14
24
### Changed
15
-
- README test count bumped from 663 to 664 to include the new drift-guard test.
25
+
26
+
-`action.yml` install step pins `skillcheck>=1.0.1` so consumers fail loudly on unpublished v1 features instead of silently running v0.2.0.
27
+
- Description scorer rubric documented and tightened: dropped `comprehensive`, `robust`, and `flexible` from `_VAGUE_WORDS` because each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "robust against malformed input"). The inclusion rubric is now documented inline. Verified against `anthropics/skills` (17 SKILL.md files): zero score changes, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the new regression tests are forward-looking guards against scoring drift if the list is ever re-expanded.
28
+
- Description scorer verb matching: collapsed `_ACTION_VERBS` from 86 entries (base + 3rd-person duplicates) to 42 base forms. Added `_is_action_verb()` to handle stem normalization across `-s`, `-es`, and `-ies` endings. Adding a new verb now only requires the base form.
29
+
- README test count bumped from 663 to 667 to include the drift-guard test, two description-scorer regression tests, and the `--warnings-as-errors` test.
30
+
- README field-test citations: replaced seven gitignored `runs/...` path references with the exact `skillcheck` commands needed to reproduce each finding. Readers can now verify the claims without access to private artifacts.
31
+
- README exit-code table reflects the new semantics; flag table documents `--warnings-as-errors`.
32
+
33
+
### Removed
34
+
35
+
- Top-level `git-commit-crafter` SKILL.md from the repo root. It was unrelated to skillcheck and confused first-time readers; the canonical example lives at `skills/skillcheck/SKILL.md`.
36
+
- False `@v0` tag claim from the README. Only `@v0.2.0` was ever pushed; the action-install snippet no longer suggests a tag that does not exist. CHANGELOG entries that referenced `@v0` corrected to `@v0.2.0`.
-**GitHub Action**: composite action (`moonrunnerkc/skillcheck@v0`) with PR annotations, job summary table, and JSON output. All CLI flags exposed as action inputs. Three lines of YAML to add to any CI pipeline.
95
+
-**GitHub Action**: composite action (`moonrunnerkc/skillcheck@v0.2.0`) with PR annotations, job summary table, and JSON output. All CLI flags exposed as action inputs. Three lines of YAML to add to any CI pipeline.
75
96
-**`__main__.py` entry point**: `python -m skillcheck` now works as an alternative to the console script.
76
97
-**File reference validation**: parses markdown body for `[text](path)`, ``, and `source:`/`file:`/`include:` directives; verifies referenced files exist on disk; warns when references exceed one directory level from SKILL.md.
77
98
-**Progressive disclosure budget**: three-tier token budgeting: metadata/frontmatter at ~100 tokens, body at <5,000 tokens, resources loaded on demand. Flags oversized code blocks (>50 lines), large tables (>20 rows), and embedded base64.
Copy file name to clipboardExpand all lines: README.md
+13-12Lines changed: 13 additions & 12 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -69,7 +69,7 @@ skillcheck skills/ # recursive scan; finds every file named SKILL.md
69
69
skillcheck SKILL.md --format json
70
70
```
71
71
72
-
From the field test on Anthropic's official skills repository (18 skills, `runs/anthropics-corpus/01-symbolic-all.txt`, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection.
72
+
From the field test on Anthropic's official skills repository (18 skills, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection. Reproduce: clone `anthropics/skills` and run `skillcheck skills/ --format text`.
Graph nodes: `Capability` (section headings), `Input` (backtick references required by capabilities), `Output` (backtick references produced by capabilities). Analyzers fire on orphaned capabilities with no declared I/O, unused inputs, unproduced outputs, capabilities with no description body, and `allowed-tools` entries not backtick-referenced in the body.
85
85
86
-
From the field test on `mcp-builder/SKILL.md` (`runs/anthropics-mcp-builder/02-graph-analyze.txt`):
86
+
From the field test on `mcp-builder/SKILL.md` (reproduce: `skillcheck skills/mcp-builder/SKILL.md --analyze-graph`):
87
87
88
88
```
89
89
line 18 ⚠ warning graph.capability.orphaned Capability 'Understand Modern MCP Design'
`--critique-agent` selects a framing variant tuned for each platform (claude, codex, cursor). The schema and exit codes are identical across all variants.
111
111
112
-
From the field test (`runs/anthropics-mcp-builder/04-critique-report.txt`): the symbolic run on `mcp-builder/SKILL.md` passed (exit 0), but the ingested critique returned exit 3 with three `semantic.contradiction.detected` errors. One:
112
+
From the field test on `mcp-builder/SKILL.md`: the symbolic run passed (exit 0), but the ingested critique returned exit 3 with three `semantic.contradiction.detected` errors. One:
113
113
114
114
```
115
115
✗ error semantic.contradiction.detected Contradiction between 'Frontmatter
When `--history` is active and the current run fails on content that matched a prior passing run, skillcheck emits `history.skill.regressed` (WARNING). This surfaces rule tightening or new agent findings without requiring manual output comparison.
151
151
152
-
From the field test (`runs/anthropics-mcp-builder/08-history.txt`):
152
+
From the field test (reproduce: `skillcheck skills/mcp-builder/SKILL.md --history && skillcheck skills/mcp-builder/SKILL.md --show-history`):
153
153
154
154
```
155
155
History ledger: SKILL.md
@@ -172,7 +172,7 @@ Three lines to add skillcheck to any CI pipeline:
172
172
path: skills/
173
173
```
174
174
175
-
Pin to `@v1` for the latest patch within the v1.0 major-version line, or `@v1.0.0` for an immutable release. The `@v0` tag remains in place for existing CI configurations.
175
+
Pin to `@v1` for the latest patch within the v1.0 major-version line, or `@v1.0.0` for an immutable release.
176
176
177
177
Failures block the PR. Errors and warnings appear as inline diff annotations on the changed files. The workflow run page gets a Markdown summary table. For the complete list of action inputs and outputs, see [`action.yml`](action.yml).
178
178
@@ -188,7 +188,7 @@ The v1.0 graph and critique modes are available as action inputs. Example with s
188
188
189
189
## Output
190
190
191
-
Text output (default), excerpt from `runs/anthropics-corpus/01-symbolic-all.txt`:
191
+
Text output (default), excerpt from a run against the Anthropic skills corpus:
192
192
193
193
```
194
194
✗ FAIL skills/claude-api/SKILL.md
@@ -245,6 +245,7 @@ The JSON schema is stable. It will not change in a backward-incompatible way wit
245
245
| `--min-desc-score N` | | Minimum description quality score (0-100); below this triggers a warning |
246
246
| `--target-agent {claude,vscode,all}` | `all` | Scope compatibility checks to a specific agent |
247
247
| `--strict-vscode` | `false` | Promote VS Code compatibility issues to errors |
248
+
| `--warnings-as-errors` | `false` | Escalate warning-only runs to exit code 1 (default for warning-only is 0) |
| `3` | Symbolic passed but ingested critique found semantic errors | `skillcheck SKILL.md --ingest-critique response.json` when the agent reported contradictions |
271
272
272
-
Exit code 1 takes priority over 3 when symbolic errors also exist.
273
+
Pass `--warnings-as-errors` to escalate warning-only runs to exit 1 for stricter CI gates. Exit code 1 takes priority over 3 when symbolic errors also exist; code 2 is reserved for tool-misuse cases so CI can distinguish them from skill-content findings.
273
274
274
275
## Rules
275
276
@@ -320,7 +321,7 @@ Source tags: `spec` rules derive from the agentskills.io specification or agent-
320
321
321
322
## Case Study
322
323
323
-
We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. Full run artifacts: `runs/anthropics-corpus/`, `runs/anthropics-mcp-builder/`, `runs/uxuiprinciples-corpus/`.
324
+
We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. To reproduce, clone each upstream repo and run `skillcheck <path>` (the case study below records the exact invocations).
324
325
325
326
The symbolic run of the Anthropic corpus returned four failures from eighteen files (exit 1). All four files look correct on review: two had second-person voice in the description, one used "claude" as part of the name (reserved word per spec), and the template skill had a name/directory mismatch. The deeper finding came from running `mcp-builder` through the critique pipeline: the symbolic run passed (exit 0), but the ingested agent critique returned exit 3 with three `semantic.contradiction.detected` errors. The skill's frontmatter offers Python and TypeScript as equal options; its body unconditionally recommends TypeScript in Phase 1.3. That inconsistency means any agent following the Python path hits an unresolved decision point. No static linter catches it. See [docs/case-study-v1-real-world-runs.md](docs/case-study-v1-real-world-runs.md) for the full breakdown.
326
327
@@ -347,7 +348,7 @@ pip install -e ".[dev]"
347
348
python3 -m pytest tests/ -q
348
349
```
349
350
350
-
664 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.
351
+
667 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.
An external audit against v1.0.1 surfaced eight repo defects: an unpinned GitHub Action install, gitignored evidence paths cited in the README, a top-level SKILL.md describing an unrelated skill, a missing `@v0` tag the README claimed existed, exit-code 2 conflating tool-misuse with warning-only reports, an oversized `cli.py`, and a vague-word list that flagged context-dependent terms like "comprehensive". v1.1.0 fixes all of them and reverses one v1.0.1 behavior change that turned out wrong.
4
+
5
+
## Behavior change
6
+
7
+
Warning-only runs now return exit code **0** by default. v1.0.1 made them return 2; that conflated valid runs that produced warnings with tool-misuse cases (missing path, conflicting flags, empty directory). CI consumers couldn't tell the difference. v1.1.0 splits them: warnings exit 0, input errors exit 2, errors stay at 1, semantic drift stays at 3. The new `--warnings-as-errors` flag escalates warning-only runs to exit 1 for pipelines that want warnings to block.
8
+
9
+
If your CI relied on v1.0.1's "warnings exit 2" behavior, add `--warnings-as-errors` to your skillcheck invocation, or pin to `@v1.0.1` until you can update.
10
+
11
+
## Added
12
+
13
+
-`--warnings-as-errors` flag.
14
+
- Two regression tests guarding the description-scorer rubric.
15
+
16
+
## Changed
17
+
18
+
-`action.yml` install step pins `skillcheck>=1.0.1`. Until v1.1.0 is uploaded to PyPI, this fails loudly on unpublished v1 features rather than silently resolving to v0.2.0.
19
+
- Description scorer no longer penalizes `comprehensive`, `robust`, or `flexible` in descriptions. Each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "robust against malformed input"). The inclusion rubric is now documented inline. Verified against `anthropics/skills`: zero score changes across 17 files, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the two new regression tests are forward-looking guards, not regression evidence.
20
+
- Description scorer verb matching collapsed from 86 entries (base + 3rd-person duplicates) to 42 base forms with stem normalization. Adding a new verb now only requires the base form.
- README exit-code table documents the new semantics; flag table documents `--warnings-as-errors`.
23
+
- README test count: 663 → 667.
24
+
25
+
## Removed
26
+
27
+
- Top-level `git-commit-crafter` SKILL.md from the repo root.
28
+
- False `@v0` tag claim from the README and CHANGELOG.
29
+
30
+
## Why this is a minor and not a patch
31
+
32
+
The exit-code semantics change is observable in CI and not opt-in. Adding `--warnings-as-errors` is also a public-surface addition. Either alone would be a minor bump under semver; together they aren't a patch.
33
+
34
+
## Audit items not closed
35
+
36
+
-**PyPI publish**: the v1.1.0 sdist and wheel are built and pass `twine check`, but PyPI upload requires authenticated credentials and happens out-of-band. Until that runs, `pip install skillcheck` continues to ship v0.2.0. The pinned action install will refuse to run.
37
+
-**`cli.py` line count**: the audit asked for a refactor toward `main()` under 100 lines and `cli.py` under 700. An attempted helper extraction met the `main()` target but pushed total file size from 1127 to 1172. The refactor was reverted; the file remains at its pre-audit size, with the audit's "deliberate choice" path left open for a follow-up.
0 commit comments