Commit 44620cc

Merge pull request #5 from moonrunnerkc/v1-phase3d-release-prep
v1.1.0 release prep
2 parents c2734ad + 4be2d76 commit 44620cc

15 files changed

Lines changed: 171 additions & 109 deletions

CHANGELOG.md

Lines changed: 23 additions & 2 deletions
@@ -7,12 +7,33 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [1.1.0] - 2026-04-28
+
+External audit against v1.0.1 surfaced eight repo defects ranging from documentation drift to a CI-confusing exit-code conflation. v1.1.0 ships fixes for all eight, reverses one v1.0.1 behavior change that turned out wrong, and tightens the description scorer's vague-word rubric. The minor bump is driven by the exit-code semantics change (now distinguishes warning-only from input error) and the new `--warnings-as-errors` flag.
+
+### Behavior change
+
+- Warning-only CLI reports now return exit code 0 by default, reversing v1.0.1's "warnings exit 2" decision. Exit code 2 is now reserved for tool-misuse / input errors (missing path, conflicting flags, empty directory) so CI consumers can distinguish them. Pass `--warnings-as-errors` to escalate warning-only runs to exit code 1 for stricter gates. Errors remain 1; semantic drift remains 3.
+
 ### Added
+
+- `--warnings-as-errors` flag: escalate warning-only runs to exit 1 for CI configurations that want warnings to block.
 - `scripts/summarize_batch.py` and `tests/test_batch15_summarize.py`: maintainer-facing tool that consumes a directory of skillcheck batch-run artifacts (one directory per repo, one subdirectory per skill, paired `*.json` / `*.txt` reports per phase) and writes `summary.csv` plus `findings.md`. Invoked as `python scripts/summarize_batch.py <batch_dir>`. Not exposed as a console script, not wired into the GitHub Action; the action runs skillcheck against one path, this consumes outputs across many. Documented under Maintainer Notes in the README.
 - `tests/test_readme_test_count_claim.py`: parses the README's "N tests cover ..." sentence and asserts it matches `pytest --collect-only`. The next time the suite grows without bumping the README number, CI fails. Closes the recurring drift pattern that v1.0.1 had to correct twice.

 ### Changed
-- README test count bumped from 663 to 664 to include the new drift-guard test.
+
+- `action.yml` install step pins `skillcheck>=1.0.1` so consumers fail loudly on unpublished v1 features instead of silently running v0.2.0.
+- Description scorer rubric documented and tightened: dropped `comprehensive`, `robust`, and `flexible` from `_VAGUE_WORDS` because each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "robust against malformed input"). The inclusion rubric is now documented inline. Verified against `anthropics/skills` (17 SKILL.md files): zero score changes, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the new regression tests are forward-looking guards against scoring drift if the list is ever re-expanded.
+- Description scorer verb matching: collapsed `_ACTION_VERBS` from 86 entries (base + 3rd-person duplicates) to 42 base forms. Added `_is_action_verb()` to handle stem normalization across `-s`, `-es`, and `-ies` endings. Adding a new verb now only requires the base form.
+- README test count bumped from 663 to 667 to include the drift-guard test, two description-scorer regression tests, and the `--warnings-as-errors` test.
+- README field-test citations: replaced seven gitignored `runs/...` path references with the exact `skillcheck` commands needed to reproduce each finding. Readers can now verify the claims without access to private artifacts.
+- README exit-code table reflects the new semantics; flag table documents `--warnings-as-errors`.
+
+### Removed
+
+- Top-level `git-commit-crafter` SKILL.md from the repo root. It was unrelated to skillcheck and confused first-time readers; the canonical example lives at `skills/skillcheck/SKILL.md`.
+- False `@v0` tag claim from the README. Only `@v0.2.0` was ever pushed; the action-install snippet no longer suggests a tag that does not exist. CHANGELOG entries that referenced `@v0` corrected to `@v0.2.0`.

 ## [1.0.1] - 2026-04-28
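
Reviewer note on the verb-matching entry in the hunk above: the changelog says `_is_action_verb()` normalizes `-s`, `-es`, and `-ies` endings so that only base forms need to be listed. The implementation is not part of this diff, so the following is only a minimal Python sketch of that idea. The verb set, the function name `is_action_verb`, and the stripping order are all hypothetical stand-ins, not the project's actual code.

```python
# Hypothetical base-form list for illustration only; the real _ACTION_VERBS
# is said to hold 42 base forms, which this diff does not show.
ACTION_VERBS = frozenset({"apply", "fix", "generate", "parse", "validate"})


def is_action_verb(word: str) -> bool:
    """Return True if `word` is a listed base verb or its -s/-es/-ies form."""
    w = word.lower()
    if w in ACTION_VERBS:
        return True

    # Build candidate base forms by undoing each third-person ending.
    candidates = []
    if w.endswith("ies"):
        candidates.append(w[:-3] + "y")  # applies -> apply
    if w.endswith("es"):
        candidates.append(w[:-2])        # fixes -> fix
    if w.endswith("s"):
        candidates.append(w[:-1])        # parses -> parse
    return any(c in ACTION_VERBS for c in candidates)
```

Because every candidate is checked against the base list, a word such as "class" is not misread as a verb: stripping its ending yields "clas", which is not listed.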

@@ -71,7 +92,7 @@ End-to-end verification against `anthropics/skills` surfaced documentation drift
 ## [0.2.0] - 2026-03-11

 ### Added
-- **GitHub Action**: composite action (`moonrunnerkc/skillcheck@v0`) with PR annotations, job summary table, and JSON output. All CLI flags exposed as action inputs. Three lines of YAML to add to any CI pipeline.
+- **GitHub Action**: composite action (`moonrunnerkc/skillcheck@v0.2.0`) with PR annotations, job summary table, and JSON output. All CLI flags exposed as action inputs. Three lines of YAML to add to any CI pipeline.
 - **`__main__.py` entry point**: `python -m skillcheck` now works as an alternative to the console script.
 - **File reference validation**: parses markdown body for `[text](path)`, `![alt](path)`, and `source:`/`file:`/`include:` directives; verifies referenced files exist on disk; warns when references exceed one directory level from SKILL.md.
 - **Progressive disclosure budget**: three-tier token budgeting: metadata/frontmatter at ~100 tokens, body at <5,000 tokens, resources loaded on demand. Flags oversized code blocks (>50 lines), large tables (>20 rows), and embedded base64.
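
Reviewer note on the GitHub Action changes in this file: the 1.1.0 entry says the action's install step now pins `skillcheck>=1.0.1` so that an index offering only v0.2.0 produces a failure instead of a silent downgrade. A rough Python sketch of why a lower bound has that effect follows. It is a simplification that compares plain dotted release numbers; pip's real resolver follows the full version-specifier rules, and the helper names here (`parse_release`, `pick_version`) are hypothetical.

```python
def parse_release(version):
    """Turn a plain dotted release such as '1.0.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def pick_version(available, minimum="1.0.1"):
    """Pick the newest available release that satisfies `>= minimum`.

    Raises LookupError when nothing qualifies, mirroring the "fail loudly"
    behavior the changelog describes for the pinned install step.
    """
    floor = parse_release(minimum)
    candidates = [v for v in available if parse_release(v) >= floor]
    if not candidates:
        raise LookupError(f"no release satisfies >={minimum}: {available}")
    return max(candidates, key=parse_release)
```

With only `0.2.0` available the call raises instead of returning an old release; once a `1.x` release is published, the newest qualifying one is chosen.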

README.md

Lines changed: 13 additions & 12 deletions
@@ -69,7 +69,7 @@ skillcheck skills/ # recursive scan; finds every file named SKILL.md
 skillcheck SKILL.md --format json
 ```

-From the field test on Anthropic's official skills repository (18 skills, `runs/anthropics-corpus/01-symbolic-all.txt`, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection.
+From the field test on Anthropic's official skills repository (18 skills, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection. Reproduce: clone `anthropics/skills` and run `skillcheck skills/ --format text`.

 ### Heuristic Graph

@@ -83,7 +83,7 @@ skillcheck SKILL.md --emit-graph --format json

 Graph nodes: `Capability` (section headings), `Input` (backtick references required by capabilities), `Output` (backtick references produced by capabilities). Analyzers fire on orphaned capabilities with no declared I/O, unused inputs, unproduced outputs, capabilities with no description body, and `allowed-tools` entries not backtick-referenced in the body.

-From the field test on `mcp-builder/SKILL.md` (`runs/anthropics-mcp-builder/02-graph-analyze.txt`):
+From the field test on `mcp-builder/SKILL.md` (reproduce: `skillcheck skills/mcp-builder/SKILL.md --analyze-graph`):

 ```
 line 18 ⚠ warning graph.capability.orphaned Capability 'Understand Modern MCP Design'
@@ -109,7 +109,7 @@ skillcheck SKILL.md --agent-reason --format agent # critique + graph pro

 `--critique-agent` selects a framing variant tuned for each platform (claude, codex, cursor). The schema and exit codes are identical across all variants.

-From the field test (`runs/anthropics-mcp-builder/04-critique-report.txt`): the symbolic run on `mcp-builder/SKILL.md` passed (exit 0), but the ingested critique returned exit 3 with three `semantic.contradiction.detected` errors. One:
+From the field test on `mcp-builder/SKILL.md`: the symbolic run passed (exit 0), but the ingested critique returned exit 3 with three `semantic.contradiction.detected` errors. One:

 ```
 ✗ error semantic.contradiction.detected Contradiction between 'Frontmatter
@@ -149,7 +149,7 @@ skillcheck SKILL.md --show-history --format json

 When `--history` is active and the current run fails on content that matched a prior passing run, skillcheck emits `history.skill.regressed` (WARNING). This surfaces rule tightening or new agent findings without requiring manual output comparison.

-From the field test (`runs/anthropics-mcp-builder/08-history.txt`):
+From the field test (reproduce: `skillcheck skills/mcp-builder/SKILL.md --history && skillcheck skills/mcp-builder/SKILL.md --show-history`):

 ```
 History ledger: SKILL.md
@@ -172,7 +172,7 @@ Three lines to add skillcheck to any CI pipeline:
 path: skills/
 ```

-Pin to `@v1` for the latest patch within the v1.0 major-version line, or `@v1.0.0` for an immutable release. The `@v0` tag remains in place for existing CI configurations.
+Pin to `@v1` for the latest patch within the v1.0 major-version line, or `@v1.0.0` for an immutable release.

 Failures block the PR. Errors and warnings appear as inline diff annotations on the changed files. The workflow run page gets a Markdown summary table. For the complete list of action inputs and outputs, see [`action.yml`](action.yml).

@@ -188,7 +188,7 @@ The v1.0 graph and critique modes are available as action inputs. Example with s

 ## Output

-Text output (default), excerpt from `runs/anthropics-corpus/01-symbolic-all.txt`:
+Text output (default), excerpt from a run against the Anthropic skills corpus:

 ```
 ✗ FAIL skills/claude-api/SKILL.md
@@ -245,6 +245,7 @@ The JSON schema is stable. It will not change in a backward-incompatible way wit
 | `--min-desc-score N` | | Minimum description quality score (0-100); below this triggers a warning |
 | `--target-agent {claude,vscode,all}` | `all` | Scope compatibility checks to a specific agent |
 | `--strict-vscode` | `false` | Promote VS Code compatibility issues to errors |
+| `--warnings-as-errors` | `false` | Escalate warning-only runs to exit code 1 (default for warning-only is 0) |
 | `--semantic` | `false` | Enable semantic-adjacent validation; standalone mode runs heuristic graph analysis |
 | `--agent-reason` | `false` | Emit a combined critique + graph prompt packet for the calling agent |
 | `--emit-critique-prompt` | `false` | Print agent self-critique prompt to stdout and exit 0 |
@@ -264,12 +265,12 @@ The JSON schema is stable. It will not change in a backward-incompatible way wit

 | Code | Meaning | Example invocation |
 |---|---|---|
-| `0` | No errors and no warnings | `skillcheck skills/skillcheck/SKILL.md` |
-| `1` | One or more errors found | `skillcheck SKILL.md` when the name is invalid |
-| `2` | Warning-only report or input error | `skillcheck SKILL.md --max-lines 1` |
+| `0` | No errors (warning-only counts as a clean pass by default) | `skillcheck skills/skillcheck/SKILL.md` |
+| `1` | One or more errors found, or warnings with `--warnings-as-errors` | `skillcheck SKILL.md` when the name is invalid |
+| `2` | Input error: missing path, empty directory, conflicting flags, malformed argument | `skillcheck nonexistent.md` |
 | `3` | Symbolic passed but ingested critique found semantic errors | `skillcheck SKILL.md --ingest-critique response.json` when the agent reported contradictions |

-Exit code 1 takes priority over 3 when symbolic errors also exist.
+Pass `--warnings-as-errors` to escalate warning-only runs to exit 1 for stricter CI gates. Exit code 1 takes priority over 3 when symbolic errors also exist; code 2 is reserved for tool-misuse cases so CI can distinguish them from skill-content findings.

 ## Rules
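
Reviewer note on the exit-code hunk above: the new table can be read as a small decision function. The sketch below is a minimal Python illustration of that reading, not the code in `cli.py`. The names `RunResult` and `resolve_exit_code` are hypothetical. Two orderings are assumptions as well: that input errors are detected before any rule runs (so 2 is checked first), and that escalated warnings outrank exit 3 the same way real errors do. The README only states that 1 beats 3 when symbolic errors exist.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """Hypothetical summary of one run; field names are illustrative."""
    errors: int = 0
    warnings: int = 0
    semantic_errors: int = 0   # findings from an ingested critique
    input_error: bool = False  # missing path, empty dir, conflicting flags


def resolve_exit_code(result, warnings_as_errors=False):
    """Map a run summary to the v1.1.0 exit codes described in the README."""
    if result.input_error:
        return 2  # tool misuse, kept separate from skill-content findings
    if result.errors > 0:
        return 1  # symbolic errors take priority over semantic drift
    if warnings_as_errors and result.warnings > 0:
        return 1  # opt-in escalation (precedence over 3 is assumed)
    if result.semantic_errors > 0:
        return 3  # symbolic pass, critique found contradictions
    return 0      # clean, or warning-only without the flag
```

Under this reading, a v1.0.1 pipeline that treated exit 2 as "warnings present" would see 0 for the same run after upgrading unless it passes `--warnings-as-errors`.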

@@ -320,7 +321,7 @@ Source tags: `spec` rules derive from the agentskills.io specification or agent-

 ## Case Study

-We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. Full run artifacts: `runs/anthropics-corpus/`, `runs/anthropics-mcp-builder/`, `runs/uxuiprinciples-corpus/`.
+We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. To reproduce, clone each upstream repo and run `skillcheck <path>` (the case study below records the exact invocations).

 The symbolic run of the Anthropic corpus returned four failures from eighteen files (exit 1). All four files look correct on review: two had second-person voice in the description, one used "claude" as part of the name (reserved word per spec), and the template skill had a name/directory mismatch. The deeper finding came from running `mcp-builder` through the critique pipeline: the symbolic run passed (exit 0), but the ingested agent critique returned exit 3 with three `semantic.contradiction.detected` errors. The skill's frontmatter offers Python and TypeScript as equal options; its body unconditionally recommends TypeScript in Phase 1.3. That inconsistency means any agent following the Python path hits an unresolved decision point. No static linter catches it. See [docs/case-study-v1-real-world-runs.md](docs/case-study-v1-real-world-runs.md) for the full breakdown.

@@ -347,7 +348,7 @@ pip install -e ".[dev]"
 python3 -m pytest tests/ -q
 ```

-664 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.
+667 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.

 ## Maintainer Notes
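
Reviewer note on the test-count paragraph in the hunk above: the drift guard compares the number in the README's "N tests cover ..." sentence with what `pytest --collect-only` reports. The test file itself is not in this diff, so the following is only a sketch of the comparison under stated assumptions. The helper names are hypothetical, and the summary-line format ("N tests collected in ...") is assumed from recent pytest versions running with `-q`; older versions may word it differently.

```python
import re


def readme_test_count(readme_text):
    """Extract N from the README sentence 'N tests cover ...'."""
    match = re.search(r"(\d+) tests cover", readme_text)
    if match is None:
        raise ValueError("README has no 'N tests cover' sentence")
    return int(match.group(1))


def collected_test_count(collect_output):
    """Extract N from pytest's collection summary line (assumed format)."""
    match = re.search(r"(\d+) tests? collected", collect_output)
    if match is None:
        raise ValueError("no 'N tests collected' line in pytest output")
    return int(match.group(1))


def counts_match(readme_text, collect_output):
    """True when the README claim equals the collected test count."""
    return readme_test_count(readme_text) == collected_test_count(collect_output)
```

In a real guard, `collect_output` would come from running `python -m pytest --collect-only -q` in a subprocess and capturing stdout; the parsing is kept separate here so it can be checked without invoking pytest.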

RELEASE_NOTES_v1.1.0.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
+# skillcheck 1.1.0
+
+An external audit against v1.0.1 surfaced eight repo defects: an unpinned GitHub Action install, gitignored evidence paths cited in the README, a top-level SKILL.md describing an unrelated skill, a missing `@v0` tag the README claimed existed, exit-code 2 conflating tool-misuse with warning-only reports, an oversized `cli.py`, and a vague-word list that flagged context-dependent terms like "comprehensive". v1.1.0 fixes all of them and reverses one v1.0.1 behavior change that turned out wrong.
+
+## Behavior change
+
+Warning-only runs now return exit code **0** by default. v1.0.1 made them return 2; that conflated valid runs that produced warnings with tool-misuse cases (missing path, conflicting flags, empty directory). CI consumers couldn't tell the difference. v1.1.0 splits them: warnings exit 0, input errors exit 2, errors stay at 1, semantic drift stays at 3. The new `--warnings-as-errors` flag escalates warning-only runs to exit 1 for pipelines that want warnings to block.
+
+If your CI relied on v1.0.1's "warnings exit 2" behavior, add `--warnings-as-errors` to your skillcheck invocation, or pin to `@v1.0.1` until you can update.
+
+## Added
+
+- `--warnings-as-errors` flag.
+- Two regression tests guarding the description-scorer rubric.
+
+## Changed
+
+- `action.yml` install step pins `skillcheck>=1.0.1`. Until v1.1.0 is uploaded to PyPI, this fails loudly on unpublished v1 features rather than silently resolving to v0.2.0.
+- Description scorer no longer penalizes `comprehensive`, `robust`, or `flexible` in descriptions. Each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "robust against malformed input"). The inclusion rubric is now documented inline. Verified against `anthropics/skills`: zero score changes across 17 files, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the two new regression tests are forward-looking guards, not regression evidence.
+- Description scorer verb matching collapsed from 86 entries (base + 3rd-person duplicates) to 42 base forms with stem normalization. Adding a new verb now only requires the base form.
+- README field-test citations replaced gitignored `runs/...` paths with reproducible commands.
+- README exit-code table documents the new semantics; flag table documents `--warnings-as-errors`.
+- README test count: 663 → 667.
+
+## Removed
+
+- Top-level `git-commit-crafter` SKILL.md from the repo root.
+- False `@v0` tag claim from the README and CHANGELOG.
+
+## Why this is a minor and not a patch
+
+The exit-code semantics change is observable in CI and not opt-in. Adding `--warnings-as-errors` is also a public-surface addition. Either alone would be a minor bump under semver; together they aren't a patch.
+
+## Audit items not closed
+
+- **PyPI publish**: the v1.1.0 sdist and wheel are built and pass `twine check`, but PyPI upload requires authenticated credentials and happens out-of-band. Until that runs, `pip install skillcheck` continues to ship v0.2.0. The pinned action install will refuse to run.
+- **`cli.py` line count**: the audit asked for a refactor toward `main()` under 100 lines and `cli.py` under 700. An attempted helper extraction met the `main()` target but pushed total file size from 1127 to 1172. The refactor was reverted; the file remains at its pre-audit size, with the audit's "deliberate choice" path left open for a follow-up.

SKILL.md

Lines changed: 0 additions & 65 deletions
This file was deleted.

action.yml

Lines changed: 1 addition & 1 deletion
@@ -108,7 +108,7 @@ runs:
 if [ -n "$INPUT_VERSION" ]; then
 python -m pip install --quiet "skillcheck==$INPUT_VERSION"
 else
-python -m pip install --quiet skillcheck
+python -m pip install --quiet "skillcheck>=1.0.1"
 fi

 - name: Run skillcheck
