Commit 0eee715

Merge pull request #6 from moonrunnerkc/v1.2.0-false-positive-fixes
release: v1.2.0
2 parents 44620cc + 182f9c4 commit 0eee715

36 files changed

Lines changed: 1006 additions & 447 deletions

CHANGELOG.md

Lines changed: 30 additions & 1 deletion
@@ -7,6 +7,35 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]

+## [1.2.0] - 2026-04-29
+
+Backward compatibility: previously-passing skills still pass. Some previously-failing skills now warn instead of error and produce exit code 0 instead of 1.
+
+### Added
+
+- `template.detected` info-level rule and `src/skillcheck/template_detection.py` module.
+- `ECOSYSTEM_FIELDS` classification for `license`, `repository`, `homepage`, and `template`.
+- Config support for `[frontmatter] extension_fields` in `skillcheck.toml`.
+
+### Changed
+
+- `frontmatter.name.reserved-word` demoted from ERROR to WARNING; source tag changed from `spec` to `advisory`; message rewritten.
+- `frontmatter.description.person-voice` demoted from ERROR to WARNING; messages rewritten to acknowledge the heuristic.
+- Budget-message phrasing aligned with the spec's "recommended" language across `sizing.*` and `disclosure.*` rules.
+
+### Fixed
+
+- `frontmatter.field.unknown` no longer fires on `license`, `repository`, `homepage`, or `template`; these now produce info-level `frontmatter.field.ecosystem` diagnostics or are silent for user extensions.
+- Templates (placeholder content, `template: true` flag, or files under `template/` or `templates/` directories) no longer trigger deployment-blocking checks (`frontmatter.name.directory-mismatch`, `compat.vscode-dirname`, `description.quality-score`).
+
+### Internal
+
+- Renamed `config.KNOWN_FRONTMATTER_FIELDS` to `config.SPEC_FIELDS`.
+- New `template.detected` rule wired into `rules/__init__.py`.
+- Frontmatter rule implementation split into smaller modules while preserving `skillcheck.rules.frontmatter` imports.
+- Root `SKILL.md` restored so `skillcheck SKILL.md` self-validation works from the repository root.
+- New fixture set under `tests/fixtures/` covering ecosystem fields, user extensions, template detection, and demoted severities.
+
 ## [1.1.0] - 2026-04-28

 External audit against v1.0.1 surfaced eight repo defects ranging from documentation drift to a CI-confusing exit-code conflation. v1.1.0 ships fixes for all eight, reverses one v1.0.1 behavior change that turned out wrong, and tightens the description scorer's vague-word rubric. The minor bump is driven by the exit-code semantics change (now distinguishes warning-only from input error) and the new `--warnings-as-errors` flag.
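
The field handling described in the 1.2.0 "Added" and "Fixed" entries above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's implementation: the names `SPEC_FIELDS` and `ECOSYSTEM_FIELDS` come from the changelog, but the contents of `SPEC_FIELDS`, the `classify_field` helper, and its return shape are hypothetical.

```python
from typing import Optional, Tuple

# Assumption: placeholder contents. The real spec list lives in skillcheck's config module.
SPEC_FIELDS = frozenset({"name", "description"})
# From the changelog: fields classified as ecosystem-common in v1.2.0.
ECOSYSTEM_FIELDS = frozenset({"license", "repository", "homepage", "template"})


def classify_field(
    field: str, extension_fields: frozenset = frozenset()
) -> Optional[Tuple[str, str]]:
    """Return (rule_id, severity) for a frontmatter field, or None if it is silent.

    Hypothetical helper: spec fields and user-declared extension fields are
    silent, ecosystem fields yield an info diagnostic, everything else warns.
    """
    if field in SPEC_FIELDS or field in extension_fields:
        return None
    if field in ECOSYSTEM_FIELDS:
        return ("frontmatter.field.ecosystem", "info")
    return ("frontmatter.field.unknown", "warning")
```

Checking `extension_fields` before `ECOSYSTEM_FIELDS` reflects the README hint that an ecosystem field can be listed under `[frontmatter] extension_fields` when it is intentional; whether the real code orders the checks this way is an assumption.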
@@ -24,7 +53,7 @@ External audit against v1.0.1 surfaced eight repo defects ranging from documenta
 ### Changed

 - `action.yml` install step pins `skillcheck>=1.0.1` so consumers fail loudly on unpublished v1 features instead of silently running v0.2.0.
-- Description scorer rubric documented and tightened: dropped `comprehensive`, `robust`, and `flexible` from `_VAGUE_WORDS` because each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "robust against malformed input"). The inclusion rubric is now documented inline. Verified against `anthropics/skills` (17 SKILL.md files): zero score changes, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the new regression tests are forward-looking guards against scoring drift if the list is ever re-expanded.
+- Description scorer rubric documented and tightened: dropped `comprehensive`, `flexible`, and the malformed-input term from `_VAGUE_WORDS` because each can describe a concrete attribute when qualified ("comprehensive coverage of N file formats", "handles malformed input"). The inclusion rubric is now documented inline. Verified against `anthropics/skills` (17 SKILL.md files): zero score changes, because none of those skills use the dropped words. The rubric edit is a no-op against the current corpus; the new regression tests are forward-looking guards against scoring drift if the list is ever re-expanded.
 - Description scorer verb matching: collapsed `_ACTION_VERBS` from 86 entries (base + 3rd-person duplicates) to 42 base forms. Added `_is_action_verb()` to handle stem normalization across `-s`, `-es`, and `-ies` endings. Adding a new verb now only requires the base form.
 - README test count bumped from 663 to 667 to include the drift-guard test, two description-scorer regression tests, and the `--warnings-as-errors` test.
 - README field-test citations: replaced seven gitignored `runs/...` path references with the exact `skillcheck` commands needed to reproduce each finding. Readers can now verify the claims without access to private artifacts.

README.md

Lines changed: 31 additions & 14 deletions
@@ -69,7 +69,7 @@ skillcheck skills/ # recursive scan; finds every file named SKILL.md
 skillcheck SKILL.md --format json
 ```

-From the field test on Anthropic's official skills repository (18 skills, snapshot taken during v1.0 release prep in April 2026): four of eighteen files failed. `claude-api/SKILL.md` failed with `frontmatter.name.reserved-word` because the name contains the reserved word "claude". `template/SKILL.md` failed with `frontmatter.name.directory-mismatch` (name `template-skill`, directory `template`). Both files look correct on casual inspection. Reproduce: clone `anthropics/skills` and run `skillcheck skills/ --format text`.
+From the field test on Anthropic's official skills repository (18 skills, April 2026 snapshot): v1.1.0 produced four failures from advisory checks. v1.2.0 demotes the `claude-api` reserved-name and person-voice findings to warnings, treats `license` as ecosystem-common metadata, and detects `template/SKILL.md` as a placeholder file. Reproduce: clone `anthropics/skills` and run `skillcheck skills/ --format text`.

 ### Heuristic Graph

@@ -191,31 +191,31 @@ The v1.0 graph and critique modes are available as action inputs. Example with s
 Text output (default), excerpt from a run against the Anthropic skills corpus:

 ```
-✗ FAIL skills/claude-api/SKILL.md
-line 2 ✗ error frontmatter.name.reserved-word Name contains reserved word 'claude': 'claude-api'.
+✔ PASS skills/claude-api/SKILL.md
+line 2 ⚠ warning frontmatter.name.reserved-word Name contains the term 'claude' which may collide with platform-reserved namespaces. Verify with the target agent's documentation.
 name: claude-api
-line 4 ⚠ warning frontmatter.field.unknown Unknown frontmatter field 'license'.
+line 4 · info frontmatter.field.ecosystem Field 'license' is ecosystem-common but not in the agentskills.io spec. Add it to skillcheck.toml under [frontmatter] extension_fields if intentional.

-Checked 18 files: 14 passed, 4 failed, 24 warnings
+Checked 18 files: 18 passed, 0 failed, 29 warnings
 ```

 JSON output (`--format json`):

 ```json
 {
-  "version": "1.0.0",
+  "version": "1.2.0",
   "files_checked": 18,
-  "files_passed": 14,
-  "files_failed": 4,
+  "files_passed": 18,
+  "files_failed": 0,
   "results": [
     {
       "path": "skills/claude-api/SKILL.md",
       "valid": false,
       "diagnostics": [
         {
           "rule": "frontmatter.name.reserved-word",
-          "severity": "error",
-          "message": "Name contains reserved word 'claude': 'claude-api'.",
+          "severity": "warning",
+          "message": "Name contains the term 'claude' which may collide with platform-reserved namespaces. Verify with the target agent's documentation.",
           "line": 2,
           "context": "name: claude-api"
         }
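
For readers who consume the JSON report shown in this hunk, a minimal sketch of a summarizer follows. It relies only on keys visible in the excerpt (`version`, `results`, `path`, `diagnostics`, `severity`); the excerpt is truncated, so the full schema may contain more fields, and the `summarize_report` helper itself is hypothetical.

```python
import json
from collections import Counter


def summarize_report(report_json: str) -> dict:
    """Count diagnostics by severity and list files with at least one error."""
    report = json.loads(report_json)
    severities = Counter()
    files_with_errors = []
    for result in report.get("results", []):
        diagnostics = result.get("diagnostics", [])
        severities.update(d["severity"] for d in diagnostics)
        # A file is listed only when it carries an error-level diagnostic.
        if any(d["severity"] == "error" for d in diagnostics):
            files_with_errors.append(result["path"])
    return {
        "version": report.get("version"),
        "by_severity": dict(severities),
        "files_with_errors": files_with_errors,
    }
```

Deriving the error list from `severity` rather than from a per-file flag keeps the summary tied to the diagnostics themselves, which is the part of the schema this diff changes.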
@@ -286,14 +286,15 @@ Source tags: `spec` rules derive from the agentskills.io specification or agent-
 | `frontmatter.name.invalid-chars` | error | spec | Lowercase, numbers, hyphens only |
 | `frontmatter.name.leading-trailing-hyphen` | error | spec | No leading or trailing hyphens |
 | `frontmatter.name.consecutive-hyphens` | error | spec | No consecutive hyphens |
-| `frontmatter.name.reserved-word` | error | advisory | Not a reserved word (`claude`, `anthropic`) |
+| `frontmatter.name.reserved-word` | warning | advisory | Name contains a term that may collide with platform-reserved namespaces |
 | `frontmatter.name.directory-mismatch` | error | spec | Name must match parent directory (VS Code requirement) |
 | `frontmatter.description.required` | error | spec | `description` field must exist |
 | `frontmatter.description.type` | error | advisory | `description` must be a string (catches YAML coercion) |
 | `frontmatter.description.empty` | error | spec | Description must not be blank |
 | `frontmatter.description.max-length` | error | spec | 1024 character maximum |
 | `frontmatter.description.xml-tags` | error | advisory | No XML or HTML tags in description |
-| `frontmatter.description.person-voice` | error | advisory | No first or second-person pronouns |
+| `frontmatter.description.person-voice` | warning | advisory | First or second-person voice may reduce routing clarity |
+| `frontmatter.field.ecosystem` | info | advisory | Field is ecosystem-common but not in the agentskills.io spec |
 | `frontmatter.field.unknown` | warning | advisory | Field not in the known spec list |
 | `frontmatter.yaml-anchors` | warning | advisory | YAML anchors and aliases can silently copy values |
 | `description.quality-score` | info | advisory | Scores description 0-100 for agent discoverability |
@@ -309,6 +310,7 @@ Source tags: `spec` rules derive from the agentskills.io specification or agent-
 | `compat.claude-only` | info | spec | Field only works in Claude Code |
 | `compat.vscode-dirname` | info / error | spec | Name does not match parent directory (VS Code); promotes to error with `--strict-vscode` |
 | `compat.unverified` | info | advisory | Field behavior unverified in Codex or Cursor |
+| `template.detected` | info | advisory | Placeholder file detected; deployment-blocking checks are skipped |
 | `graph.capability.orphaned` | warning | heuristic | Capability heading has no declared inputs or outputs |
 | `graph.input.unused` | warning | heuristic | Body-declared input not required by any capability |
 | `graph.output.unproduced` | warning | heuristic | Declared output not produced by any capability |
@@ -319,11 +321,24 @@ Source tags: `spec` rules derive from the agentskills.io specification or agent-
 | `history.write.failed` | warning | history | Could not write the ledger file; validation exit code unaffected |
 | `history.read.failed` | warning | history | Could not read the ledger file; validation continues without regression check |

+## Templates
+
+Template files are detected by `template: true` frontmatter, placeholder-like descriptions, or a parent directory named `template` or `templates`. When detected, skillcheck emits `template.detected` and skips deployment-blocking checks that do not apply before copy-and-fill use: directory-name match, VS Code dirname, and description quality scoring. Sizing, disclosure, references, and other content checks still run.
+
+## Extension fields
+
+Use `[frontmatter] extension_fields` in `skillcheck.toml` for organization-specific metadata that should be accepted without diagnostics.
+
+```toml
+[frontmatter]
+extension_fields = ["my-org-tag", "internal-id"]
+```
+
 ## Case Study

 We ran skillcheck against three corpora during v1.0 release prep (April 2026 snapshots): Anthropic's official skills repository (18 skills), the `mcp-builder` skill through the full v1.0 pipeline, and five skills from the uxuiprinciples/agent-skills collection. To reproduce, clone each upstream repo and run `skillcheck <path>` (the case study below records the exact invocations).

-The symbolic run of the Anthropic corpus returned four failures from eighteen files (exit 1). All four files look correct on review: two had second-person voice in the description, one used "claude" as part of the name (reserved word per spec), and the template skill had a name/directory mismatch. The deeper finding came from running `mcp-builder` through the critique pipeline: the symbolic run passed (exit 0), but the ingested agent critique returned exit 3 with three `semantic.contradiction.detected` errors. The skill's frontmatter offers Python and TypeScript as equal options; its body unconditionally recommends TypeScript in Phase 1.3. That inconsistency means any agent following the Python path hits an unresolved decision point. No static linter catches it. See [docs/case-study-v1-real-world-runs.md](docs/case-study-v1-real-world-runs.md) for the full breakdown.
+The v1.1.0 symbolic run of the Anthropic corpus returned four failures from eighteen files (exit 1). v1.2.0 reclassifies those findings: `canvas-design` and `theme-factory` now warn for person-voice wording, `claude-api` warns for a possible reserved namespace collision, and `template/SKILL.md` is detected as a placeholder. The deeper finding came from running `mcp-builder` through the critique pipeline: the symbolic run passed (exit 0), but the ingested agent critique returned exit 3 with three `semantic.contradiction.detected` errors. The skill's frontmatter offers Python and TypeScript as equal options; its body unconditionally recommends TypeScript in Phase 1.3. That inconsistency means any agent following the Python path hits an unresolved decision point. No static linter catches it. See [docs/case-study-v1-real-world-runs.md](docs/case-study-v1-real-world-runs.md) for the full breakdown.

 See also: [docs/case-study-silent-skill-failure.md](docs/case-study-silent-skill-failure.md) (the v0.2.0 case study: a deploy skill that silently disappeared in VS Code due to a name/directory mismatch).
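
The three template signals named in the new "Templates" section of this hunk can be sketched as below. This is an illustration only: `src/skillcheck/template_detection.py` is not shown in this view, so the helper names and the placeholder markers are assumptions, while the three skipped rule IDs are taken from the changelog.

```python
from pathlib import PurePosixPath

# Assumption: illustrative markers. The project's placeholder heuristics are not shown here.
_PLACEHOLDER_MARKERS = ("todo", "replace this", "your description here")

# From the changelog: checks that no longer fire on detected templates.
SKIPPED_FOR_TEMPLATES = frozenset({
    "frontmatter.name.directory-mismatch",
    "compat.vscode-dirname",
    "description.quality-score",
})


def looks_like_template(path: str, frontmatter: dict) -> bool:
    """Return True if any of the three documented template signals is present."""
    if frontmatter.get("template") is True:
        return True
    # Signal 2: the file sits directly in a directory named template(s).
    if PurePosixPath(path).parent.name in {"template", "templates"}:
        return True
    # Signal 3: the description contains a placeholder-like phrase.
    description = str(frontmatter.get("description", "")).lower()
    return any(marker in description for marker in _PLACEHOLDER_MARKERS)


def should_run_rule(rule_id: str, is_template: bool) -> bool:
    """Skip only the deployment-blocking checks when the file is a template."""
    return not (is_template and rule_id in SKIPPED_FOR_TEMPLATES)
```

Because any single signal is enough, a sketch like this errs toward flagging files as templates; content checks such as sizing and references are left untouched by `should_run_rule`.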

@@ -335,6 +350,8 @@ Cross-agent compatibility data for Codex and Cursor comes from available documen

 Description quality scoring uses heuristics, not an LLM. It catches structural problems (missing action verbs, no trigger phrases, vague words) but cannot evaluate whether instructions are semantically coherent. Agent critique mode addresses that gap.

+Template detection favors recall over precision. A real skill with placeholder-like description text may be flagged as `template.detected`; rename the description or add enough concrete routing context before deployment.
+
 The heuristic graph extractor uses heading structure and backtick references as proxies for capability declarations. Skills that express capabilities entirely in prose will produce sparse graphs with many `graph.capability.orphaned` warnings. Agent graph mode (`--emit-graph-prompt` / `--ingest-graph`) addresses this but requires a calling agent.

 Agent critique and graph modes validate the agent's JSON response against the expected schema and convert it to diagnostics. skillcheck trusts the agent's reasoning; it does not second-guess findings that pass schema validation. The quality of the output depends on the quality of the calling agent.
@@ -348,7 +365,7 @@ pip install -e ".[dev]"
 python3 -m pytest tests/ -q
 ```

-667 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.
+683 tests cover all rule modules, CLI exit codes, graph analyzers, divergence detection, critique parsing, history round-trips, and the full self-host pipeline against `skills/skillcheck/SKILL.md`. Fixtures are in `tests/fixtures/`; every rule has at least one positive and one negative test case. `tests/test_readme_test_count_claim.py` asserts this count matches `pytest --collect-only`, so any future suite change has to update the number in the same commit or CI fails.

 ## Maintainer Notes
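
The test-count paragraph changed in this hunk (667 to 683) is guarded by `tests/test_readme_test_count_claim.py`, which is not shown in this view. A minimal sketch of how such a drift guard might compare the README claim with a collected count follows; both helper names are hypothetical, and obtaining the collected count (for example by parsing `pytest --collect-only` output) is left out.

```python
import re
from typing import Optional


def claimed_test_count(readme_text: str) -> Optional[int]:
    """Extract N from a README line that starts with 'N tests cover'."""
    match = re.search(r"^(\d+) tests cover", readme_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def count_matches(readme_text: str, collected: int) -> bool:
    """True when the README's claimed count equals the collected test count."""
    return claimed_test_count(readme_text) == collected
```

A missing claim returns `None`, which never equals an integer count, so deleting the sentence from the README would also fail the guard in this sketch.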

SKILL.md

Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
+---
+name: skillcheck
+description: Validates SKILL.md files for spec-facing structure, sizing, references, cross-agent compatibility, and agent-authored critique or graph diagnostics.
+version: "1.2.0"
+author: brad
+---
+
+Use this skill when validating a `SKILL.md` file or a directory of skill files with the `skillcheck` CLI.
+
+## Validate
+
+Run the default symbolic checks:
+
+```bash
+skillcheck SKILL.md
+skillcheck skills/ --format json
+```
+
+The report includes errors, warnings, and info diagnostics. Exit code 0 means no errors. Exit code 1 means at least one error, or warnings with `--warnings-as-errors`. Exit code 2 means input or argument error. Exit code 3 means symbolic validation passed but ingested agent critique found semantic errors.
+
+## Agent workflows
+
+For semantic self-critique, emit a prompt and ingest the returned JSON:
+
+```bash
+skillcheck SKILL.md --emit-critique-prompt > prompt.txt
+skillcheck SKILL.md --ingest-critique response.json
+```
+
+For capability graph work, use:
+
+```bash
+skillcheck SKILL.md --analyze-graph
+skillcheck SKILL.md --emit-graph --format json
+```
+
+## Configuration
+
+`skillcheck.toml` can set CLI defaults and frontmatter extension fields:
+
+```toml
+[frontmatter]
+extension_fields = ["my-org-tag", "internal-id"]
+```
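
The exit codes listed in the new root `SKILL.md` can be captured in a small lookup for use in a wrapper script. This is a sketch: the code meanings are copied from the file above, while `describe_exit_code`, `should_block_deploy`, and the policy of blocking on every non-zero code are assumptions for illustration.

```python
# Meanings as stated in the root SKILL.md added by this commit.
EXIT_CODE_MEANINGS = {
    0: "no errors",
    1: "at least one error, or warnings with --warnings-as-errors",
    2: "input or argument error",
    3: "symbolic validation passed but ingested agent critique found semantic errors",
}


def describe_exit_code(code: int) -> str:
    """Map a skillcheck exit code to its documented meaning."""
    return EXIT_CODE_MEANINGS.get(code, f"unexpected exit code {code}")


def should_block_deploy(code: int) -> bool:
    """Assumption: a CI policy that treats every non-zero exit as blocking."""
    return code != 0
```

In a wrapper, the code could come from `subprocess.run(["skillcheck", "SKILL.md"]).returncode`. Since v1.2.0 demotes two rules to warnings, a file that previously exited 1 may now exit 0 unless `--warnings-as-errors` is passed.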

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

 [project]
 name = "skillcheck"
-version = "1.1.0"
+version = "1.2.0"
 description = "Cross-agent skill quality gate for SKILL.md files conforming to the agentskills.io specification"
 readme = "README.md"
 license = { text = "MIT" }

skills/skillcheck/SKILL.md

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 ---
 name: skillcheck
 description: Validates and scores SKILL.md files against the agentskills.io specification; use when linting skills for cross-agent compatibility, description quality, or capability graph structure.
-version: "1.1.0"
+version: "1.2.0"
 author: brad
 ---

src/skillcheck/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 from skillcheck.parser import ParsedSkill, ParseError
 from skillcheck.result import Diagnostic, Severity, ValidationResult

-__version__ = "1.1.0"
+__version__ = "1.2.0"

 __all__ = [
     "validate",
