
Commit 87bc553

suyask-msft, claude, Saurabh Badenkal, and saurabhrb authored
refactor: split dv-python-sdk into dv-data/dv-query, harden all skills through end-to-end validation (#32)
* Add safety and guardrails documentation

  Public-facing doc covering the plugin's safety model: supported operations, authentication, least-privilege enforcement, confirmation flows, data residency, logging, telemetry policy, irreversible operations, and planned improvements. Clearly distinguishes platform-enforced controls (Dataverse security roles, MCP authorization layers) from agent-level guardrails (skill instructions).

  Adds Safety & Security section to README with key trust signals and link to the full doc.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix telemetry claim in README to acknowledge AI host data flow

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address PR review feedback on safety doc

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Address review feedback on skill instruction language

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Split dv-python-sdk into dv-data and dv-query

  dv-python-sdk was a 545-line skill covering five distinct concerns. As S05-S09 are built out it would grow to 900+ lines, loading unnecessary context in every session.
  dv-data (new):
  - Record CRUD, CreateMultiple, UpdateMultiple, UpsertMultiple
  - CSV import with type inference and lookup resolution
  - Alternate key upserts, continue-on-error batch
  - File column uploads (chunked)
  - Table/column/relationship schema creation
  - @odata.bind rules and error handling
  Triggers: "create", "insert", "import", "bulk", "upsert", "write", "upload"

  dv-query (new):
  - OData filter/select/expand/orderby/top/paging
  - GUID-free display (formatted values)
  - Fuzzy record lookup and "my" scoping (WhoAmI)
  - $apply aggregation (Web API path documented)
  - N:N $expand (Web API path documented)
  - Change tracking / delta queries ($deltatoken)
  - Data quality patterns (null rate, duplicates, orphan detection)
  - Jupyter/pandas notebook handoff
  Triggers: "query", "filter", "find", "show me", "aggregate", "analyze", "profiling", "notebook", "pandas", "GUID-free"

  dv-overview updated:
  - Tool capability table: Python SDK row split into dv-data and dv-query rows
  - Available Skills index: dv-python-sdk replaced with dv-data and dv-query
  - All dv-python-sdk references updated to dv-data or dv-query as appropriate

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update skill cross-references from dv-python-sdk to dv-data and dv-query

* Bump plugin version to 1.1.0 and update descriptions

* Clean up dv-data and dv-query to validated content only, add skill boundary cross-references

* Update dv-data, dv-query, dv-connect for SDK b6-b8 changes

  - Add context manager pattern to dv-data and dv-query setup
  - Add client.dataframe.get/create/update() to dv-query pandas section
  - Add QueryBuilder fluent API section to dv-query
  - Update Jupyter notebook setup to use client.dataframe.get()
  - Add pandas to pip install in dv-connect (required dep since b7)
  - Soften @odata.bind casing warning (SDK preserves casing since b6)

* Fix conflicts and factual errors in dv-data and dv-query skills

  - Fix order_by() syntax in QueryBuilder section (was string syntax, should be column+descending=True)
  - Remove unnecessary QueryBuilder import (client.query.builder() needs no import)
  - Fix composable filter method: .filter() -> .where() to match SDK API
  - Simplify setup blocks: remove confusing dual-pattern with pass placeholder
  - Remove @odata.bind from dv-query field casing table (read-only skill)
  - Add guidance on QueryBuilder vs client.records.get(): QB preferred for multi-record, records.get() needed for single-by-GUID

* Align MCP vs SDK guidance across dv-overview and dv-query

  - Add read-volume threshold to MCP/SDK rule: MCP for single-page reads, SDK for bulk/multi-page
  - Split volume guidance into separate write and read paragraphs
  - Update SDK checklist to reference QueryBuilder as preferred query API
  - Remove stale dv-query capabilities from tool table (fuzzy lookup, my scoping, change tracking, data quality patterns)
  - Update Available Skills table dv-query entry to match actual skill content
  - Update dv-query frontmatter: proactive framing for bulk reads, not just defensive fallback

* Fix cross-skill conflicts and routing errors found in full audit

  - Remove schema creation from dv-data tool table row (belongs in dv-metadata)
  - Fix Hard Rule 1 bad redirect: was dv-data, now points to Hard Rule 1 above
  - Clarify @odata.bind casing: SDK auto-handles record payloads (b6+), raw Web API still manual
  - SDK checklist now routes each operation to the correct skill
  - Fix dv-solution import path: was scripts.auth, now uses sys.path.insert + auth pattern
  - Tool priority summary now consistent with detailed read-volume bullet rule
  - Replace undefined "single page" with "no paging needed" in dv-query and dv-overview

* Fix agent-rail issues found in second full cross-skill audit

  - Fix get_token() STOP warning: was prohibitive, now routes to Raw Web API list for context
  - Fix @odata.bind casing: was imprecise (implied auto-correct), now explicit (SDK stops lowercasing, still need correct SchemaName)
  - Complete PublishXml stub in dv-metadata: was a comment, now full runnable code
  - Add solution-confirmation-first requirement to dv-metadata header (was only in dv-overview)
  - Clarify MCP availability check: explicit vs implicit MCP requests are different branches, add explicit label
  - Define multi-step logic in dv-data: sequential MCP calls are NOT multi-step; script only for bulk/transform/retry/CSV
  - Fix missing os/sys.path.insert in dv-query N:N expand and apply sections (would fail with NameError)

* Fix code correctness and routing errors found in third cross-skill audit

  - Add missing sys.path.insert to dv-metadata table creation and forms blocks
  - Clarify PublishXml block requires env/token from form creation setup above
  - Complete retrieve/modify form stub with full GET + PATCH runnable code
  - Fix N:N $ref association routing in dv-data: was dv-query (wrong), now raw Web API with URL pattern
  - Clarify dv-data supports list: remove standalone read, add note that reads within write workflows are acceptable
  - Add HAVING qualifier to dv-query count table: simple count -> MCP, count+HAVING -> Web API
  - Fix PAC CLI command in dv-metadata: was add-reference (solution deps), now add-solution-component (components)
  - Standardize volume guidance phrasing: single page -> no paging needed (matches Hard Rule 2)

* fix: audit 4 corrections across dv-connect, dv-data, dv-query, dv-solution

  - dv-solution: add client setup before publisher discovery in Step 1
  - dv-solution: add sys.path.insert and define env in user role check block
  - dv-connect: fix MCP capabilities table to reflect update_table supports column add
  - dv-connect: add missing import os to gitignore block
  - dv-query: note that DataFrame write-back is a write operation, cross-ref dv-data
  - dv-query: document Jupyter auth exception (no scripts/auth.py in notebooks)
  - dv-data: add DataFrame write-back section cross-referencing dv-query

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: pre-push static eval corrections in dv-data and dv-solution

  - dv-data: change illustrative import fragments from python to plain fences to avoid false-positive sys.path.insert eval failures; add clarifying note
  - dv-solution: add Skill Boundaries table (was missing, only skill without one)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add: static eval suite for skill file quality checks

  Implements the P0 evals from the offline evals proposal as a runnable script. Checks all SKILL.md files for:
  - EVAL-AUTH-01: no 'from scripts.auth import' pattern
  - EVAL-PY-01: sys.path.insert present and ordered before 'from auth import'
  - EVAL-PY-04: no all-comment stub blocks presented as runnable code
  - EVAL-PY-05: get_token() not used in DataverseClient blocks
  - EVAL-PY-06: load_env() called before os.environ access
  - EVAL-PAC-02: no 'pac --version' invocations
  - EVAL-COMPLETE-01: Skill Boundaries table present in every skill

  Usage: python .github/evals/static_checks.py

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: consolidate pip install to dv-connect, add pandas to tools-setup

  dv-data had a redundant pip install line -- dv-connect owns installation via the mandatory connect flow. Removed from dv-data. Also added pandas to tools-setup.md to match dv-connect/SKILL.md.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add: CLAUDE.md for contributor-facing AI assistant guidance

  Documents plugin structure, skill authoring rules (auth pattern, no stubs, skill boundaries, MCP/SDK/Web API priority), and the eval check to run before every commit.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add: GitHub Actions workflow for skill static evals

  Runs on every PR that touches skill files or the eval suite itself. No external dependencies -- stdlib only, completes in seconds.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* add: CodeQL security analysis workflow

  Runs on push/PR to main and weekly on a schedule. Standard requirement for Microsoft public OSS repos.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: skip CodeQL SARIF upload for fork PRs

  External contributors run with read-only tokens -- uploading security results requires write access only available to internal PRs. Analysis still runs and the check still passes/fails correctly for all PRs.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* refactor: categorize evals and add structure/discoverability checks

  Reorganizes static_checks.py into 5 named categories (CAT-1 through CAT-5) with one function per category -- easier to extend without touching unrelated checks.

  New checks added (CAT-4 and CAT-5):
  - EVAL-STRUCT-01: frontmatter has required name and description fields
  - EVAL-STRUCT-02: frontmatter name matches directory name
  - EVAL-STRUCT-03: description contains 'Use when:' routing trigger
  - EVAL-STRUCT-04: description contains 'Do not use when:' routing trigger
  - EVAL-COMPLETE-02: Skill Boundaries cross-references point to real skills
  - EVAL-COMPLETE-03: dv-overview Available Skills table lists every skill
  - EVAL-COMPLETE-04: no references to removed dv-python-sdk skill

  Output now groups failures by category prefix for readability.
  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* skills: add multi-table import patterns and metadata guardrails from BulkImport experiment

  - dv-data: multi-table FK-ordered import with in-memory lookup maps, chunk helper, idempotent re-run via UpsertItem with alternate keys
  - dv-metadata: EntityDefinitions startswith() filter limitation, idempotent table creation pattern (catch 0x80040237)
  - dv-overview: Windows background job monitoring caveat (silent empty output)

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: cross-skill consistency audit -- correct batching claims, stale counts, routing gaps

  - dv-data: fix false "built-in batching" claims (SDK sends all records in one POST, does NOT chunk); add ThreadPoolExecutor pattern for large imports; add frontmatter triggers for multi-table/large dataset sections
  - dv-overview: fix volume guidance batching claim; fix index descriptions (dv-metadata uses SDK not just Web API; remove phantom continue-on-error)
  - README: update skill count from 5 to 6 after dv-python-sdk split
  - evals: add EVAL-COMPLETE-05 -- README skill count must match actual directories

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: end-to-end validation fixes from 5 BulkImport runs (297K records, 21 tables)

  Findings from running the same dataset against 5 fresh environments with and without skills -- reduced import time from 44 hours to ~1 hour.
  dv-data:
  - Default to UpsertItem with alternate keys (not create) for idempotent imports
  - Adaptive chunking: start at 1K, ramp to 4K on success, back off on payload/timeout failure, cap at last successful size
  - Cross-table parallelism with ThreadPoolExecutor, sequential within tables (concurrent writes to same table cause SQL deadlocks)
  - UpsertMultiple fails when key columns appear in record body (bulk-only bug)
  - Per-table error isolation in ThreadPoolExecutor (one failure must not kill others)
  - EntitySetName must be queried, not guessed (English pluralization is irregular)
  - Post-import verification pattern
  - flush=True on all long-running print statements

  dv-metadata:
  - Phased schema creation (tables -> keys -> lookups) to avoid lock contention
  - Column naming: Src prefix for source IDs to avoid nav property collisions
  - Alternate key creation via SDK (client.tables.create_alternate_key)
  - Idempotent creation with retry for deadlocks and lock contention
  - Publisher/solution creation SDK snippet in preamble (agent skips dv-solution)
  - Mandatory user confirmation for solution name and publisher prefix

  dv-query:
  - Intent-driven routing table (user question -> right tool)
  - SDK-First Rule for reads (no raw HTTP for queries)
  - $apply as primary for single-table aggregation
  - pandas merge with $select for cross-table (always pass select=)
  - client.query.sql() documented with actual limitations (no JOINs, 5K cap)
  - QueryBuilder gated behind SDK b8+ (doesn't exist in b7)
  - $expand must use nested $select to avoid fetching all columns
  - include_annotations required for formatted values (was silently returning None)

  dv-overview:
  - Anti-introspection rule (no dir/inspect -- follow skill patterns)
  - Explicit NEVER list for raw Web API
  - Inline SDK snippet for publisher/solution creation
  - Aggregation routing in SDK checklist

  evals:
  - EVAL-AUTH-02: every get_token/urllib block must justify why SDK not used

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: correct false auto-batching claim in safety doc, add publisher prefix to irreversible ops

  - SDK does NOT auto-batch -- agent chunks with adaptive sizing (skill-enforced)
  - Upsert with alternate keys is the idempotent import pattern, not CreateMultiple
  - Publisher prefix is permanent and listed in irreversible operations table

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* docs: add version bumping instructions to CLAUDE.md

  Lists all four files that must stay in sync and semver guidance.

  Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address review nits -- pandas skip condition, urlopen timeout, regex info-string

* fix: increase urlopen timeout to 120s to match Dataverse server-side limit

* fix: increase urlopen timeout to 150s

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Saurabh Badenkal <sbadenkal@microsoft.com>
Co-authored-by: Saurabh Ravindra Badenkal <32964911+saurabhrb@users.noreply.github.com>
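The adaptive-chunking and cross-table parallelism rules in the end-to-end validation item above are stated only in prose. The following is a minimal, stdlib-only sketch of that control flow under stated assumptions, not the skill's actual code. `send_chunk` and `send_chunk_for` are hypothetical stand-ins for an SDK upsert call; they are assumed to raise on payload-size or timeout failures and to be idempotent (an upsert on an alternate key), so a failed chunk can simply be resent at a smaller size. The 1K start and 4K ceiling come from the commit message; doubling on success, halving when no size has succeeded yet, and `max_workers=4` are assumptions. Foreign-key ordering between tables is not handled here: tables submitted together are assumed to be independent.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def import_table_adaptive(rows, send_chunk, start=1000, ceiling=4000, floor=1):
    """Send one table's rows sequentially in adaptively sized chunks.

    send_chunk(list_of_rows) is a hypothetical callable, assumed to raise on a
    payload-size or timeout failure and to be idempotent (for example an upsert
    keyed on an alternate key), so a failed chunk can be resent smaller.
    Returns the number of rows sent.
    """
    size, cap, last_ok, i = start, ceiling, None, 0
    while i < len(rows):
        chunk = rows[i:i + size]
        try:
            send_chunk(chunk)
        except Exception:
            if size <= floor:
                raise  # cannot shrink further: surface the original error
            # Fall back to the last size that succeeded (or halve if none has),
            # and never ramp above that size again.
            if last_ok is not None and last_ok < size:
                cap = last_ok
            else:
                cap = max(floor, size // 2)
            size = cap
            continue
        i += len(chunk)
        last_ok = size
        size = min(size * 2, cap)  # ramp up after a success, bounded by the cap
    return i


def import_all(tables, send_chunk_for, max_workers=4):
    """Import several tables in parallel, one worker per table.

    tables maps a table name to its rows; send_chunk_for(name) returns the send
    function for that table (both hypothetical). Rows within a table stay
    sequential. A failure in one table is recorded and does not stop the others.
    Returns (sorted names that succeeded, {name: exception} for those that failed).
    """
    succeeded, failed = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(import_table_adaptive, rows, send_chunk_for(name)): name
            for name, rows in tables.items()
        }
        for future in as_completed(futures):
            name = futures[future]
            try:
                future.result()  # re-raises the worker's exception, if any
                succeeded.append(name)
            except Exception as exc:  # per-table isolation
                failed[name] = exc
    return sorted(succeeded), failed
```

Keeping each table on a single worker follows the commit's note that concurrent writes to the same table caused SQL deadlocks, while collecting exceptions per future keeps one table's failure from cancelling the rest.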
1 parent a6dda75 commit 87bc553

18 files changed

Lines changed: 1922 additions & 622 deletions


.claude-plugin/marketplace.json

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
   "plugins": [
     {
       "name": "dataverse",
-      "description": "Agent skills for building on, analyzing, and managing Microsoft Dataverse — with Dataverse MCP, PAC CLI, and Dataverse Python SDK.",
+      "description": "Agent skills for building on, analyzing, and managing Microsoft Dataverse — with Dataverse MCP, PAC CLI, and Python SDK.",
       "source": "./.github/plugins/dataverse",
       "homepage": "https://github.com/microsoft/Dataverse-skills"
     }

.github/evals/static_checks.py

Lines changed: 384 additions & 0 deletions
@@ -0,0 +1,384 @@
"""
Static eval suite for Dataverse plugin skill files.

Checks every SKILL.md for code correctness, auth pattern compliance,
cross-skill completeness, and PAC CLI accuracy. Runs with no external
dependencies (stdlib only).

Usage:
    python .github/evals/static_checks.py
    python .github/evals/static_checks.py --skills-dir path/to/skills

Exit code 0 = all checks passed. Exit code 1 = one or more failures.

--- Eval Categories ---

CAT-1  Python Code Block Validity
       Checks that every python-fenced block is runnable as written.
       EVAL-PY-01        sys.path.insert present and ordered before 'from auth import'
       EVAL-PY-04        No all-comment stub blocks
       EVAL-PY-05        get_token() not used in DataverseClient blocks
       EVAL-PY-06        load_env() called before os.environ access

CAT-2  Auth Pattern Compliance
       Checks that auth imports follow the documented pattern.
       EVAL-AUTH-01      No 'from scripts.auth import' pattern
       EVAL-AUTH-02      Every get_token/urllib block must justify why SDK is not used

CAT-3  PAC CLI Accuracy
       Checks for known-bad PAC CLI invocations.
       EVAL-PAC-02       No 'pac --version' invocations

CAT-4  Skill Structure & Discoverability
       Checks that every skill has the structural elements agents need to
       discover and route to it correctly.
       EVAL-STRUCT-01    Frontmatter has required 'name' and 'description' fields
       EVAL-STRUCT-02    Frontmatter 'name' matches the skill directory name
       EVAL-STRUCT-03    Frontmatter 'description' contains 'Use when:' trigger
       EVAL-STRUCT-04    Frontmatter 'description' contains 'Do not use when:' trigger

CAT-5  Cross-Skill Completeness
       Checks that skills reference each other correctly and that the
       overview index stays in sync with the actual skill set.
       EVAL-COMPLETE-01  Skill Boundaries section present in every non-exempt skill
       EVAL-COMPLETE-02  Skill Boundaries cross-references point to real skill names
       EVAL-COMPLETE-03  dv-overview Available Skills table lists every skill directory
       EVAL-COMPLETE-04  No skill references the removed 'dv-python-sdk' skill
       EVAL-COMPLETE-05  README.md skill count matches actual number of skill directories
"""

import argparse
import re
import sys
from pathlib import Path

# Skills that intentionally have no Skill Boundaries table.
NO_BOUNDARIES_EXEMPT = {"dv-overview", "dv-connect"}

# Skills that intentionally have no 'Do not use when:' trigger (orchestration skills).
NO_DO_NOT_USE_EXEMPT = {"dv-overview", "dv-connect"}


def extract_fenced_blocks(text, lang="python"):
    """Return list of (index, content) for fenced blocks of the given language ('' = any fence)."""
    pattern = rf"```{re.escape(lang)}[^\n]*\n(.*?)```" if lang else r"```\w*[^\n]*\n(.*?)```"
    return list(enumerate(re.findall(pattern, text, re.DOTALL), start=1))


def parse_frontmatter(text):
    """Return the YAML frontmatter block as a raw string, or None if absent."""
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    return m.group(1) if m else None


# ---------------------------------------------------------------------------
# CAT-1  Python Code Block Validity
# ---------------------------------------------------------------------------

def check_python_blocks(name, text):
    failures = []
    python_blocks = extract_fenced_blocks(text, "python")

    for i, block in python_blocks:
        label = f"{name} python-block-{i}"

        # EVAL-PY-01: sys.path.insert must precede 'from auth import'
        if "from auth import" in block:
            lines = block.splitlines()
            auth_idx = next((j for j, l in enumerate(lines) if "from auth import" in l), None)
            path_idx = next((j for j, l in enumerate(lines) if "sys.path.insert" in l), None)
            if path_idx is None:
                failures.append(
                    f"EVAL-PY-01 [{label}] 'from auth import' present but no sys.path.insert"
                )
            elif path_idx > auth_idx:
                failures.append(
                    f"EVAL-PY-01 [{label}] sys.path.insert appears after 'from auth import'"
                )

        # EVAL-PY-04: no all-comment stub blocks
        non_blank = [l for l in block.splitlines() if l.strip()]
        if non_blank and all(l.strip().startswith("#") for l in non_blank):
            failures.append(
                f"EVAL-PY-04 [{label}] block is all comments -- "
                f"replace stub with runnable code or remove the python fence"
            )

        # EVAL-PY-05: get_token() must not appear in SDK blocks
        if "DataverseClient(" in block and "get_token" in block:
            failures.append(
                f"EVAL-PY-05 [{label}] get_token() used in block containing DataverseClient() -- "
                f"use get_credential() for SDK operations"
            )

        # EVAL-PY-06: load_env() must precede os.environ access (except notebook blocks)
        if "os.environ[" in block and "load_env" not in block:
            if "InteractiveBrowserCredential" not in block:
                failures.append(
                    f"EVAL-PY-06 [{label}] os.environ accessed without calling load_env() first"
                )

    return failures


# ---------------------------------------------------------------------------
# CAT-2  Auth Pattern Compliance
# ---------------------------------------------------------------------------

def check_auth_patterns(name, text):
    failures = []
    python_blocks = extract_fenced_blocks(text, "python")

    for i, block in python_blocks:
        label = f"{name} python-block-{i}"

        # EVAL-AUTH-01: no 'from scripts.auth import'
        if "from scripts.auth import" in block:
            failures.append(
                f"EVAL-AUTH-01 [{label}] 'from scripts.auth import' is wrong -- "
                f"use sys.path.insert + 'from auth import'"
            )

        # EVAL-AUTH-02: get_token/urllib blocks must justify why SDK cannot be used
        uses_raw_http = ("get_token" in block or "urllib.request" in block)
        if uses_raw_http:
            has_justification = any(
                marker in block
                for marker in [
                    "SDK cannot",
                    "SDK can't",
                    "SDK does not support",
                    "WRONG",
                ]
            )
            if not has_justification:
                failures.append(
                    f"EVAL-AUTH-02 [{label}] uses get_token/urllib without justification "
                    f"comment -- add '# SDK cannot/does not support <reason>' to the import line"
                )

    return failures


# ---------------------------------------------------------------------------
# CAT-3  PAC CLI Accuracy
# ---------------------------------------------------------------------------

def check_pac_cli(name, text):
    failures = []
    bash_blocks = (
        extract_fenced_blocks(text, "bash")
        + extract_fenced_blocks(text, "sh")
        + extract_fenced_blocks(text, "")
    )

    # EVAL-PAC-02: pac --version must not appear anywhere
    for _i, block in bash_blocks:
        if "pac --version" in block:
            failures.append(
                f"EVAL-PAC-02 [{name}] 'pac --version' found -- "
                f"use 'pac' (banner) to check installation, not 'pac --version'"
            )
            break  # one report per file is enough

    return failures


# ---------------------------------------------------------------------------
# CAT-4  Skill Structure & Discoverability
# ---------------------------------------------------------------------------

def check_structure(name, text):
    failures = []
    frontmatter = parse_frontmatter(text)

    if frontmatter is None:
        failures.append(
            f"EVAL-STRUCT-01 [{name}] no YAML frontmatter found (expected --- block at top of file)"
        )
        return failures  # remaining checks depend on frontmatter existing

    # EVAL-STRUCT-01: required frontmatter fields
    if not re.search(r"^name\s*:", frontmatter, re.MULTILINE):
        failures.append(f"EVAL-STRUCT-01 [{name}] frontmatter missing 'name' field")
    if not re.search(r"^description\s*:", frontmatter, re.MULTILINE):
        failures.append(f"EVAL-STRUCT-01 [{name}] frontmatter missing 'description' field")

    # EVAL-STRUCT-02: frontmatter 'name' matches directory name
    name_match = re.search(r"^name\s*:\s*(\S+)", frontmatter, re.MULTILINE)
    if name_match:
        declared_name = name_match.group(1).strip()
        if declared_name != name:
            failures.append(
                f"EVAL-STRUCT-02 [{name}] frontmatter name '{declared_name}' "
                f"does not match directory name '{name}'"
            )

    # EVAL-STRUCT-03: 'Use when:' trigger present in description
    if "Use when:" not in frontmatter:
        failures.append(
            f"EVAL-STRUCT-03 [{name}] frontmatter description missing 'Use when:' routing trigger"
        )

    # EVAL-STRUCT-04: 'Do not use when:' trigger present in description
    if name not in NO_DO_NOT_USE_EXEMPT:
        if "Do not use when:" not in frontmatter:
            failures.append(
                f"EVAL-STRUCT-04 [{name}] frontmatter description missing "
                f"'Do not use when:' routing trigger"
            )

    return failures


# ---------------------------------------------------------------------------
# CAT-5  Cross-Skill Completeness
# ---------------------------------------------------------------------------

def check_completeness(name, text, all_skill_names):
    failures = []

    # EVAL-COMPLETE-01: Skill Boundaries section required
    if name not in NO_BOUNDARIES_EXEMPT:
        if not re.search(r"^##\s+skill boundaries", text, re.IGNORECASE | re.MULTILINE):
            failures.append(
                f"EVAL-COMPLETE-01 [{name}] missing '## Skill boundaries' section"
            )
        else:
            # EVAL-COMPLETE-02: cross-references in Skill Boundaries point to real skill names
            # (section ends at the next level-2 heading; '^##\s' does not stop at ### subheadings)
            boundaries_match = re.search(
                r"##\s+skill boundaries(.*?)(?=^##\s|\Z)", text,
                re.IGNORECASE | re.MULTILINE | re.DOTALL
            )
            if boundaries_match:
                boundaries_text = boundaries_match.group(1)
                # Find all bold references like **dv-something**
                refs = re.findall(r"\*\*(dv-[\w-]+)\*\*", boundaries_text)
                for ref in refs:
                    if ref not in all_skill_names:
                        failures.append(
                            f"EVAL-COMPLETE-02 [{name}] Skill Boundaries references "
                            f"'{ref}' which is not a known skill"
                        )

    # EVAL-COMPLETE-04: no references to removed dv-python-sdk
    if "dv-python-sdk" in text:
        failures.append(
            f"EVAL-COMPLETE-04 [{name}] references removed skill 'dv-python-sdk' -- "
            f"update to 'dv-data' or 'dv-query'"
        )

    return failures


def check_overview_index(overview_path, all_skill_names):
    """EVAL-COMPLETE-03: dv-overview Available Skills table lists every skill directory."""
    failures = []
    if not overview_path.exists():
        failures.append("EVAL-COMPLETE-03 [dv-overview] SKILL.md not found")
        return failures

    text = overview_path.read_text(encoding="utf-8")
    for skill_name in all_skill_names:
        if skill_name == "dv-overview":
            continue  # overview doesn't list itself
        if skill_name not in text:
            failures.append(
                f"EVAL-COMPLETE-03 [dv-overview] Available Skills table missing entry "
                f"for '{skill_name}'"
            )

    return failures


def check_readme_skill_count(skills_dir, all_skill_names):
    """EVAL-COMPLETE-05: README.md skill count matches actual skill directories."""
    failures = []
    readme_path = skills_dir.parent.parent.parent.parent / "README.md"  # repo root for the default --skills-dir
    if not readme_path.exists():
        # README is optional -- skip silently if not found
        return failures

    text = readme_path.read_text(encoding="utf-8")
    actual_count = len(all_skill_names)

    # Look for "N skills" pattern in README (e.g., "**5 skills**" or "6 skills")
    matches = re.findall(r"\*{0,2}(\d+)\s+skills\*{0,2}", text)
    for m in matches:
        claimed = int(m)
        if claimed != actual_count:
            failures.append(
                f"EVAL-COMPLETE-05 [README.md] claims '{claimed} skills' but "
                f"found {actual_count} skill directories"
            )

    return failures


# ---------------------------------------------------------------------------
# Runner
# ---------------------------------------------------------------------------

def main():
    parser = argparse.ArgumentParser(description="Static evals for Dataverse skill files")
    parser.add_argument(
        "--skills-dir",
        default=".github/plugins/dataverse/skills",
        help="Path to the skills directory (default: .github/plugins/dataverse/skills)",
    )
    args = parser.parse_args()

    skills_dir = Path(args.skills_dir)
    if not skills_dir.exists():
        print(f"ERROR: skills directory not found: {skills_dir}", file=sys.stderr)
        sys.exit(2)

    skill_files = sorted(skills_dir.glob("*/SKILL.md"))
    if not skill_files:
        print(f"ERROR: no SKILL.md files found under {skills_dir}", file=sys.stderr)
        sys.exit(2)

    all_skill_names = {f.parent.name for f in skill_files}
    all_failures = []
    python_block_count = 0

    for f in skill_files:
        text = f.read_text(encoding="utf-8")
        name = f.parent.name
        python_block_count += len(extract_fenced_blocks(text, "python"))  # same regex as the checks

        all_failures.extend(check_python_blocks(name, text))
        all_failures.extend(check_auth_patterns(name, text))
        all_failures.extend(check_pac_cli(name, text))
        all_failures.extend(check_structure(name, text))
        all_failures.extend(check_completeness(name, text, all_skill_names))

    # Cross-skill checks -- need all files loaded
    overview_path = skills_dir / "dv-overview" / "SKILL.md"
    all_failures.extend(check_overview_index(overview_path, all_skill_names))
    all_failures.extend(check_readme_skill_count(skills_dir, all_skill_names))

    if all_failures:
        # Group output by category prefix for readability
        categories = {}
        for failure in all_failures:
            cat = failure.split("-")[0] + "-" + failure.split("-")[1]  # e.g. EVAL-PY
            categories.setdefault(cat, []).append(failure)

        print(f"FAILED -- {len(all_failures)} issue(s) across {len(skill_files)} skill files:\n")
        for cat in sorted(categories):
            print(f"  [{cat}]")
            for issue in categories[cat]:
                print(f"    FAIL  {issue}")
            print()
        sys.exit(1)
    else:
        print(
            f"PASSED -- {len(skill_files)} skill files, "
            f"{python_block_count} Python blocks, "
            f"5 categories checked"
        )


if __name__ == "__main__":
    main()
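Every check above starts from `extract_fenced_blocks`, so how its regexes treat fence info strings decides which blocks get checked at all. The self-contained snippet below reproduces that helper and `parse_frontmatter` and runs them on a small, made-up SKILL.md fragment (the `dv-demo` name and the fence contents are invented for illustration). It shows that a python fence carrying an extra info string (here `title="setup"`) is still captured because `[^\n]*` absorbs it, that an empty `lang` matches every fenced block, and that a stricter pattern requiring a newline directly after `python` would miss the titled block.

```python
import re

# Built at runtime so no line of this example starts with a literal fence.
FENCE = "`" * 3


def extract_fenced_blocks(text, lang="python"):
    """Return (index, content) pairs for fenced blocks; an empty lang matches any fence."""
    if lang:
        pattern = rf"{FENCE}{re.escape(lang)}[^\n]*\n(.*?){FENCE}"
    else:
        pattern = rf"{FENCE}\w*[^\n]*\n(.*?){FENCE}"
    return list(enumerate(re.findall(pattern, text, re.DOTALL), start=1))


def parse_frontmatter(text):
    """Return the YAML frontmatter between the leading --- lines, or None."""
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    return m.group(1) if m else None


# Hypothetical SKILL.md fragment: frontmatter, a titled python fence, a bash fence.
sample = "\n".join([
    "---",
    "name: dv-demo",
    "---",
    FENCE + 'python title="setup"',
    "import os",
    FENCE,
    FENCE + "bash",
    "echo hello",
    FENCE,
    "",
])

print(parse_frontmatter(sample))                          # name: dv-demo
print(extract_fenced_blocks(sample, "python"))            # [(1, 'import os\n')]
print(len(re.findall(FENCE + r"python\n", sample)))       # 0 (strict pattern misses it)
print([b for _, b in extract_fenced_blocks(sample, "")])  # ['import os\n', 'echo hello\n']
```

Because the empty-`lang` pattern matches every fence, any bash or sh block passed through it is also returned by the language-specific calls, so a caller that concatenates both lists sees those blocks twice.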

0 commit comments

Comments
 (0)