DEMO: intentional skill regression for eval pipeline validation (DO NOT MERGE)#61

Draft
saurabhrb wants to merge 1 commit into main from users/saurabhrb/evals-bad-skill-demo-v2

Conversation

@saurabhrb
Contributor

Purpose

Demo branch for validating that the eval pipeline catches skill regressions. DO NOT MERGE.

Recreated from a fresh branch off main (replaces closed PR #58) now that the pipeline default branch is main.

What's regressed

dv-data/SKILL.md replaces CreateMultiple bulk-create guidance with a per-record loop antipattern:

  • Bulk create via list-arg replaced with per-record for loop
  • CreateMultiple references removed
  • Chunking/adaptive helpers removed
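
To make the regression concrete, here is a minimal sketch of the two patterns. `FakeClient`, its `create` method, and both helper functions are hypothetical stand-ins written for this illustration; they are not the actual SDK API or the content of dv-data/SKILL.md. The fake client only counts round trips, assuming that a list argument maps to a single bulk request.

```python
from typing import Any, Union

Record = dict[str, Any]


class FakeClient:
    """Hypothetical stand-in for a Dataverse client; it only counts requests."""

    def __init__(self) -> None:
        self.requests = 0
        self._next_id = 0

    def _new_id(self) -> str:
        self._next_id += 1
        return f"id-{self._next_id}"

    def create(self, table: str, data: Union[Record, list[Record]]) -> Union[str, list[str]]:
        # Assumption: one call is one round trip, whether data is a dict or a list.
        self.requests += 1
        if isinstance(data, list):
            return [self._new_id() for _ in data]
        return self._new_id()


def create_per_record(client: FakeClient, table: str, records: list[Record]) -> list[str]:
    """Regressed pattern: one request per record."""
    return [client.create(table, record) for record in records]


def create_bulk(client: FakeClient, table: str, records: list[Record],
                chunk_size: int = 100) -> list[str]:
    """Original pattern: pass a list per request, split into fixed-size chunks."""
    ids: list[str] = []
    for start in range(0, len(records), chunk_size):
        ids.extend(client.create(table, records[start:start + chunk_size]))
    return ids
```

With 250 records and a chunk size of 100, the per-record version issues 250 requests in this sketch and the chunked version issues 3. The chunk size here is an arbitrary illustrative value, not a documented limit.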

How it's used

The ADO pipeline DVSkillsPlugin-Evals-PR (32010) runs against this branch. The data_003_skill_contract test asks the agent to report what the skill teaches, and NOT_CONTAINS: assertions catch the regressed content.
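
As a rough illustration of how a `NOT_CONTAINS:` assertion could flag regressed content in the agent's report, consider the sketch below. The `failed_assertions` function, the `KIND: text` parsing, the `CONTAINS` counterpart, and the case-insensitive matching are all assumptions made for this example; the actual eval harness may work differently.

```python
def failed_assertions(response: str, assertions: list[str]) -> list[str]:
    """Return the assertions that the response violates.

    Assumed format: "KIND: text", where KIND is CONTAINS or NOT_CONTAINS.
    Matching is a case-insensitive substring check (an assumption).
    """
    failures = []
    haystack = response.lower()
    for assertion in assertions:
        kind, _, needle = assertion.partition(":")
        kind = kind.strip()
        found = needle.strip().lower() in haystack
        if kind == "CONTAINS":
            if not found:
                failures.append(assertion)
        elif kind == "NOT_CONTAINS":
            if found:
                failures.append(assertion)
        else:
            raise ValueError(f"unknown assertion kind: {kind!r}")
    return failures
```

Under these assumptions, a report that repeats the regressed guidance ("call create once per record in a for loop") would fail an assertion such as `NOT_CONTAINS: for loop`, while a report describing list-based bulk create would pass it.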

Expected result: 2/3 FAIL (data_003 catches the regression; data_001 and data_002 may still pass because of the model's prior knowledge).
