Update eval-driven-dev skill #1352
Merged
aaronpowell merged 3 commits into github:staged on Apr 10, 2026
🔍 Skill Validator Results: 2 resource(s) checked | ✅ All checks passed
Pull request overview
Updates the eval-driven-dev skill to a streamlined workflow built around pixie.wrap() instrumentation, a Runnable-based harness, and a simplified setup process (including web UI lifecycle).
Changes:
- Refactors the skill workflow (Steps 1–6) to use `wrap()` + `Runnable` + reference-trace → dataset → evaluators → `pixie test` + `pixie analyze`.
- Adds a `resources/setup.sh` helper to update the skill, install/upgrade `pixie-qa`, init `pixie_qa/`, and start the web UI.
- Replaces older reference docs with a new, more granular set (entry-point, eval-criteria, wrap+trace, evaluators, dataset, test-running, investigation), and updates the skills index.
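The reference-trace → dataset → evaluators flow named in the first bullet can be illustrated with a toy sketch. This is not pixie's API: `record_call`, `TRACE`, `answer`, and `exact_match` are hypothetical names invented for illustration, and the real `wrap()` and `Runnable` interfaces are documented in the skill's reference files, not here.

```python
import functools
from typing import Any, Callable

# Hypothetical in-memory trace store; NOT part of pixie-qa.
TRACE: list[dict[str, Any]] = []


def record_call(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Illustrative stand-in for instrumentation: log each call's input and output."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = fn(*args, **kwargs)
        TRACE.append({"name": fn.__name__, "args": args, "output": result})
        return result

    return wrapper


@record_call
def answer(question: str) -> str:
    # Toy entry point standing in for the application under test.
    return question.strip().upper()


def exact_match(expected: str, actual: str) -> bool:
    # Toy evaluator: passes only when the output equals the reference exactly.
    return expected == actual


# 1. Run the app once to capture a reference trace.
answer(" hello ")

# 2. Turn the captured trace into dataset rows (input plus reference output).
dataset = [{"input": t["args"][0], "expected": t["output"]} for t in TRACE]

# 3. Re-run the app on each row and score the result with the evaluator.
results = [exact_match(row["expected"], answer(row["input"])) for row in dataset]
```

The point of the sketch is only the ordering: a trace is captured first, the dataset is derived from it, and evaluators run against fresh outputs afterwards.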
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| skills/eval-driven-dev/SKILL.md | Updates the end-to-end workflow and checkpoints for wrap/Runnable-based eval-driven dev. |
| skills/eval-driven-dev/resources/setup.sh | Adds a setup script to automate skill/package updates and pixie init/start. |
| skills/eval-driven-dev/references/wrap-api.md | Adds wrap/Runnable/CLI API reference documentation. |
| skills/eval-driven-dev/references/testing-api.md | Adds testing/evals API reference documentation. |
| skills/eval-driven-dev/references/evaluators.md | Adds built-in evaluator catalog + create_llm_evaluator reference. |
| skills/eval-driven-dev/references/1-a-entry-point.md | New Step 1a guidance and output template. |
| skills/eval-driven-dev/references/1-b-eval-criteria.md | New Step 1b guidance and output template. |
| skills/eval-driven-dev/references/2-wrap-and-trace.md | New Step 2 guidance for wrap instrumentation + Runnable + reference trace. |
| skills/eval-driven-dev/references/3-define-evaluators.md | New Step 3 guidance for evaluator selection + mapping artifact. |
| skills/eval-driven-dev/references/4-build-dataset.md | New Step 4 guidance for dataset format and construction. |
| skills/eval-driven-dev/references/5-run-tests.md | New Step 5 guidance for running tests and fixing harness/dataset issues. |
| skills/eval-driven-dev/references/6-investigate.md | Updates iteration guidance with a stop gate and analysis-first flow. |
| skills/eval-driven-dev/references/understanding-app.md | Removes the prior Step 1 reference (superseded by new Step 1a/1b + Step 2). |
| skills/eval-driven-dev/references/instrumentation.md | Removes old @observe/enable_storage guidance (superseded by wrap docs). |
| skills/eval-driven-dev/references/run-harness-patterns.md | Removes old harness patterns doc (superseded by Runnable guidance). |
| skills/eval-driven-dev/references/pixie-api.md | Removes prior monolithic API reference (replaced by split references). |
| skills/eval-driven-dev/references/dataset-generation.md | Removes prior dataset guide (replaced by new Step 4). |
| skills/eval-driven-dev/references/eval-tests.md | Removes prior test-writing guide (replaced by new Step 3/5 + API refs). |
| docs/README.skills.md | Updates the skill entry to the new reference set and resources. |
Comments suppressed due to low confidence (2)
skills/eval-driven-dev/SKILL.md:148
- The file ends with an opening Markdown code fence (```), but it is never closed. This breaks rendering for the remainder of the document; add the intended command(s) inside the fence and close it (or remove the fence).
And whenever you restart the workflow, always run the setup.sh script in resources again to ensure the web server is running:
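An unclosed fence like the one flagged above is easy to catch mechanically. The following is a minimal sketch, assuming only backtick fences that start a line (optionally indented); it ignores tilde fences and the fence-length matching rules of the full Markdown specification, and `has_unclosed_fence` is a hypothetical helper, not part of the skill validator.

```python
FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def has_unclosed_fence(markdown: str) -> bool:
    """Return True if a backtick fence is opened but never closed."""
    inside = False
    for line in markdown.splitlines():
        # Each line that starts with a fence toggles the "inside a block" state.
        if line.lstrip().startswith(FENCE):
            inside = not inside
    return inside


# A document that ends right after an opening fence is reported as broken.
broken = "Run the setup script again:\n" + FENCE + "\n"
closed = FENCE + "bash\necho ok\n" + FENCE + "\n"
```

Here `has_unclosed_fence(broken)` is `True` and `has_unclosed_fence(closed)` is `False`.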
skills/eval-driven-dev/references/6-investigate.md:40
- Typo/grammar in the failure-notes bullets: “The the captured output…” has an extra “the”. Also, this section uses `uv run pixie ...`, which may not work in poetry/pip projects given the setup script’s multi-manager support; consider using `pixie ...` or listing alternatives.
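One way to list alternatives, as the comment suggests, is to pick the command prefix from the lockfile present in the project. The sketch below is an assumption about how that could be done, not a description of what `setup.sh` actually does; `pixie_command` is a hypothetical helper, while `uv run`, `poetry run`, `uv.lock`, and `poetry.lock` are the standard names used by those tools.

```python
from typing import Iterable


def pixie_command(project_files: Iterable[str], *args: str) -> list[str]:
    """Build a pixie invocation suited to the project's package manager.

    `project_files` is the list of file names in the project root,
    for example the result of os.listdir(".").
    """
    files = set(project_files)
    if "uv.lock" in files:
        prefix = ["uv", "run"]
    elif "poetry.lock" in files:
        prefix = ["poetry", "run"]
    else:
        # Assumed fallback: a pip-managed environment with pixie on PATH.
        prefix = []
    return [*prefix, "pixie", *args]


uv_cmd = pixie_command(["pyproject.toml", "uv.lock"], "test")
poetry_cmd = pixie_command(["pyproject.toml", "poetry.lock"], "test")
pip_cmd = pixie_command(["requirements.txt"], "analyze")
```

With these inputs, `uv_cmd` is `["uv", "run", "pixie", "test"]`, `poetry_cmd` is `["poetry", "run", "pixie", "test"]`, and `pip_cmd` is `["pixie", "analyze"]`.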
aaronpowell approved these changes on Apr 10, 2026
Pull Request Checklist
- … `npm start` and verified that `README.md` is up to date.
- … `staged` branch for this pull request.
Description
Update eval-driven-dev skill.
Streamlined the workflow, reducing setup requirements and decision-making complexity.
Type of Contribution
Additional Notes
Addressed the original feedback around setup and cleanup. Originally, setup was done in `pixie start` from the pixie-qa package, but cleanup was implemented in a bash script. This version moves the cleanup into the pixie-qa package as well.
By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.