
update eval-driven-dev skill #1352

Merged
aaronpowell merged 3 commits into github:staged from yiouli:staged
Apr 10, 2026

@yiouli (Contributor) commented Apr 9, 2026

Pull Request Checklist

  • I have read and followed the CONTRIBUTING.md guidelines.
  • I have read and followed the Guidance for submissions involving paid services.
  • My contribution adds a new instruction, prompt, agent, skill, or workflow file in the correct directory.
  • The file follows the required naming convention.
  • The content is clearly structured and follows the example format.
  • I have tested my instructions, prompt, agent, skill, or workflow with GitHub Copilot.
  • I have run npm start and verified that README.md is up to date.
  • I am targeting the staged branch for this pull request.

Description

Update eval-driven-dev skill.

Streamlined the workflow, reducing setup requirements and decision-making complexity.


Type of Contribution

  • New instruction file.
  • New prompt file.
  • New agent file.
  • New plugin.
  • New skill file.
  • New agentic workflow.
  • Update to existing instruction, prompt, agent, plugin, skill, or workflow.
  • Other (please specify):

Additional Notes

Addressed the original feedback around setup and cleanup. Originally, setup was done by pixie start from the pixie-qa package, while cleanup was implemented in a bash script. This version moves the cleanup into the pixie-qa package as well.


By submitting this pull request, I confirm that my contribution abides by the Code of Conduct and will be licensed under the MIT License.

@yiouli yiouli requested a review from aaronpowell as a code owner April 9, 2026 17:15
Copilot AI review requested due to automatic review settings April 9, 2026 17:15
github-actions (Bot, Contributor) commented Apr 9, 2026

🔍 Skill Validator Results

2 resource(s) checked | ✅ All checks passed

Full output
Found 1 skill(s)
[eval-driven-dev] 📊 eval-driven-dev: 2,297 BPE tokens [chars/4: 2,551] (detailed ✓), 12 sections, 1 code blocks
[eval-driven-dev]    ⚠  No numbered workflow steps — agents follow sequenced procedures more reliably.
✅ All checks passed (1 skill(s))
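
The chars/4 figure in the output above matches a common rule of thumb that approximates a token count as the character count divided by four, and the warning concerns numbered workflow steps. A minimal sketch of both checks follows. The helper names estimate_tokens and has_numbered_steps are hypothetical and are not taken from the validator; its actual rounding and matching rules are not shown in this thread.

```python
import re


def estimate_tokens(text: str) -> int:
    """Rough token estimate: about one token per four characters (heuristic only)."""
    return len(text) // 4


# Matches lines such as "1. Do this" or "  2) Do that" at the start of a line.
_NUMBERED_STEP = re.compile(r"^\s*\d+[.)]\s+\S", re.MULTILINE)


def has_numbered_steps(markdown: str) -> bool:
    """Return True if the text contains at least one numbered list item."""
    return _NUMBERED_STEP.search(markdown) is not None
```

A real BPE tokenizer will usually give a different count than this heuristic, as the two figures in the validator line show.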

Copilot AI (Contributor) left a comment

Pull request overview

Updates the eval-driven-dev skill to a streamlined workflow built around pixie.wrap() instrumentation, a Runnable-based harness, and a simplified setup process (including web UI lifecycle).

Changes:

  • Refactors the skill workflow (Steps 1–6) to use wrap() + Runnable + reference-trace → dataset → evaluators → pixie test + pixie analyze.
  • Adds a resources/setup.sh helper to update the skill, install/upgrade pixie-qa, init pixie_qa/, and start the web UI.
  • Replaces older reference docs with a new, more granular set (entry-point, eval-criteria, wrap+trace, evaluators, dataset, test-running, investigation), and updates the skills index.

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 9 comments.

Show a summary per file
| File | Description |
| --- | --- |
| skills/eval-driven-dev/SKILL.md | Updates the end-to-end workflow and checkpoints for wrap/Runnable-based eval-driven dev. |
| skills/eval-driven-dev/resources/setup.sh | Adds a setup script to automate skill/package updates and pixie init/start. |
| skills/eval-driven-dev/references/wrap-api.md | Adds wrap/Runnable/CLI API reference documentation. |
| skills/eval-driven-dev/references/testing-api.md | Adds testing/evals API reference documentation. |
| skills/eval-driven-dev/references/evaluators.md | Adds built-in evaluator catalog + create_llm_evaluator reference. |
| skills/eval-driven-dev/references/1-a-entry-point.md | New Step 1a guidance and output template. |
| skills/eval-driven-dev/references/1-b-eval-criteria.md | New Step 1b guidance and output template. |
| skills/eval-driven-dev/references/2-wrap-and-trace.md | New Step 2 guidance for wrap instrumentation + Runnable + reference trace. |
| skills/eval-driven-dev/references/3-define-evaluators.md | New Step 3 guidance for evaluator selection + mapping artifact. |
| skills/eval-driven-dev/references/4-build-dataset.md | New Step 4 guidance for dataset format and construction. |
| skills/eval-driven-dev/references/5-run-tests.md | New Step 5 guidance for running tests and fixing harness/dataset issues. |
| skills/eval-driven-dev/references/6-investigate.md | Updates iteration guidance with a stop gate and analysis-first flow. |
| skills/eval-driven-dev/references/understanding-app.md | Removes the prior Step 1 reference (superseded by new Step 1a/1b + Step 2). |
| skills/eval-driven-dev/references/instrumentation.md | Removes old @observe/enable_storage guidance (superseded by wrap docs). |
| skills/eval-driven-dev/references/run-harness-patterns.md | Removes old harness patterns doc (superseded by Runnable guidance). |
| skills/eval-driven-dev/references/pixie-api.md | Removes prior monolithic API reference (replaced by split references). |
| skills/eval-driven-dev/references/dataset-generation.md | Removes prior dataset guide (replaced by new Step 4). |
| skills/eval-driven-dev/references/eval-tests.md | Removes prior test-writing guide (replaced by new Step 3/5 + API refs). |
| docs/README.skills.md | Updates the skill entry to the new reference set and resources. |
Comments suppressed due to low confidence (2)

skills/eval-driven-dev/SKILL.md:148

  • The file ends with an opening Markdown code fence (```), but it is never closed. This breaks rendering for the remainder of the document; add the intended command(s) inside the fence and close it (or remove the fence).
And whenever you restart the workflow, always run the setup.sh script in resources again to ensure the web server is running:
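
An unclosed fence of this kind can be caught mechanically before review. The sketch below is a simplified check under stated assumptions: it only toggles on lines that begin with three backticks, so it ignores tilde fences, nested fences of different lengths, and indented code blocks. The function name has_unclosed_fence is hypothetical.

```python
# Built from a single backtick so the fence marker never appears literally here.
FENCE = "`" * 3


def has_unclosed_fence(markdown: str) -> bool:
    """Return True if a fenced code block is opened but never closed."""
    in_fence = False
    for line in markdown.splitlines():
        # Each fence line flips the state: open -> closed or closed -> open.
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
    return in_fence
```

Running such a check over SKILL.md in CI would flag the trailing fence described above, since the final state would still be "inside a fence".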

skills/eval-driven-dev/references/6-investigate.md:40

  • Typo/grammar in the failure-notes bullets: “The the captured output …” has an extra “the”. Also, this section uses uv run pixie ... which may not work in poetry/pip projects given the setup script’s multi-manager support; consider using pixie ... or listing alternatives.
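
One way to list alternatives without hard-coding uv is to pick the command prefix from the lockfile found in the project. This is a hedged sketch, not part of the skill or of pixie-qa: the function name pixie_command and the lockfile-based detection are assumptions made for illustration, and the fallback assumes pixie is already on PATH (for example, installed with pip into an active virtual environment).

```python
from pathlib import Path


def pixie_command(project_dir: str, *args: str) -> list[str]:
    """Build a pixie invocation suited to the project's package manager."""
    root = Path(project_dir)
    if (root / "uv.lock").exists():
        prefix = ["uv", "run"]
    elif (root / "poetry.lock").exists():
        prefix = ["poetry", "run"]
    else:
        # Assumption: pixie is directly available on PATH.
        prefix = []
    return [*prefix, "pixie", *args]
```

For example, pixie_command(".", "test") yields a uv-prefixed command in a uv project and a bare pixie command in a plain pip project.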

Comment thread skills/eval-driven-dev/SKILL.md Outdated
Comment thread skills/eval-driven-dev/resources/setup.sh Outdated
Comment thread skills/eval-driven-dev/references/wrap-api.md Outdated
Comment thread skills/eval-driven-dev/references/testing-api.md Outdated
Comment thread skills/eval-driven-dev/references/evaluators.md Outdated
Comment thread skills/eval-driven-dev/references/5-run-tests.md
Comment thread skills/eval-driven-dev/references/4-build-dataset.md Outdated
Comment thread skills/eval-driven-dev/references/2-wrap-and-trace.md Outdated
Comment thread skills/eval-driven-dev/references/4-build-dataset.md
@aaronpowell aaronpowell merged commit 5f59ddb into github:staged Apr 10, 2026
10 checks passed

3 participants