feat: improve aqua-finetuning skill score 79% → 96% #1381
Hey @qiuosier 👋

I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after:

| Skill | Before | After | Change |
|-------|--------|-------|--------|
| aqua-finetuning | 79% | 96% | **+17%** |
| aqua-deployment | 88% | 88% | no change |
| aqua-model-lifecycle | 88% | 88% | no change |
| aqua-evaluation | 85% | 85% | no change |
| aqua-metrics | 85% | 85% | no change |
| aqua-cli | 84% | 84% | no change |
| oci-data-science | 81% | 81% | no change |
| aqua-troubleshooting | 79% | 79% | no change |

<details>
<summary>Changes summary</summary>

**aqua-finetuning (+17%)**: picked this skill because it had the most improvement headroom alongside `aqua-troubleshooting`, and the changes are cleanly additive:

- **Expanded description trigger terms**: added `train`, `adapt`, `retrain`, `transfer-learn`, and `PEFT` so the skill matches more natural user queries beyond just "fine-tune"
- **Added end-to-end workflow**: new numbered 6-step sequence (prepare dataset → check config → create job → monitor status → review metrics → deploy) with validation checkpoints at each stage
- **Added status monitoring loop**: Python SDK example now includes a polling loop that checks `lifecycle_state` until `SUCCEEDED` or `FAILED`, so users don't have to guess how to track job progress
- **Consolidated redundant code examples**: replaced three near-duplicate `CreateFineTuningDetails` blocks (basic, advanced LoRA, validation split) with one complete example plus a variations table showing only the differing parameters
- **Removed generic ML explanations**: cut "Loss should decrease over epochs" / "Accuracy should increase" prose that Claude already knows, keeping the Training Metrics section implicit in the workflow
- **Formatted description as quoted string**: switched from bare string to standard YAML quoted format in frontmatter

</details>

I also stress-tested your `aqua-finetuning` skill against a few real-world task evals and it held up really well on multi-format JSONL dataset preparation with instruction-to-conversational auto-conversion. Kudos for that.

Honest disclosure: I work at @tesslio where we build tooling around skills like these. Not a pitch, just saw room for improvement and wanted to contribute.

Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me ([@yogesh-tessl](https://github.com/yogesh-tessl)) if you hit any snags.

Thanks in advance 🙏
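For reviewers who want a concrete picture of the status monitoring loop described in the changes summary, here is a minimal sketch of that polling pattern. It is not the code from the skill: `wait_for_terminal_state` and the `get_state` callable are hypothetical names, and the intermediate state names in the usage example are placeholders. Only `SUCCEEDED` and `FAILED` are taken from the description above. In real use, `get_state` would wrap whatever SDK call returns the job's `lifecycle_state`.

```python
import time
from typing import Callable

# Terminal states named in the PR description; any other value is treated as "still running".
TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def wait_for_terminal_state(
    get_state: Callable[[], str],
    interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout elapses.

    get_state is a hypothetical callable standing in for the SDK call that
    reads the job's lifecycle_state.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job still in state {state!r} after {timeout_s} s")
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated job: the non-terminal state names here are placeholders.
    fake_states = iter(["ACCEPTED", "IN_PROGRESS", "SUCCEEDED"])
    print(wait_for_terminal_state(lambda: next(fake_states), interval_s=0))
```

Passing `sleep` and `clock` as parameters keeps the loop testable without real waiting, which also makes it easier to validate that an SDK example built on this pattern stays runnable.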
Thanks for the contribution and for disclosing the Tessl affiliation.

Since this PR changes a skill and the main justification is a claimed score improvement from 79% to 96%, could you please add the supporting evaluation details? Specifically: the scoring rubric, the exact before/after eval scenarios, the command/version used, raw or summarized eval outputs, screenshots, and whether the score is reproducible by maintainers without a private/vendor-only tool.

Also, please add testing evidence for the skill change itself: at minimum, manual test prompts that trigger the skill, expected behavior before/after, and validation that the SDK examples remain correct and runnable.

Without that, the score claim is not independently verifiable, and the PR currently reads more like tool promotion than a reviewable engineering change.
@qiuosier, that's a fair ask. You're right that the score on its own isn't verifiable without the rubric and inputs, and the consolidated code examples should be validated against the actual SDK. Let me address each point.

Reproducing the score locally

The scoring rubric is documented here:

Testing evidence I'll add:

Appreciate you holding this to a high bar.