feat: improve aqua-finetuning skill score 79% → 96% by yogesh-tessl · Pull Request #1381 · oracle/accelerated-data-science

yogesh-tessl · 2026-05-19T08:28:29Z

ran your skills through tessl skill review at work and found some targeted improvements. Here's the before/after:

Skill	Before	After	Change
aqua-finetuning	79%	96%	+17%

Changes summary

aqua-finetuning (+17%) - picked this skill because it had the most improvement headroom alongside aqua-troubleshooting, and the changes are cleanly additive:

Expanded description trigger terms - added train, adapt, retrain, transfer-learn, and PEFT so the skill matches more natural user queries beyond just "fine-tune"
Added end-to-end workflow - new numbered 6-step sequence (prepare dataset → check config → create job → monitor status → review metrics → deploy) with validation checkpoints at each stage
Added status monitoring loop - Python SDK example now includes a polling loop that checks lifecycle_state until SUCCEEDED or FAILED, so users don't have to guess how to track job progress
Consolidated redundant code examples - replaced three near-duplicate CreateFineTuningDetails blocks (basic, advanced LoRA, validation split) with one complete example plus a variations table showing only the differing parameters
Removed generic ML explanations - cut "Loss should decrease over epochs" / "Accuracy should increase" prose that Claude already knows, keeping the Training Metrics section implicit in the workflow
Formatted description as quoted string - switched from bare string to standard YAML quoted format in frontmatter

quick honest disclosure. I work at https://github.com/tesslio where we build tooling around skills like these. Not a pitch, just saw room for improvement and wanted to contribute.

If you want to self-improve your skills, or define your own scenarios to pressure test, just ask your agent (Claude Code, Codex, etc.) to evaluate and optimize your skill with Tessl. Ping me @yogesh-tessl, if you hit any snags.

@qiuosier

Hey @qiuosier 👋 I ran your skills through `tessl skill review` at work and found some targeted improvements. Here's the full before/after: | Skill | Before | After | Change | |-------|--------|-------|--------| | aqua-finetuning | 79% | 96% | **+17%** | | aqua-deployment | 88% | 88% | — | | aqua-model-lifecycle | 88% | 88% | — | | aqua-evaluation | 85% | 85% | — | | aqua-metrics | 85% | 85% | — | | aqua-cli | 84% | 84% | — | | oci-data-science | 81% | 81% | — | | aqua-troubleshooting | 79% | 79% | — | <details> <summary>Changes summary</summary> **aqua-finetuning (+17%)** — picked this skill because it had the most improvement headroom alongside `aqua-troubleshooting`, and the changes are cleanly additive: - **Expanded description trigger terms** — added `train`, `adapt`, `retrain`, `transfer-learn`, and `PEFT` so the skill matches more natural user queries beyond just "fine-tune" - **Added end-to-end workflow** — new numbered 6-step sequence (prepare dataset → check config → create job → monitor status → review metrics → deploy) with validation checkpoints at each stage - **Added status monitoring loop** — Python SDK example now includes a polling loop that checks `lifecycle_state` until `SUCCEEDED` or `FAILED`, so users don't have to guess how to track job progress - **Consolidated redundant code examples** — replaced three near-duplicate `CreateFineTuningDetails` blocks (basic, advanced LoRA, validation split) with one complete example plus a variations table showing only the differing parameters - **Removed generic ML explanations** — cut "Loss should decrease over epochs" / "Accuracy should increase" prose that Claude already knows, keeping the Training Metrics section implicit in the workflow - **Formatted description as quoted string** — switched from bare string to standard YAML quoted format in frontmatter </details> I also stress-tested your `aqua-finetuning` skill against a few real-world task evals and it held up really well on multi-format JSONL dataset preparation with instruction-to-conversational auto-conversion. Kudos for that. Honest disclosure — I work at @tesslio where we build tooling around skills like these. Not a pitch — just saw room for improvement and wanted to contribute. Want to self-improve your skills? Just point your agent (Claude Code, Codex, etc.) at [this Tessl guide](https://docs.tessl.io/evaluate/optimize-a-skill-using-best-practices) and ask it to optimize your skill. Ping me — [@yogesh-tessl](https://github.com/yogesh-tessl) — if you hit any snags. Thanks in advance 🙏

qiuosier · 2026-05-19T12:54:35Z

Thanks for the contribution and for disclosing the Tessl affiliation. Since this PR changes a skill and the main justification is a claimed score improvement from 79% to 96%, could you please add the supporting evaluation details? Specifically: the scoring rubric, the exact before/after eval scenarios, the command/version used, raw or summarized eval outputs, screenshots and whether the score is reproducible by maintainers without a private/vendor-only tool.

Also, please add testing evidence for the skill change itself: at minimum, manual test prompts that trigger the skill, expected behavior before/after, and validation that the SDK examples remain correct and runnable. Without that, the score claim is not independently verifiable, and the PR currently reads more like tool promotion than a reviewable engineering change.

yogesh-tessl · 2026-05-23T05:07:18Z

@qiuosier, that's a fair ask. You're right that the score on its own isn't verifiable without the rubric and inputs, and the consolidated code examples should be validated against the actual SDK. Let me address each point.

Reproducing the score locally

tessl skill review is free and open to run. Here's how any maintainer can verify:

Install: npm install -g tessl
Run against the skill before the change: tessl skill review .claude/skills/aqua-finetuning/SKILL.md
Run against the branch with the change and compare

The scoring rubric is documented here:

Testing evidence

I'll add:

Manual test prompts that trigger aqua-finetuning (e.g. "fine-tune Llama on my JSONL dataset", "retrain a model with LoRA on OCI")
Expected behaviour before/after for each prompt
Verification that the CreateFineTuningDetails SDK examples still match the current API surface and are runnable

Appreciate you holding this to a high bar.

yogesh-tessl requested review from mrDzurb, sambitkumohanty and smfirmin as code owners May 19, 2026 08:28

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: improve aqua-finetuning skill score 79% → 96%#1381

feat: improve aqua-finetuning skill score 79% → 96%#1381
yogesh-tessl wants to merge 1 commit into
oracle:mainfrom
yogesh-tessl:improve/skill-review-optimization

yogesh-tessl commented May 19, 2026

Uh oh!

qiuosier commented May 19, 2026

Uh oh!

yogesh-tessl commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

yogesh-tessl commented May 19, 2026

Uh oh!

qiuosier commented May 19, 2026

Uh oh!

yogesh-tessl commented May 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants