Add Gemini Agent Skill for Benchmarks CLI #994
Add an Agent Skill file (.gemini/skills/kaggle-benchmarks/SKILL.md) that provides on-demand procedural knowledge for the Kaggle Benchmarks CLI workflow.
The skill covers:
- Command hierarchy and aliases (kaggle b t ...)
- Setup and authentication (init, auth)
- Core workflow: push → run → status → download
- Task file format (percent format, @task decorators, .run())
- Error scenarios and messages
- Model slug conventions
- Common workflow recipes
The skill uses YAML frontmatter (name + description) so Gemini CLI auto-discovers it and loads the content on-demand when relevant.
Also update .gitignore to allow .gemini/skills/ to be committed while keeping other .gemini/ contents ignored.
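To make the frontmatter idea concrete, here is a minimal sketch of how a tool could read the name and description from a SKILL.md-style file. It assumes flat `key: value` pairs between two `---` lines and uses only the standard library. The `parse_frontmatter` helper, the `kaggle-benchmarks` name, and the description text are illustrative assumptions, not Gemini CLI's actual discovery code.

```python
def parse_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md-style document into (metadata, body).

    Assumes flat `key: value` lines between two `---` delimiters.
    This is not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter at all
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the skill body.
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat as plain text


# Hypothetical frontmatter; the real file's values may differ.
EXAMPLE = """---
name: kaggle-benchmarks
description: Hypothetical one-line summary of when to load this skill.
---
# Kaggle Benchmarks CLI
"""

meta, body = parse_frontmatter(EXAMPLE)
print(meta["name"])
```

An agent would typically index only the metadata up front and read the body when the description matches the task at hand, which is what "on-demand" refers to above.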
Add README.md documenting how different AI agents (Gemini CLI, Claude Code, Cursor, Copilot, Aider) can consume the SKILL.md file.
Fix .gitignore to use .gemini/* (contents) instead of .gemini/ (directory) so that negation rules for .gemini/skills/ work correctly without needing git add -f.
Remove 'Source Code Reference' and 'Key Implementation Details' sections from SKILL.md — these contained internal file paths and implementation details already covered inline in user-facing sections. Remove 'Validation' section from README.md — references test files not shipped in this repo.
develra
left a comment
I wonder if we should just put this in a top-level skills directory, as it's a bit strange to only provide it for Gemini, but overall LGTM
| - `LLM_DEFAULT_EVAL` — Default eval model slug | ||
| - `LLMS_AVAILABLE` — Comma-separated list of available model slugs | ||
| **⚠ Note:** Environment variables are **appended** to the env file (not overwritten). Running `init` or `auth` multiple times will create duplicate entries. Manually clean up the file if needed. |
is this just for safety? Makes sense but wondering if there is a nicer UX flow.
Yes maybe in another PR. cc @andrewmwang
Yeah, this is just for safety. That said, when we load via dotenv we override, so the last appended env vars are used (i.e., there's no harm in running multiple times).
Soften the warning.
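The "last appended value wins" behavior described in this thread can be sketched in a few lines. This is a simplified stand-in for a dotenv-style loader, assuming plain `KEY=VALUE` lines with no quoting or interpolation; it is not the CLI's loader. The model slugs `model-a` and `model-b` are placeholders, and `dedupe_env_text` is a hypothetical cleanup helper for anyone who wants to tidy the file by hand.

```python
def load_env_text(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; a later duplicate overrides an earlier one."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()  # later entries overwrite
    return values


def dedupe_env_text(text: str) -> str:
    """Keep only the last assignment of each key; keep comments and blanks."""
    lines = text.splitlines()

    def key_of(raw: str):
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            return line.partition("=")[0].strip()
        return None

    # Remember the index of the final assignment for every key.
    last_index = {key_of(raw): i for i, raw in enumerate(lines) if key_of(raw)}
    kept = [
        raw for i, raw in enumerate(lines)
        if key_of(raw) is None or last_index[key_of(raw)] == i
    ]
    return "\n".join(kept) + "\n"


# Simulates an env file after the variables were appended twice.
APPENDED = (
    "LLM_DEFAULT_EVAL=model-a\n"
    "LLMS_AVAILABLE=model-a,model-b\n"
    "LLM_DEFAULT_EVAL=model-b\n"
)

print(load_env_text(APPENDED)["LLM_DEFAULT_EVAL"])
```

Under these assumptions the duplicates are harmless at load time, which supports softening the warning rather than removing it.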
Any reason for putting it in a We could just put it under the Maybe under Alternatively, we may want an overarching SKILL.md for the entire CLI and have a section for specific tasks defined in
See: https://github.com/microsoft/playwright-cli/blob/main/skills/playwright-cli/SKILL.md#specific-tasks
…n docs
Address PR #994 review feedback:
1. Move SKILL.md from .gemini/skills/kaggle-benchmarks/ to skills/benchmarks/ per rosbo's suggestion to make the skill agent-agnostic (not tied to Gemini CLI's .gemini/ convention). Any agent can now discover it under the top-level skills/ directory.
2. Add 'Local Iteration Loop' section per develra's suggestion. Documents how to test tasks locally against the Model Proxy before pushing to the server (source .env, run with python, check .run.json output).
3. Simplify .gitignore: remove the .gemini/skills/ exception since skills now live at the top level. Remove the now-empty README.md.
4. Rename 'Quick Iteration Loop' to 'Quick Push-Run-Download' for clarity since the new 'Local Iteration Loop' section covers true local iteration.
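The local iteration loop in item 2 (load .env, run the task with python, check the .run.json output) could be scripted roughly as follows. This is a sketch under stated assumptions: the env file holds plain `KEY=VALUE` lines, and the `.run.json` files land next to the task file. The PR does not confirm the output location, and `read_env_file` and `run_task_locally` are hypothetical helper names.

```python
import os
import subprocess
import sys
from pathlib import Path


def read_env_file(path: Path) -> dict[str, str]:
    """Read simple KEY=VALUE lines (no quoting); later duplicates win."""
    values: dict[str, str] = {}
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and "=" in line:
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip()
    return values


def run_task_locally(task_file: Path, env_file: Path) -> list[Path]:
    """Run a task file with the env file applied and list *.run.json outputs.

    Assumption: outputs are written into the task file's directory.
    """
    env = {**os.environ, **read_env_file(env_file)}
    subprocess.run(
        [sys.executable, task_file.name],
        cwd=task_file.parent,
        env=env,
        check=True,  # raise if the task script exits with an error
    )
    return sorted(task_file.parent.glob("*.run.json"))
```

Calling `run_task_locally(Path("task.py"), Path(".env"))` would then return the result files to inspect before pushing anything to the server.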
Adopt the directory layout Vincent (rosbo) described:
skills/
├── SKILL.md ← Overarching CLI skill (all command groups)
└── references/
└── benchmarks.md ← Detailed benchmarks workflow reference
Changes:
- Create skills/SKILL.md as a lightweight entry point for the entire kaggle
CLI, listing all command groups and linking to detailed references.
- Move skills/benchmarks/SKILL.md → skills/references/benchmarks.md as a
reference doc (strip YAML frontmatter, update title).
- This matches the Playwright CLI pattern where SKILL.md is the top-level
skill and references/ holds detailed task-specific docs.
Vincent can later flesh out SKILL.md with more command groups and add
additional reference files (datasets.md, competitions.md, etc.).
@rosbo I restructured this to follow that pattern and added a placeholder SKILL.md; we can add more in the future. Let me know. Thanks
rosbo
left a comment
I would be interested to review a trajectory of using this skill to publish a benchmark task. Can you include a link in the PR description?
| # Browser-based OAuth login | ||
| kaggle auth login | ||
| # Or place API token at ~/.kaggle/kaggle.json |
kaggle.json is for legacy credentials.
The new API token should use the KAGGLE_API_TOKEN (as you mentioned) or store the token in a ~/.kaggle/access_token file.
https://github.com/Kaggle/kaggle-cli/blob/main/docs/README.md#option-2-environment-variable
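A small check like the one below could tell an agent which credential source is present before it runs any command. The three sources come from this thread (the `KAGGLE_API_TOKEN` variable, `~/.kaggle/access_token`, and the legacy `~/.kaggle/kaggle.json`). The lookup order and the `find_kaggle_credentials` helper are assumptions for illustration, not the CLI's documented precedence.

```python
import os
from pathlib import Path


def find_kaggle_credentials(env=None, home=None):
    """Return a label for the first credential source found, else None.

    The order checked here (env var, access_token file, legacy kaggle.json)
    is an assumption, not the CLI's documented behavior.
    """
    env = os.environ if env is None else env
    kaggle_dir = (Path.home() if home is None else Path(home)) / ".kaggle"
    if env.get("KAGGLE_API_TOKEN"):
        return "env:KAGGLE_API_TOKEN"
    if (kaggle_dir / "access_token").is_file():
        return "file:access_token"
    if (kaggle_dir / "kaggle.json").is_file():
        return "legacy:kaggle.json"
    return None
```

A result of `legacy:kaggle.json` would be a hint to migrate to one of the newer token options mentioned above.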
| ```bash | ||
| kaggle b auth -y | ||
| kaggle b auth -y --env-file custom.env |
I would add a comment to explain the second line if it's important to keep it.
It seems to indicate you need to run both which is not the case AFAIK.
For the skills, you probably just want to use the default...
| ```bash | ||
| # Initialize with defaults (writes .env, example_task.py, kaggle_benchmarks_reference.md) | ||
| kaggle b init -y |
Does it always write the example_task.py or only when --example-file is passed?
Changed to "always write". @andrewmwang can you confirm?
| **More complete example with assertions and pip install:** | ||
| ```python | ||
| # %% | ||
| !pip install numpy |
Why are you installing numpy here?
I was trying to show that some "invalid" Python can also work, like in a notebook. But it's probably confusing. Removed.
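The point about "invalid" Python can be illustrated with the standard library alone. The sketch below splits a percent-format file on `# %%` markers and checks each cell with `ast.parse`. It assumes the common convention that a cell starts at a line beginning with `# %%`; the helper names are hypothetical and this is not how the Benchmarks server processes task files.

```python
import ast


def split_percent_cells(source: str) -> list[str]:
    """Split percent-format source into cells at lines starting with '# %%'."""
    cells: list[str] = []
    current: list[str] = []
    for line in source.splitlines():
        if line.startswith("# %%"):
            if any(part.strip() for part in current):
                cells.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(part.strip() for part in current):
        cells.append("\n".join(current))
    return cells


def is_plain_python(cell: str) -> bool:
    """True if the cell parses as ordinary Python source."""
    try:
        ast.parse(cell)
        return True
    except SyntaxError:
        return False


SOURCE = "# %%\n!pip install numpy\n# %%\nprint('hello')\n"
cells = split_percent_cells(SOURCE)
print([is_plain_python(cell) for cell in cells])
```

The `!pip` cell fails to parse as plain Python, which is why such a line only works where a notebook-style runner interprets it.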
| kaggle b t push my-task -f task.py | ||
| # Push and wait for server-side creation to complete | ||
| kaggle b t push my-task -f task.py --wait |
For the skill to work well, should you be prescriptive and always use the --wait option?
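If the skill is made prescriptive about `--wait`, an agent-side wrapper could default to it. The sketch below only builds the `push` invocation quoted in the diff above; `build_push_command` and `push_task` are hypothetical helpers, and no other subcommands or flags are assumed.

```python
import shlex
import subprocess


def build_push_command(task_name: str, task_file: str, wait: bool = True) -> list[str]:
    """Build the argv for `kaggle b t push`, adding --wait by default."""
    cmd = ["kaggle", "b", "t", "push", task_name, "-f", task_file]
    if wait:
        cmd.append("--wait")
    return cmd


def push_task(task_name: str, task_file: str, wait: bool = True) -> None:
    """Run the push command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_push_command(task_name, task_file, wait), check=True)


# Show the command line that would be executed, without running it.
print(shlex.join(build_push_command("my-task", "task.py")))
```

Defaulting `wait` to true means the following step only starts once server-side creation has finished, and a caller can still opt out explicitly.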
Add Agent Skill for Benchmarks CLI
Adds agent skill documentation for the Kaggle Benchmarks CLI workflow, following the Playwright CLI pattern.
Structure
- SKILL.md is a lightweight entry point (~45 lines) covering all CLI command groups, with links to detailed references.
- references/benchmarks.md is the detailed reference (~420 lines) for the benchmarks workflow: setup, push, run, status, download, error handling, and common workflows.
This structure allows future reference files (datasets.md, competitions.md, etc.) to be added without bloating the entry point.
Using with AI Coding Agents
The skills/ directory is agent-agnostic. To integrate with a specific tool:
- .gemini/skills/kaggle-cli/ → ../../skills/
- CLAUDE.md: See skills/references/benchmarks.md
- .cursor/rules/
- skills/SKILL.md or skills/references/benchmarks.md directly
Tested
Changes
- skills/SKILL.md: [NEW] Top-level CLI skill
- skills/references/benchmarks.md: [NEW] Benchmarks reference
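For the Gemini CLI integration listed under "Using with AI Coding Agents", the `.gemini/skills/kaggle-cli/ → ../../skills/` mapping reads as a relative symlink. A minimal sketch of creating it follows, assuming a POSIX filesystem and a repository root that contains `skills/`. The `link_skills_for_gemini` helper is hypothetical and not part of this PR.

```python
import os
from pathlib import Path


def link_skills_for_gemini(repo_root: Path) -> Path:
    """Create .gemini/skills/kaggle-cli as a relative symlink to skills/."""
    link = repo_root / ".gemini" / "skills" / "kaggle-cli"
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.is_symlink():
        # The target is relative to the link's own directory (.gemini/skills/),
        # so ../../skills resolves to <repo_root>/skills.
        link.symlink_to(Path("..") / ".." / "skills", target_is_directory=True)
    return link
```

Because the target is relative, the link keeps working when the repository is cloned to a different location.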