Add Gemini Agent Skill for Benchmarks CLI by dolaameng · Pull Request #994 · Kaggle/kaggle-cli

dolaameng · 2026-04-30T20:44:36Z

Add Agent Skill for Benchmarks CLI

Adds agent skill documentation for the Kaggle Benchmarks CLI workflow, following the Playwright CLI pattern.

Structure

skills/
├── SKILL.md                  ← Top-level skill for the entire kaggle CLI
└── references/
    └── benchmarks.md         ← Detailed benchmarks workflow reference

SKILL.md is a lightweight entry point (~45 lines) covering all CLI command groups, with links to detailed references.
references/benchmarks.md is the detailed reference (~420 lines) for the benchmarks workflow: setup, push, run, status, download, error handling, and common workflows.

This structure allows future reference files (datasets.md, competitions.md, etc.) to be added without bloating the entry point.

Using with AI Coding Agents

The skills/ directory is agent-agnostic. To integrate with a specific tool:

| Tool | Integration |
| --- | --- |
| Gemini CLI | Symlink: `.gemini/skills/kaggle-cli/ → ../../skills/` |
| Claude Code | Reference in `CLAUDE.md`: `See skills/references/benchmarks.md` |
| Cursor | Add to `.cursor/rules/` |
| Other agents | Point at `skills/SKILL.md` or `skills/references/benchmarks.md` directly |
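
As a rough illustration of the Gemini CLI row, the symlink could be created with a small script like the one below. This is only a sketch: the helper name `link_skills_for_gemini` is hypothetical, and it assumes it is run against a checkout that already contains the top-level `skills/` directory.

```python
import tempfile
from pathlib import Path


def link_skills_for_gemini(repo_root: Path) -> Path:
    """Create .gemini/skills/kaggle-cli as a relative symlink to skills/.

    Hypothetical helper for illustration; not part of the kaggle CLI.
    """
    link = repo_root / ".gemini" / "skills" / "kaggle-cli"
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.is_symlink():
        # The target is resolved relative to the link's own directory
        # (.gemini/skills/), so ../../skills points at <repo_root>/skills.
        link.symlink_to(Path("..", "..", "skills"), target_is_directory=True)
    return link


if __name__ == "__main__":
    # Demonstrate on a throwaway directory instead of a real checkout.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "skills").mkdir()
        (root / "skills" / "SKILL.md").write_text("# placeholder\n")
        created = link_skills_for_gemini(root)
        print(created.relative_to(root), "->", created.readlink())
```

A relative target keeps the link valid if the checkout is moved or cloned elsewhere.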

Tested

auto test https://paste.googleplex.com/4609226241605632
trajectory with gemini: https://paste.googleplex.com/4861956914806784

Changes

skills/SKILL.md — [NEW] Top-level CLI skill
skills/references/benchmarks.md — [NEW] Benchmarks reference

@task

Add an Agent Skill file (.gemini/skills/kaggle-benchmarks/SKILL.md) that provides on-demand procedural knowledge for the Kaggle Benchmarks CLI workflow.

The skill covers:
- Command hierarchy and aliases (kaggle b t ...)
- Setup and authentication (init, auth)
- Core workflow: push → run → status → download
- Task file format (percent format, @task decorators, .run())
- Error scenarios and messages
- Model slug conventions
- Common workflow recipes

The skill uses YAML frontmatter (name + description) so Gemini CLI auto-discovers it and loads the content on-demand when relevant.

Also update .gitignore to allow .gemini/skills/ to be committed while keeping other .gemini/ contents ignored.
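
To make the frontmatter convention concrete, here is a minimal sketch of how a tool might read the `name` and `description` fields from a SKILL.md file. It is not Gemini CLI's actual discovery logic: the function `read_frontmatter` is hypothetical, the sample field values are placeholders, and only flat `key: value` pairs are handled.

```python
def read_frontmatter(text: str) -> dict[str, str]:
    """Return flat key/value pairs from a leading '---' delimited block.

    Illustrative only: handles simple 'key: value' lines, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat as having no frontmatter


# Placeholder skill file; the name and description values are assumptions.
sample_skill = """---
name: kaggle-benchmarks
description: Workflow for the Kaggle Benchmarks CLI (push, run, status, download).
---
# Kaggle Benchmarks CLI
"""

meta = read_frontmatter(sample_skill)
print(meta["name"])  # kaggle-benchmarks
```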

Add README.md documenting how different AI agents (Gemini CLI, Claude Code, Cursor, Copilot, Aider) can consume the SKILL.md file. Fix .gitignore to use .gemini/* (contents) instead of .gemini/ (directory) so that negation rules for .gemini/skills/ work correctly without needing git add -f.

Remove 'Source Code Reference' and 'Key Implementation Details' sections from SKILL.md — these contained internal file paths and implementation details already covered inline in user-facing sections. Remove 'Validation' section from README.md — references test files not shipped in this repo.

develra

I wonder if we should just put this in a top-level skills directory, as it's a bit strange to only provide it for Gemini, but overall LGTM.

develra · 2026-04-30T21:11:27Z

+- `LLM_DEFAULT_EVAL` — Default eval model slug
+- `LLMS_AVAILABLE` — Comma-separated list of available model slugs
+
+**⚠ Note:** Environment variables are **appended** to the env file (not overwritten). Running `init` or `auth` multiple times will create duplicate entries. Manually clean up the file if needed.


Is this just for safety? Makes sense, but I'm wondering if there is a nicer UX flow.

Yes maybe in another PR. cc @andrewmwang

Yeah, this is just for safety. That said, when we load via dotenv we override, so the last appended env vars are used (i.e., there's no harm in running it multiple times).
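
The "last appended value wins" behaviour described here can be sketched with a tiny stand-in parser. This is not python-dotenv's implementation, just an illustration of the semantics under that assumption; `load_env_last_wins` and the model slugs are made up for the example.

```python
def load_env_last_wins(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; later duplicates overwrite earlier ones."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Simulates an env file after running the setup command twice:
# the second run appended a new LLM_DEFAULT_EVAL entry.
env_file = (
    "LLM_DEFAULT_EVAL=model-a\n"
    "LLMS_AVAILABLE=model-a,model-b\n"
    "LLM_DEFAULT_EVAL=model-c\n"
)

resolved = load_env_last_wins(env_file)
print(resolved["LLM_DEFAULT_EVAL"])  # model-c
```

Under these semantics duplicate entries are harmless, which is why the warning in the doc can be softened rather than removed.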

Soften the warning.

rosbo · 2026-04-30T21:14:27Z

Any reason for putting it in a .gemini folder? It could work with any agent.

We could just put it under the skills folder: https://github.com/microsoft/playwright-cli/blob/main/skills/playwright-cli/SKILL.md

Maybe under skills/benchmarks/SKILL.md.

Alternatively, we may want an overarching SKILL.md for the entire CLI, with a section for specific tasks defined in /skills/references/benchmarks.md.

See: https://github.com/microsoft/playwright-cli/blob/main/skills/playwright-cli/SKILL.md#specific-tasks

…n docs

Address PR #994 review feedback:

1. Move SKILL.md from .gemini/skills/kaggle-benchmarks/ to skills/benchmarks/ per rosbo's suggestion to make the skill agent-agnostic (not tied to Gemini CLI's .gemini/ convention). Any agent can now discover it under the top-level skills/ directory.
2. Add 'Local Iteration Loop' section per develra's suggestion. Documents how to test tasks locally against the Model Proxy before pushing to the server (source .env, run with python, check .run.json output).
3. Simplify .gitignore: remove the .gemini/skills/ exception since skills now live at the top level. Remove the now-empty README.md.
4. Rename 'Quick Iteration Loop' to 'Quick Push-Run-Download' for clarity since the new 'Local Iteration Loop' section covers true local iteration.

Adopt the directory layout Vincent (rosbo) described:

skills/
├── SKILL.md                  ← Overarching CLI skill (all command groups)
└── references/
    └── benchmarks.md         ← Detailed benchmarks workflow reference

Changes:
- Create skills/SKILL.md as a lightweight entry point for the entire kaggle CLI, listing all command groups and linking to detailed references.
- Move skills/benchmarks/SKILL.md → skills/references/benchmarks.md as a reference doc (strip YAML frontmatter, update title).
- This matches the Playwright CLI pattern where SKILL.md is the top-level skill and references/ holds detailed task-specific docs.

Vincent can later flesh out SKILL.md with more command groups and add additional reference files (datasets.md, competitions.md, etc.).

dolaameng · 2026-04-30T22:18:10Z

@rosbo I restructured it here following that pattern, and also added a placeholder SKILL.md; we can add more in the future. Let me know. Thanks

rosbo

I would be interested to review a trajectory of using this skill to publish a benchmark task. Can you include a link in the PR description?

rosbo · 2026-04-30T22:35:28Z

+# Browser-based OAuth login
+kaggle auth login
+
+# Or place API token at ~/.kaggle/kaggle.json


kaggle.json is for legacy credentials.

The new API token should be supplied via the KAGGLE_API_TOKEN environment variable (as you mentioned) or stored in a ~/.kaggle/access_token file.

https://github.com/Kaggle/kaggle-cli/blob/main/docs/README.md#option-2-environment-variable
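
For readers of the skill, the two non-legacy options could be summarised by a lookup like the sketch below. It is an assumption-laden illustration, not the CLI's actual credential code: `resolve_kaggle_token` is a hypothetical helper, and checking the environment variable before the file is an assumed order of precedence.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_kaggle_token(env: Mapping[str, str], home: Path) -> Optional[str]:
    """Return a token from KAGGLE_API_TOKEN, else ~/.kaggle/access_token.

    Hypothetical helper; the precedence shown here is an assumption.
    """
    token = env.get("KAGGLE_API_TOKEN", "").strip()
    if token:
        return token
    token_file = home / ".kaggle" / "access_token"
    if token_file.is_file():
        return token_file.read_text(encoding="utf-8").strip() or None
    return None  # legacy kaggle.json is deliberately not consulted


if __name__ == "__main__":
    # Use a throwaway home directory so no real credentials are touched.
    with tempfile.TemporaryDirectory() as tmp:
        fake_home = Path(tmp)
        (fake_home / ".kaggle").mkdir()
        (fake_home / ".kaggle" / "access_token").write_text("file-token\n")
        print(resolve_kaggle_token({}, fake_home))
        print(resolve_kaggle_token({"KAGGLE_API_TOKEN": "env-token"}, fake_home))
```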

rosbo · 2026-04-30T22:37:53Z

+
+```bash
+kaggle b auth -y
+kaggle b auth -y --env-file custom.env


I would add a comment to explain the second line if it's important to keep it.

It seems to indicate you need to run both which is not the case AFAIK.

For the skills, you probably just want to use the default...

rosbo · 2026-04-30T22:38:41Z

+
+```bash
+# Initialize with defaults (writes .env, example_task.py, kaggle_benchmarks_reference.md)
+kaggle b init -y


Does it always write the example_task.py or only when --example-file is passed?

Changed to "always write". @andrewmwang can you confirm?

rosbo · 2026-04-30T22:39:13Z

+**More complete example with assertions and pip install:**
+```python
+# %%
+!pip install numpy


Why are you installing numpy here?

I was trying to show that some "invalid" Python can also work, as in a notebook, but it's probably confusing. Removed.

rosbo · 2026-04-30T22:40:15Z

+kaggle b t push my-task -f task.py
+
+# Push and wait for server-side creation to complete
+kaggle b t push my-task -f task.py --wait


For the skill to work well, should you be prescriptive and always use the --wait option?
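
One way to picture the prescriptive option is a wrapper that always appends `--wait` unless told otherwise. The sketch below only assembles the argument list from the flags quoted in the diff above (`push`, `-f`, `--wait`); `build_push_command` is a hypothetical helper and nothing is executed.

```python
import shlex


def build_push_command(task_name: str, task_file: str, wait: bool = True) -> list[str]:
    """Assemble the push command, defaulting to --wait.

    Hypothetical helper; it builds the argv list but does not run the CLI.
    """
    cmd = ["kaggle", "b", "t", "push", task_name, "-f", task_file]
    if wait:
        cmd.append("--wait")
    return cmd


print(shlex.join(build_push_command("my-task", "task.py")))
# kaggle b t push my-task -f task.py --wait
```

Defaulting to `--wait` means an agent following the skill would not move on to later steps before server-side creation has finished.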

dolaameng added 3 commits April 30, 2026 13:38

dolaameng requested review from andrewmwang, develra, rosbo and stevemessick April 30, 2026 21:10

develra approved these changes Apr 30, 2026

dolaameng added 2 commits April 30, 2026 14:50

dolaameng marked this pull request as ready for review April 30, 2026 22:17

rosbo reviewed Apr 30, 2026

address comments

2766975

dolaameng requested a review from rosbo May 1, 2026 00:32

rosbo approved these changes May 1, 2026

dolaameng merged commit b150458 into main May 1, 2026
10 checks passed

dolaameng deleted the add-benchmarks-agent-skill branch May 1, 2026 15:33

4 participants
