Add Gemini Agent Skill for Benchmarks CLI #994

Merged
dolaameng merged 6 commits into main from add-benchmarks-agent-skill
May 1, 2026

Contributor

dolaameng commented Apr 30, 2026

Add Agent Skill for Benchmarks CLI

Adds agent skill documentation for the Kaggle Benchmarks CLI workflow, following the Playwright CLI pattern.

Structure

skills/
├── SKILL.md                  ← Top-level skill for the entire kaggle CLI
└── references/
    └── benchmarks.md         ← Detailed benchmarks workflow reference
  • SKILL.md is a lightweight entry point (~45 lines) covering all CLI command groups, with links to detailed references.
  • references/benchmarks.md is the detailed reference (~420 lines) for the benchmarks workflow: setup, push, run, status, download, error handling, and common workflows.

This structure allows future reference files (datasets.md, competitions.md, etc.) to be added without bloating the entry point.

Using with AI Coding Agents

The skills/ directory is agent-agnostic. To integrate with a specific tool:

| Tool | Integration |
| --- | --- |
| Gemini CLI | Symlink: .gemini/skills/kaggle-cli/ → ../../skills/ |
| Claude Code | Reference in CLAUDE.md: See skills/references/benchmarks.md |
| Cursor | Add to .cursor/rules/ |
| Other agents | Point at skills/SKILL.md or skills/references/benchmarks.md directly |
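
As an illustration of the Gemini CLI row, the symlink could be created with a small script like the one below. This is a minimal sketch that assumes a POSIX filesystem; the helper name `link_skills_for_gemini` and the throwaway demo directory are hypothetical and not part of this PR.

```python
import os
import tempfile
from pathlib import Path


def link_skills_for_gemini(repo_root: Path) -> Path:
    """Create .gemini/skills/kaggle-cli as a relative symlink to skills/."""
    link = repo_root / ".gemini" / "skills" / "kaggle-cli"
    link.parent.mkdir(parents=True, exist_ok=True)
    if link.is_symlink():
        link.unlink()  # replace a stale link so repeated calls succeed
    # The target is resolved relative to the link's parent directory
    # (.gemini/skills/), so ../../skills points at <repo_root>/skills.
    os.symlink(os.path.join("..", "..", "skills"), link, target_is_directory=True)
    return link


# Demo in a temporary directory standing in for a repository checkout.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "skills" / "references").mkdir(parents=True)
(demo_root / "skills" / "SKILL.md").write_text("# kaggle CLI skill\n")
demo_link = link_skills_for_gemini(demo_root)
print((demo_link / "SKILL.md").read_text())
```

A relative target keeps the link valid when the repository is cloned to a different path.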

Tested

Changes

  • skills/SKILL.md [NEW]: Top-level CLI skill
  • skills/references/benchmarks.md [NEW]: Benchmarks reference

Add an Agent Skill file (.gemini/skills/kaggle-benchmarks/SKILL.md)
that provides on-demand procedural knowledge for the Kaggle Benchmarks
CLI workflow. The skill covers:

- Command hierarchy and aliases (kaggle b t ...)
- Setup and authentication (init, auth)
- Core workflow: push → run → status → download
- Task file format (percent format, @task decorators, .run())
- Error scenarios and messages
- Model slug conventions
- Common workflow recipes

The skill uses YAML frontmatter (name + description) so Gemini CLI
auto-discovers it and loads the content on-demand when relevant.
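
To make the frontmatter idea concrete, here is a rough sketch of how a tool might read the `name` and `description` fields from a skill file. It is a simplified parser for flat `key: value` lines only, not Gemini CLI's actual loader, and the example field values are hypothetical.

```python
def read_frontmatter(markdown_text: str) -> dict[str, str]:
    """Parse a leading '---' delimited block of simple 'key: value' lines."""
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return {}  # no closing delimiter: treat the file as having no frontmatter


# Hypothetical skill file content, for illustration only.
skill_text = """---
name: kaggle-benchmarks
description: Workflow for the Kaggle Benchmarks CLI (push, run, status, download).
---
# Kaggle Benchmarks CLI
"""
meta = read_frontmatter(skill_text)
print(meta["name"])
```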

Also update .gitignore to allow .gemini/skills/ to be committed
while keeping other .gemini/ contents ignored.

Add README.md documenting how different AI agents (Gemini CLI,
Claude Code, Cursor, Copilot, Aider) can consume the SKILL.md file.

Fix .gitignore to use .gemini/* (contents) instead of .gemini/
(directory) so that negation rules for .gemini/skills/ work
correctly without needing git add -f.

Remove 'Source Code Reference' and 'Key Implementation Details'
sections from SKILL.md — these contained internal file paths and
implementation details already covered inline in user-facing sections.

Remove 'Validation' section from README.md — references test files
not shipped in this repo.

develra left a comment

I wonder if we should just put this in a top-level skills directory, as it's a bit strange to only provide it for Gemini, but overall LGTM.

Comment thread skills/references/benchmarks.md Outdated
- `LLM_DEFAULT_EVAL` — Default eval model slug
- `LLMS_AVAILABLE` — Comma-separated list of available model slugs

**⚠ Note:** Environment variables are **appended** to the env file (not overwritten). Running `init` or `auth` multiple times will create duplicate entries. Manually clean up the file if needed.

Is this just for safety? Makes sense, but I'm wondering if there is a nicer UX flow.

Contributor Author

Yes, maybe in another PR. cc @andrewmwang

Contributor

Yeah, this is just for safety. That said, when we load via dotenv we override, so the last appended env vars are the ones used (i.e., there's no harm in running it multiple times).

Contributor Author

Soften the warning.
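
The "last appended entry wins" behavior described in this thread can be sketched as follows. This is a simplified stand-in for a dotenv-style loader, not the CLI's actual loading code, and the model slugs are hypothetical placeholders.

```python
def load_env_last_wins(env_text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines; a later duplicate key overrides an earlier one."""
    values: dict[str, str] = {}
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Simulates an env file after the setup command was run twice and appended twice.
env_text = """\
LLM_DEFAULT_EVAL=model-a
LLMS_AVAILABLE=model-a,model-b
LLM_DEFAULT_EVAL=model-c
LLMS_AVAILABLE=model-c
"""
loaded = load_env_last_wins(env_text)
print(loaded)
```

Under this model, duplicate entries are harmless clutter: only the most recently appended value for each key takes effect.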

Comment thread .gemini/skills/kaggle-benchmarks/SKILL.md Outdated
Contributor

rosbo commented Apr 30, 2026

Any reason for putting it in a .gemini folder? It could work with any agent.

We could just put it under the skills folder: https://github.com/microsoft/playwright-cli/blob/main/skills/playwright-cli/SKILL.md

Maybe under skills/benchmarks/SKILL.md.

Alternatively, we may want an overarching SKILL.md for the entire CLI and have a section for specific tasks defined in /skills/references/benchmarks.md.

See: https://github.com/microsoft/playwright-cli/blob/main/skills/playwright-cli/SKILL.md#specific-tasks

…n docs

Address PR #994 review feedback:

1. Move SKILL.md from .gemini/skills/kaggle-benchmarks/ to skills/benchmarks/
   per rosbo's suggestion to make the skill agent-agnostic (not tied to
   Gemini CLI's .gemini/ convention). Any agent can now discover it under
   the top-level skills/ directory.

2. Add 'Local Iteration Loop' section per develra's suggestion. Documents
   how to test tasks locally against the Model Proxy before pushing to the
   server (source .env, run with python, check .run.json output).

3. Simplify .gitignore: remove the .gemini/skills/ exception since skills
   now live at the top level. Remove the now-empty README.md.

4. Rename 'Quick Iteration Loop' to 'Quick Push-Run-Download' for clarity
   since the new 'Local Iteration Loop' section covers true local iteration.

Adopt the directory layout Vincent (rosbo) described:

  skills/
  ├── SKILL.md                  ← Overarching CLI skill (all command groups)
  └── references/
      └── benchmarks.md         ← Detailed benchmarks workflow reference

Changes:
- Create skills/SKILL.md as a lightweight entry point for the entire kaggle
  CLI, listing all command groups and linking to detailed references.
- Move skills/benchmarks/SKILL.md → skills/references/benchmarks.md as a
  reference doc (strip YAML frontmatter, update title).
- This matches the Playwright CLI pattern where SKILL.md is the top-level
  skill and references/ holds detailed task-specific docs.

Vincent can later flesh out SKILL.md with more command groups and add
additional reference files (datasets.md, competitions.md, etc.).

dolaameng marked this pull request as ready for review April 30, 2026 22:17
@dolaameng
Contributor Author

@rosbo I restructured this to follow that pattern. I also added a placeholder SKILL.md; we can add more in the future. Let me know. Thanks.

Contributor

rosbo left a comment

I would be interested to review a trajectory of using this skill to publish a benchmark task. Can you include a link in the PR description?

Comment thread skills/SKILL.md Outdated
# Browser-based OAuth login
kaggle auth login

# Or place API token at ~/.kaggle/kaggle.json

Contributor

kaggle.json is for legacy credentials.

The new API token should be supplied via the KAGGLE_API_TOKEN environment variable (as you mentioned) or stored in a ~/.kaggle/access_token file.

https://github.com/Kaggle/kaggle-cli/blob/main/docs/README.md#option-2-environment-variable
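
The two credential locations named in this comment could be checked as in the sketch below. The helper `find_kaggle_token` is hypothetical, and the precedence shown (environment variable first, then the file) is an assumption made for illustration; the CLI's real lookup order may differ.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def find_kaggle_token(env: Mapping[str, str], home: Path) -> Optional[str]:
    """Return a token from KAGGLE_API_TOKEN, else from ~/.kaggle/access_token."""
    token = env.get("KAGGLE_API_TOKEN", "").strip()
    if token:
        return token  # assumed precedence: the environment variable wins
    token_file = home / ".kaggle" / "access_token"
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


# Demo with a temporary directory standing in for the home directory.
demo_home = Path(tempfile.mkdtemp())
(demo_home / ".kaggle").mkdir()
(demo_home / ".kaggle" / "access_token").write_text("token-from-file\n")
print(find_kaggle_token({"KAGGLE_API_TOKEN": "token-from-env"}, demo_home))
print(find_kaggle_token({}, demo_home))
```

In real use the arguments would be `os.environ` and `Path.home()`.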

Contributor Author

Done.

Comment thread skills/references/benchmarks.md Outdated

```bash
kaggle b auth -y
kaggle b auth -y --env-file custom.env
```

Contributor

I would add a comment to explain the second line if it's important to keep it.

It seems to indicate you need to run both which is not the case AFAIK.

For the skills, you probably just want to use the default...

Contributor Author

Done.


```bash
# Initialize with defaults (writes .env, example_task.py, kaggle_benchmarks_reference.md)
kaggle b init -y
```

Contributor

Does it always write the example_task.py or only when --example-file is passed?

Contributor Author

Changed to "always write". @andrewmwang can you confirm?

Comment thread skills/references/benchmarks.md Outdated
**More complete example with assertions and pip install:**
```python
# %%
!pip install numpy
```

Contributor

Why are you installing numpy here?

Contributor Author

I was trying to show that some "invalid" Python (such as the `!pip` line) can also work, as it does in a notebook. But it's probably confusing. Removed.

kaggle b t push my-task -f task.py

# Push and wait for server-side creation to complete
kaggle b t push my-task -f task.py --wait

Contributor

For the skill to work well, should you be prescriptive and always use the --wait option?

Contributor Author

Done.
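
One way an agent harness could stay prescriptive about `--wait` is to build the push command in a single place, as in this sketch. The helper `build_push_command` is hypothetical; it only assembles the argument list from the command form quoted in this thread and does not invoke the CLI.

```python
import shlex


def build_push_command(task_name: str, task_file: str) -> list[str]:
    """Assemble 'kaggle b t push <name> -f <file> --wait' as an argument list."""
    # --wait is always included so callers cannot forget to block on
    # server-side creation.
    return ["kaggle", "b", "t", "push", task_name, "-f", task_file, "--wait"]


command = build_push_command("my-task", "task.py")
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` when the CLI is installed and authenticated.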

dolaameng requested a review from rosbo May 1, 2026 00:32
dolaameng merged commit b150458 into main May 1, 2026
10 checks passed
dolaameng deleted the add-benchmarks-agent-skill branch May 1, 2026 15:33
4 participants