feat: Add agent skills for NeMo Gym #1061

Closed
lbliii wants to merge 12 commits into main from lbliii/prague-v1

lbliii commented Apr 13, 2026

Summary

  • Adds 7 agent skills following the agentskills.io spec: gym-review, gym-debug, gym-profile, gym-config, gym-data, gym-scaffold-agent, and updates to add-benchmark
  • Each skill includes evals/evals.json with 3 assertion-based evals (21 total) and a chains.yaml for multi-step workflows
  • gym-review is a reference implementation with a standalone deterministic Python checker (scripts/review.py), self-contained anti-pattern/fix-pattern references, and portable eval fixtures; it works without the NeMo Gym repo
  • Skills cover the full contributor workflow: data prep, config composition, server scaffolding, code review, debugging, and reward profiling

Test plan

  • Run python .claude/skills/gym-review/scripts/review.py .claude/skills/gym-review/evals/files/ and verify expected findings per fixture
  • Verify skills load correctly in Claude Code (check that the / menu shows all 7)
  • Run with-skill vs without-skill evals per agentskills.io eval spec

Supersedes #1060 (closed due to force-push restriction on original branch for DCO fix).

🤖 Generated with Claude Code

lbliii and others added 4 commits April 13, 2026 18:38
…, config, data, and scaffolding

Seven spec-compliant agent skills with evals, references, and a deterministic
review script. gym-review is the S-tier reference implementation with a
standalone Python checker (scripts/review.py), self-contained anti-pattern
and fix-pattern references, and portable eval fixtures.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Remove unused variables (F841), unused imports (F401), sort imports (I001),
and apply ruff formatting to all Python files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
Covers all 7 skills, 5 chains, skill structure, evaluation method
(with-skill vs baseline), grading, and portability notes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
{"role": "user", "content": "Problem statement here"}
]
},
"verifier_metadata": {

this should have agent_ref
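
A minimal sketch of how this could be checked mechanically, assuming `agent_ref` is expected as a top-level key on each JSONL row. That placement, the helper name `rows_missing_agent_ref`, and the sample values are illustrative assumptions, not the actual NeMo Gym schema:

```python
import json


def rows_missing_agent_ref(jsonl_text: str) -> list[int]:
    """Return 1-based line numbers of JSONL rows without a top-level agent_ref."""
    missing = []
    for lineno, line in enumerate(jsonl_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines between rows
        row = json.loads(line)
        if "agent_ref" not in row:
            missing.append(lineno)
    return missing


# Hypothetical rows: the field values are placeholders, not the real schema.
sample = "\n".join([
    json.dumps({"agent_ref": {"name": "my_agent"}, "verifier_metadata": {}}),
    json.dumps({"verifier_metadata": {}}),
])
print(rows_missing_agent_ref(sample))  # -> [2]
```

A check like this only confirms that the key is present; it says nothing about whether the referenced agent exists in the config.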


```bash
# Validate example data (required before PR submission)
ng_prepare_data "+config_paths=[resources_servers/my_benchmark/configs/my_benchmark.yaml]" \
```

Should have a part on train_preparation mode and agent_ref.

Comment thread .claude/skills/gym-data/SKILL.md Outdated
lbliii and others added 4 commits April 13, 2026 19:09
…ures

Every skill now has:
- references/ with portable documentation (config patterns, JSONL schema,
  error patterns, diagnostic fields, metrics guide, agent patterns)
- evals/files/ with bundled test fixtures (sample configs, rollouts, JSONL
  data, agent code with intentional bugs, clean implementations)
- Updated evals referencing fixtures instead of repo paths

Removed "S-tier" language from README.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
The secrets detector flagged placeholder values in sample_env_config.yaml.
These are intentional example values, not real secrets.

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
New skill covering env.yaml setup, config validation, server launch,
health checking, smoke testing, and rollout collection. Fills the
operational gap between having a configured benchmark and profiled results.

Also adds a new "run" chain (gym-config > gym-run > gym-profile) and
inserts gym-run into existing chains (new-benchmark, validate,
external-integration).

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
- Add agent_ref to JSONL schema examples and documentation
- Add train_preparation mode to data validation step
- Replace GitLab-first with HuggingFace-first for dataset registry
  (GitLab kept as internal-only fallback)
- Update references/schema.md to match

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

## Step 5: Multi-environment training

To run multiple environments simultaneously, compose multiple config files:

Should maybe also mention the NeMo RL config for multi-env training, since we don't use ng_run for training; we use the NeMo RL config.

lbliii and others added 4 commits April 13, 2026 19:34
CRITICAL: ng_reward_profile uses +materialized_inputs_jsonl_fpath=,
not +input_jsonl_fpath= — fixed in gym-profile and gym-run.

Also:
- gym-data: add license valid values enum, num_repeats field,
  artifact_fpath optionality
- gym-run: add prompt_config and upload_rollouts_to_wandb params
- gym-scaffold-agent: document self.server_client, self.config,
  aggregate_metrics() override

Signed-off-by: Lawrence Lane <llane@nvidia.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
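
The override fix in the commit above could be guarded against regressions with a small text check over the skill docs. The following is a rough sketch, assuming commands appear as plain shell lines with optional backslash continuations; the function name and the sample file name `rollouts.jsonl` are hypothetical:

```python
WRONG = "+input_jsonl_fpath="


def find_wrong_profile_overrides(text: str) -> list[int]:
    """Return 1-based line numbers where an ng_reward_profile command uses the wrong override."""
    hits = []
    in_command = False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if "ng_reward_profile" in line:
            in_command = True  # a profile command starts on this line
        if in_command and WRONG in line:
            hits.append(lineno)
        if not line.rstrip().endswith("\\"):
            in_command = False  # no continuation, so the command ends here
    return hits


# Hypothetical documentation snippet: one wrong usage, one correct usage.
sample = "\n".join([
    "ng_reward_profile \\",
    "    +input_jsonl_fpath=rollouts.jsonl",
    "ng_reward_profile +materialized_inputs_jsonl_fpath=rollouts.jsonl",
])
print(find_wrong_profile_overrides(sample))  # -> [2]
```

Only lines that belong to an `ng_reward_profile` command are flagged, so other commands that may legitimately accept `+input_jsonl_fpath=` are left alone.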
…ross-skill links

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This commit retroactively signs off commit d03e4f7:

Signed-off-by: Lawrence Lane <llane@nvidia.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>
I, Lawrence Lane <llane@nvidia.com>, hereby add my Signed-off-by to this commit: d03e4f7

Signed-off-by: Lawrence Lane <llane@nvidia.com>

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Lawrence Lane <llane@nvidia.com>

lbliii commented Apr 13, 2026

Superseded by #1062 — rebased with DCO sign-off (force-push was blocked on the original branch).
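
For reference, a DCO check essentially looks for a `Signed-off-by:` trailer in each commit message. A minimal sketch of that test follows, assuming the trailer has the usual `Name <email>` shape; the regular expression approximates this and is not the rule the actual DCO bot applies, and the names in the examples are placeholders:

```python
import re

# Approximate pattern for a sign-off trailer on its own line.
SIGNOFF = re.compile(r"^Signed-off-by: .+ <[^<>\s]+@[^<>\s]+>$", re.MULTILINE)


def has_signoff(message: str) -> bool:
    """Return True if the commit message contains a Signed-off-by trailer."""
    return SIGNOFF.search(message) is not None


signed = "Fix lint\n\nSigned-off-by: Jane Doe <jane@example.com>"
unsigned = "Fix lint\n\nCo-Authored-By: Bot <bot@example.com>"
print(has_signoff(signed), has_signoff(unsigned))  # -> True False
```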

lbliii closed this Apr 13, 2026

2 participants