Cherry-pick develop changes into release-v2.6.0 #613

Merged
shubhadeepd merged 18 commits into release-v2.6.0 from dev/shubhadeepd/rebase-release
May 25, 2026

@shubhadeepd
Collaborator

Summary

Cherry-picks commits from develop that are not yet on release-v2.6.0, bringing the release branch up to date with post-release fixes and improvements without merging all of develop.

17 files changed (+395 / −190).

Test plan

  • CI pipeline passes on this branch (ci-pipeline.yml, skills-eval.yml, skills-nv-base.yml)
  • Unit tests: test_agentic_rag.py, ingestor main tests
  • Smoke-test agentic RAG responses with non-JSON LLM output
  • Verify ingestor collection name in upload/summary flows
  • Confirm release notes and README doc updates render correctly

nv-pranjald and others added 18 commits May 22, 2026 11:32
Update checkout, upload-artifact, setup-helm, and Docker actions to
versions that default to Node 24, resolving deprecation warnings on
publish and CI workflows.
* feat(ci): agentic eval + pre-checkout volume cleanup

skills-eval.yml:
  - Pre-checkout step removes root-owned Docker volumes (Milvus/MinIO/etcd)
    using docker run alpine before git clean runs — permanent fix for the
    EACCES checkout failure loop
  - Replaced bash ci/run_skill_eval.sh with skills_eval_agent.py
    (agentic approach — diffs PR, routes per-spec platform, posts PR comment)

.github/skill-eval/skills_eval_agent.py + AGENTS.md:
  Ported from feat/skill-eval-ci — Claude agent SDK orchestrator that
  handles diff detection, per-spec routing (cpu/gpu), Harbor execution,
  and PR comment posting

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

* fix(ci): rm -rf /target/* not /target — can't delete mount point

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
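
The pre-checkout cleanup described in the two commits above can be sketched as a small command builder. This is only an illustration under assumptions: the helper name `build_volume_cleanup_cmd`, the example host path, and the exact `docker run` flags are hypothetical and not taken from skills-eval.yml. The commits only state that an alpine container is used and that `/target/*` is removed rather than `/target`.

```python
import shlex


def build_volume_cleanup_cmd(host_dir: str, image: str = "alpine") -> list[str]:
    """Return a docker argv that empties host_dir as root without removing it.

    Hypothetical helper: the real workflow step may be written differently.
    """
    return [
        "docker", "run", "--rm",
        # Mount the host directory at /target inside a throwaway container.
        "-v", f"{host_dir}:/target",
        image,
        # /target is the mount point and cannot be deleted from inside the
        # container, so only its contents are removed.
        "sh", "-c", "rm -rf /target/*",
    ]


if __name__ == "__main__":
    # Example host path is an assumption for illustration only.
    print(shlex.join(build_volume_cleanup_cmd("/tmp/workspace/volumes")))
```

Because the container runs as root by default, it can delete files that the runner user cannot. Note that the `/target/*` glob does not match dotfiles, so hidden entries would need separate handling.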

* feat(ci): automate skill-source/ evals in PR pipeline

Previously skill-source/.agents/skills/ changes were invisible to the
automated pipeline — dorny/paths-filter only watched skills/** and
AGENTS.md Step 1 only diffed under skills/.

Changes:
- skills-eval.yml: add skill-source/** to paths filter so PRs touching
  the monolithic rag-blueprint skill trigger the eval
- AGENTS.md Step 1: diff both skills/ and skill-source/.agents/skills/,
  resolve SKILL_DIR to the correct root per location
- AGENTS.md generate.py call: uses SKILL_DIR so both decomposed and
  monolithic skills get the right --skill-dir and --spec paths

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
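
As a rough sketch of the SKILL_DIR resolution described above, the function below maps a changed file path to its skill directory under either root. The two roots come from the commit message; the function name `resolve_skill_dir` and the assumed `<root>/<skill-name>/<file>` layout are hypothetical, since AGENTS.md itself is not shown here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Roots named in the commit message. The longer root is listed first only for
# readability; matching is done on whole path components.
SKILL_ROOTS = ("skill-source/.agents/skills", "skills")


def resolve_skill_dir(changed_path: str) -> Optional[str]:
    """Return '<root>/<skill-name>' for a changed file, or None if unrelated.

    Assumes each skill lives in its own directory directly under a root.
    """
    parts = PurePosixPath(changed_path).parts
    for root in SKILL_ROOTS:
        root_parts = PurePosixPath(root).parts
        n = len(root_parts)
        # Require at least one path component below the skill directory.
        if parts[:n] == root_parts and len(parts) > n + 1:
            return "/".join(parts[: n + 1])
    return None
```

A caller could then pass the result to whatever consumes `--skill-dir`, and skip paths for which the function returns `None`.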

* ci: use actions/checkout@v5 and upload-artifact@v5 (aligns with #597)

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>

* fix(eval): one Brev VM per platform per run — not one per spec

If rag-eval/h100.json + rag-perf/h100.json both need H100_x2, provision
ONE VM and run all H100 trials against it sequentially. Prevents spinning
up 2 separate VMs (saves 13+ min provisioning + halves cost).

Added fallback types for capacity failures:
  dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
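
The one-VM-per-platform idea and the capacity fallback can be illustrated with the sketch below. It is not the agent's actual code: `group_specs_by_platform`, `provision_with_fallback`, the `provision` callback, and the assumption that a spec's file name encodes its platform are all hypothetical. Only the spec names and the fallback instance types are taken from the commit message.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Dict, Iterable, List


def group_specs_by_platform(spec_paths: Iterable[str]) -> Dict[str, List[str]]:
    """Group spec files so that each platform is provisioned once per run."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in spec_paths:
        # Assumption: 'rag-eval/h100.json' targets the 'h100' platform.
        groups[PurePosixPath(path).stem].append(path)
    return dict(groups)


def provision_with_fallback(instance_types: List[str],
                            provision: Callable[[str], str]) -> str:
    """Try each instance type in order and return the first VM that comes up."""
    errors = []
    for instance_type in instance_types:
        try:
            return provision(instance_type)
        except RuntimeError as exc:  # assumed signal for a capacity failure
            errors.append(f"{instance_type}: {exc}")
    raise RuntimeError("no capacity: " + "; ".join(errors))


if __name__ == "__main__":
    specs = ["rag-eval/h100.json", "rag-perf/h100.json"]
    fallbacks = ["dmz.h100x2", "scaleway_H100x2", "gpu-h100-sxm.1gpu-16vcpu-200gb"]
    for platform, paths in group_specs_by_platform(specs).items():
        # A stub provisioner stands in for the real Brev call.
        vm = provision_with_fallback(fallbacks, lambda t: f"vm-on-{t}")
        print(platform, vm, paths)
```

All trials for a platform would then run sequentially against the single VM returned for that group.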

* feat(eval): auto-deploy RAG stack before any H100 trial — no skill author action needed

Previously skill authors had to add deploy instructions to their h100
spec env field. Now the agent handles this automatically:

Before running ANY H100 spec (rag-eval, rag-perf, etc.), agent checks
if RAG stack is up at localhost:8081 on the Brev VM. If not, deploys
it using rag-blueprint/h100.json automatically — regardless of which
skills changed in the PR.

Skill authors write their h100.json specs normally. The infrastructure
handles the RAG stack prerequisite. Same pattern as VSS's profile field
but handled at the agent level not spec level.

Also: one Brev VM per platform per run (not one per spec).

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
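
The check-then-deploy behaviour described above might look roughly like the following. This is a minimal sketch under assumptions: the function names, the use of a plain HTTP request to the base URL as the readiness probe, and the `deploy` callback are hypothetical. The commit message only says that the agent checks whether the stack is up at localhost:8081 on the VM and deploys it from rag-blueprint/h100.json if not.

```python
import urllib.error
import urllib.request
from typing import Callable


def rag_stack_is_up(url: str = "http://localhost:8081", timeout: float = 3.0) -> bool:
    """Probe the RAG server; any HTTP response counts as 'up' (an assumption)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return False


def ensure_rag_stack(is_up: Callable[[], bool], deploy: Callable[[], None]) -> bool:
    """Deploy only when the probe fails. Returns True if a deploy was triggered."""
    if is_up():
        return False
    deploy()
    return True
```

Passing the probe and the deploy step as callables keeps the decision logic testable without a VM. In the real agent both would run against the Brev VM rather than the CI runner.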

---------

Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
…summary keys

Signed-off-by: smasurekar <smasurekar@nvidia.com>
…lformed LLM output (#605)

Signed-off-by: smasurekar <smasurekar@nvidia.com>
…owercase-fix-updates

fix(ingestor): return backend-canonicalized collection name to align summary keys
… isn't populated for agentic requests

Signed-off-by: smasurekar <smasurekar@nvidia.com>
…tic-metrics-note

Added a Limitations bullet noting that the per-response metrics block isn't populated for agentic requests
… noise

Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Signed-off-by: Vidushi Gupta <vidushig@nvidia.com>
Add OpenShift to the release summary and Highlights for the 2.6.0
release, linking to the Helm on OpenShift deployment guide.
Signed-off-by: Niyati Singal <nsingal@nvidia.com>
@shubhadeepd shubhadeepd merged commit 10ebb42 into release-v2.6.0 May 25, 2026
6 of 7 checks passed
@shubhadeepd shubhadeepd deleted the dev/shubhadeepd/rebase-release branch May 25, 2026 18:15

5 participants