Merged
18 commits
45be7bb
Supress milvus lite logs for library lite notebook (#598)
nv-pranjald May 22, 2026
4321fd1
ci: bump GitHub Actions to Node.js 24 runtimes (#597)
shubhadeepd May 22, 2026
75125fa
Feat/skills ci automation (#602)
vidushig-nv May 22, 2026
a3b7d30
fix(ingestor): return backend-canonicalized collection name to align …
smasurekar May 22, 2026
0de926b
fix(agentic-rag): disable JSON mode and harden response parser for ma…
smasurekar May 25, 2026
0ca6716
Merge pull request #606 from NVIDIA-AI-Blueprints/dev/smasurekar/es-l…
smasurekar May 25, 2026
75928c2
copy pr bot additional trustees
vidushig-nv May 25, 2026
348dc9f
copy pr bot additional trustees
vidushig-nv May 25, 2026
6b729a4
Added a Limitations bullet noting that the per-response metrics block…
smasurekar May 25, 2026
d9010f5
Merge pull request #607 from NVIDIA-AI-Blueprints/dev/smasurekar/agen…
smasurekar May 25, 2026
ed9136d
updated label
vidushig-nv May 25, 2026
df81328
use GitHub Secrets directly
vidushig-nv May 25, 2026
17d3518
ci: fix skills-eval runner label, credentials, artifacts, and NV-BASE…
vidushig-nv May 25, 2026
174da28
ci: add ANTHROPIC_BASE_URL and fix NV-BASE push trigger
vidushig-nv May 25, 2026
1a587cd
ci: add ANTHROPIC_MODEL and CLAUDE_CODE_DISABLE_THINKING for NVIDIA p…
vidushig-nv May 25, 2026
b86a5b6
updated agent code
vidushig-nv May 25, 2026
2a47fff
docs: document OpenShift support in v2.6.0 release notes (#600)
shubhadeepd May 22, 2026
792c69c
docs: add agent skill routing table to README (#599)
niyatisingal May 25, 2026
13 changes: 6 additions & 7 deletions .github/copy-pr-bot.yaml
@@ -1,14 +1,13 @@
# copy-pr-bot configuration for NVIDIA-AI-Blueprints/rag
# Docs: https://docs.gha-runners.nvidia.com/platform/apps/copy-pr-bot/
#
# All NVIDIA org members with write access are auto-trusted and auto-vetters.
# No individual names needed — scales to any number of contributors.
# Commit signing required for trusted-change classification.
# See: https://docs.github.com/en/authentication/managing-commit-signature-verification
#
# Only add to additional_vetters if someone needs vetting rights
# but has read-only repo access (rare for internal repos).
# additional_trustees: users explicitly trusted even if not org members.
# Needed when author_association is CONTRIBUTOR (not MEMBER of NVIDIA-AI-Blueprints org).

enabled: true
auto_sync_draft: false
auto_sync_ready: true

additional_trustees:
- vidushig-nv
- richa-nvidia
104 changes: 82 additions & 22 deletions .github/skill-eval/AGENTS.md
@@ -22,6 +22,18 @@ find /tmp/skill-eval/results -mindepth 1 -maxdepth 1 -type d \
! -name "${GITHUB_RUN_ID}" -exec rm -rf {} + 2>/dev/null || true

mkdir -p /tmp/skill-eval/datasets /tmp/skill-eval/results

# Log exact image digests for traceability (resolve :latest to sha256)
echo "=== Image digests (for traceability) ==="
for img in \
nvcr.io/nvstaging/blueprint/rag-server:${TAG:-latest} \
nvcr.io/nvstaging/blueprint/ingestor-server:${TAG:-latest}; do
digest=$(docker inspect "$img" --format '{{index .RepoDigests 0}}' 2>/dev/null \
|| docker pull "$img" -q 2>/dev/null \
&& docker inspect "$img" --format '{{index .RepoDigests 0}}' 2>/dev/null \
|| echo "$img — not yet pulled")
echo " $img → $digest"
done
```
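
A note on the digest loop in the hunk above: in shell, `||` and `&&` have equal precedence and associate left, so `a || b && c || d` groups as `((a || b) && c) || d`. As I read it, the second `docker inspect` therefore also runs when the first one succeeded, and `$digest` could end up holding the digest twice. A minimal sketch of the same lookup with explicit branches follows. `resolve_digest` is a hypothetical helper name, not something in the PR; the `docker inspect` and `docker pull` invocations are copied from the snippet.

```shell
# Hypothetical helper (not part of the PR): print the repo digest for an image.
# Tries a local inspect first, then pull + inspect, then a fallback message.
resolve_digest() {
  local img="$1" digest
  # First attempt: the image is already present locally.
  if digest=$(docker inspect "$img" --format '{{index .RepoDigests 0}}' 2>/dev/null); then
    printf '%s\n' "$digest"
    return 0
  fi
  # Second attempt: pull quietly, then inspect again.
  if docker pull "$img" -q >/dev/null 2>&1 \
     && digest=$(docker inspect "$img" --format '{{index .RepoDigests 0}}' 2>/dev/null); then
    printf '%s\n' "$digest"
    return 0
  fi
  # Neither attempt produced a digest.
  printf '%s (not yet pulled)\n' "$img"
}
```

Inside the loop this would be called as `digest=$(resolve_digest "$img")`, which yields exactly one line per image.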

## Your job, in order
@@ -54,7 +66,7 @@ mkdir -p /tmp/skill-eval/datasets /tmp/skill-eval/results

Skills with no `eval/` dir are not yet migrated — skip them.

3. **Check the shared adapter.** All rag-* skills use a single adapter
3. **Check the shared adapter.** All rag-\* skills use a single adapter
at `skill-eval/adapters/rag-blueprint/generate.py` with
`--skill-name <skill>`. Verify it accepts `--skill-name`:

@@ -68,7 +80,7 @@ mkdir -p /tmp/skill-eval/datasets /tmp/skill-eval/results
§ 3c) with the fix and emit `BLOCKED: adapter missing --skill-name`.

Unlike VSS, you do NOT create per-skill adapters — one shared
adapter serves all rag-* skills. If a skill genuinely needs custom
adapter serves all rag-\* skills. If a skill genuinely needs custom
adapter logic (different PREAMBLE, non-standard platform), note it
in the PR comment and raise a bot PR adding
`skill-eval/adapters/<skill>/generate.py`.
@@ -78,8 +90,8 @@ mkdir -p /tmp/skill-eval/datasets /tmp/skill-eval/results
is the spec filename without `.json`.

Resolve `SKILL_DIR` based on where the skill lives:
- Decomposed skills: `SKILL_DIR="$REPO_ROOT/skills/<skill>"`
- Monolithic skills: `SKILL_DIR="$REPO_ROOT/skill-source/.agents/skills/<skill>"`
- Decomposed skills: `SKILL_DIR="$REPO_ROOT/skills/<skill>"`
- Monolithic skills: `SKILL_DIR="$REPO_ROOT/skill-source/.agents/skills/<skill>"`

```bash
cd "$REPO_ROOT/skill-eval"
@@ -95,14 +107,13 @@ mkdir -p /tmp/skill-eval/datasets /tmp/skill-eval/results
generation fails, read the traceback, fix the adapter, rerun.

5. **Run Harbor trials.** Platform routing:

- **`cpu` platform** (`nvidia_hosted.json` specs) → `LocalEnvironment`.
Docker runs directly on the `rag-skill-validator` runner — no
Brev VM needed. The runner IS the deploy host.

- **`H100_x2` platform** (`h100.json` specs) → `BrevEnvironment`.
Pre-provision an ephemeral Brev VM, run Harbor against it,
delete it after. See § GPU provisioning below.
Pre-provision ONE ephemeral Brev VM for all H100 specs in this run
(see § GPU provisioning). Run all H100 trials against that single VM.

For **cpu skills**, clean any leftover Docker state first:

@@ -119,6 +130,30 @@ mkdir -p /tmp/skill-eval/datasets /tmp/skill-eval/results
skills set — it deploys the RAG stack that all other skills test
against. Then run remaining cpu skills in any order.

**GPU pre-flight (automatic, no action required from skill authors):**
Before running ANY H100 spec for any skill, first sync the Brev VM's repo
to the PR base branch so compose files, env files, and skill docs all match
the branch under test (Harbor clones the default branch — main — not the PR):

```bash
brev exec "$BREV_INSTANCE" -- \
"cd /home/nvidia/rag && git fetch origin ${PR_BASE} && git checkout ${PR_BASE} && git pull origin ${PR_BASE}" \
2>/dev/null || true
```

Then check if the RAG stack is already running on the Brev VM:

```bash
brev exec "$BREV_INSTANCE" "curl -sf http://localhost:8081/v1/health" \
2>/dev/null && RAG_RUNNING=true || RAG_RUNNING=false
```

If `RAG_RUNNING=false` and `rag-blueprint/eval/h100.json` exists in
the repo, run it first to deploy the self-hosted RAG stack. This
happens automatically regardless of which skills are in the PR diff —
skill authors do NOT need to declare this dependency in their specs.
Once deployed, all subsequent H100 specs reuse the running stack.
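
The pre-flight ordering rule described above can be sketched as a small function. This is only an illustration under assumptions: `order_h100_specs` is a hypothetical name, and the deploy spec path is passed in as an argument rather than hard-coded, since the hunk does not show where `rag-blueprint/eval/h100.json` is rooted.

```shell
# Hypothetical helper (not part of the PR): print H100 spec paths in run order.
# $1 = "true"/"false" (result of the health check), $2 = path to the deploy
# spec, remaining arguments = specs selected for this run.
order_h100_specs() {
  local rag_running="$1" deploy_spec="$2" spec prepend=false
  shift 2
  # The deploy spec goes first when the stack is down and the spec exists.
  if [ "$rag_running" = "false" ] && [ -f "$deploy_spec" ]; then
    prepend=true
    printf '%s\n' "$deploy_spec"
  fi
  for spec in "$@"; do
    # Do not list the deploy spec twice if it is also in the selected set.
    if [ "$prepend" = "true" ] && [ "$spec" = "$deploy_spec" ]; then
      continue
    fi
    printf '%s\n' "$spec"
  done
}
```

Called with `RAG_RUNNING` from the health check, the output is the list of specs to hand to Harbor one at a time.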

Use the canonical Harbor invocation from § Harbor invocation below.
One step at a time, in order. Skip remaining steps if a step's
reward < 1.0 (skip-on-prior-fail).
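
For reference, the skip-on-prior-fail rule can be modelled as below. This is a sketch, not the agent's actual control flow: `plan_steps` is a hypothetical name, and it takes rewards that are already known, whereas in a real run each reward only becomes available after its step finishes.

```shell
# Hypothetical helper (not part of the PR): given per-step rewards in order,
# print which steps run. Once a step scores below 1.0, later steps are skipped.
plan_steps() {
  local failed=false step=1 reward
  for reward in "$@"; do
    if [ "$failed" = "true" ]; then
      echo "step-$step SKIP"
    else
      echo "step-$step RUN reward=$reward"
      # awk does the floating-point comparison; exit status 0 means reward < 1.0.
      if awk -v r="$reward" 'BEGIN { exit !(r < 1.0) }'; then
        failed=true
      fi
    fi
    step=$((step + 1))
  done
}
```

For example, `plan_steps 1.0 0.5 1.0` runs step-1 and step-2 and marks step-3 as skipped.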
@@ -153,34 +188,50 @@ mkdir -p /tmp/skill-eval/datasets /tmp/skill-eval/results

## GPU provisioning (H100_x2 specs only)

For specs with `platforms: ["H100_x2"]`:
**One VM per platform per run.** If multiple skills have `H100_x2` specs
(e.g. rag-eval/h100.json + rag-perf/h100.json), provision ONE Brev VM at
the start and run ALL H100 trials against it sequentially. Do NOT provision
a new VM per spec — that wastes 13+ min provisioning time and doubles cost.

**Before processing specs**, collect all unique platforms needed:

```bash
BREV_TYPE="dmz.h100x2.pcie"
# Scan all changed skill specs for their platform requirements
GPU_PLATFORMS_NEEDED=$(...) # e.g. "H100_x2"
```

Then provision once per platform, store the instance name, reuse it for
all specs of that platform:

```bash
# Provision ONCE for all H100_x2 specs in this run
BREV_TYPE="dmz.h100x2,scaleway_H100x2,gpu-h100-sxm.1gpu-16vcpu-200gb"
BREV_INSTANCE="rag-eval-gpu-$(date +%s | tail -c 8)"

# Create with retry
# Create with retry + fallback types
for attempt in $(seq 1 5); do
echo "$BREV_TYPE" | brev create "$BREV_INSTANCE" --detached 2>&1 | tail -5
brev create "$BREV_INSTANCE" --type "$BREV_TYPE" --detached 2>&1 | tail -5
brev ls 2>/dev/null | awk -v n="$BREV_INSTANCE" '$1==n {found=1} END{exit !found}' \
&& break
sleep 15
done

# Wait for RUNNING+READY (up to 30 min)
DEADLINE=$(( $(date +%s) + 1800 ))
last_state=""
while [ "$(date +%s)" -lt "$DEADLINE" ]; do
STATE=$(brev ls 2>/dev/null | awk -v n="$BREV_INSTANCE" '$1==n {print $2"+"$4}')
[ -n "$STATE" ] && [ "$STATE" != "$last_state" ] && echo " $(date -u +%H:%M:%SZ) $BREV_INSTANCE: $STATE" && last_state="$STATE"
[ "$STATE" = "RUNNING+READY" ] && break
sleep 15
done
[ "$STATE" = "RUNNING+READY" ] || { echo "BLOCKED: H100 VM never reached RUNNING+READY"; exit 1; }
[ "$last_state" = "RUNNING+READY" ] || { echo "BLOCKED: H100 VM never reached RUNNING+READY"; exit 1; }

# Record for cleanup
# Record for cleanup — workflow step deletes after 5-min cooldown
mkdir -p /tmp/brev
echo "$BREV_INSTANCE" >> "/tmp/brev/started-by-${GITHUB_RUN_ID}.txt"

export BREV_INSTANCE
export BREV_INSTANCE # reuse this for ALL H100_x2 specs below
```
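
The wait loop in the provisioning snippet follows a common poll-until-deadline pattern. A generic sketch of that pattern is shown here so the timeout path has an explicit return status. `wait_for_state` is a hypothetical helper, not part of the PR, and it takes the state-reporting command as arguments so it does not assume anything about `brev ls` output.

```shell
# Hypothetical helper (not part of the PR): poll a state-reporting command
# until it prints the wanted state or the timeout elapses.
# Usage: wait_for_state WANTED TIMEOUT_SECS INTERVAL_SECS CMD [ARGS...]
# Returns 0 when the wanted state was seen, 1 on timeout.
wait_for_state() {
  local wanted="$1" timeout="$2" interval="$3" state="" last="" deadline
  shift 3
  deadline=$(( $(date +%s) + timeout ))
  while [ "$(date +%s)" -lt "$deadline" ]; do
    # A failing or silent command is treated as "no state yet".
    state=$("$@" 2>/dev/null) || state=""
    # Log only on transitions so the CI log stays short.
    if [ -n "$state" ] && [ "$state" != "$last" ]; then
      echo "$(date -u +%H:%M:%SZ) state: $state" >&2
      last="$state"
    fi
    [ "$state" = "$wanted" ] && return 0
    sleep "$interval"
  done
  return 1
}
```

With a wrapper function around the `brev ls | awk` pipeline from the snippet (call it `brev_state`, also hypothetical), the 30-minute wait would read `wait_for_state "RUNNING+READY" 1800 15 brev_state || { echo "BLOCKED: H100 VM never reached RUNNING+READY"; exit 1; }`.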

---
@@ -248,15 +299,23 @@ done
```

**Never background harbor and poll.** Use foreground blocking calls only.
`harbor run` MUST be called directly in a Bash tool call and allowed to block
until it exits. Do NOT use TaskCreate, background processes (`&`), `nohup`,
`Monitor`, or any other mechanism to run harbor asynchronously — not even
wrapped in a shell script. The Bash tool call itself must block until harbor
exits. The call will block for up to 90 minutes on GPU specs — that is
expected and correct. Do NOT check on it with sleep loops, Read, or Monitor.
Just wait. Violating this rule causes the agent to exit without DONE:/BLOCKED:
(exit 4). This has happened multiple times — do not repeat the mistake.

---

## Platform topology

| Platform | `spec.platforms` value | Environment | Instance | After run |
|---|---|---|---|---|
| CPU / cloud NIMs | `cpu` | LocalEnvironment | `rag-skill-validator` runner | docker down + volume cleanup |
| 2× H100 80GB | `H100_x2` | BrevEnvironment | `rag-eval-gpu-<ts>` (`dmz.h100x2.pcie`) | workflow step deletes after 5-min cooldown |
| Platform | `spec.platforms` value | Environment | Instance | After run |
| ---------------- | ---------------------- | ---------------- | --------------------------------------- | ------------------------------------------ |
| CPU / cloud NIMs | `cpu` | LocalEnvironment | `rag-skill-validator` runner | docker down + volume cleanup |
| 2× H100 80GB | `H100_x2` | BrevEnvironment | `rag-eval-gpu-<ts>` (`dmz.h100x2.pcie`) | workflow step deletes after 5-min cooldown |

`rag-skill-validator` is the CI runner host — **never** provision Brev against it.

@@ -270,10 +329,10 @@ done
Head: `<short-sha>` · spec `<spec-sha>`
First started: `<utc>` · Last finished: `<utc>` · Total: `<Xhr Ymin>`

| Platform | Step | Query | Result | Reward | Duration | Turns |
|---|---|---|---|---|---|---|
| cpu | step-1 | Deploy via Docker Compose... | ✅ 1.0 (6/6) | 1.0 | 4m 29s | 18 |
| cpu | step-2 | Get RAG Blueprint running... | ✅ 1.0 (5/5) | 1.0 | 1m 23s | 9 |
| Platform | Step | Query | Result | Reward | Duration | Turns |
| -------- | ------ | ---------------------------- | ------------ | ------ | -------- | ----- |
| cpu | step-1 | Deploy via Docker Compose... | ✅ 1.0 (6/6) | 1.0 | 4m 29s | 18 |
| cpu | step-2 | Get RAG Blueprint running... | ✅ 1.0 (5/5) | 1.0 | 1m 23s | 9 |

### Failing checks

@@ -320,6 +379,7 @@ END=$(jq -r '.trial_finished_at' "$RESULTS"/*/*/step-${STEP}__*/result.json 2>/
## Manual full-sweep mode

When `MANUAL_FULL_SWEEP=1` (workflow_dispatch):

- **Step 1 override:** skip diff. Enumerate `skills/*/eval/*.json`;
filter by `MANUAL_SKILLS_FILTER` (`*` = all skills).
- **Step 3 override:** no bot-PR flow. Record missing adapter as
18 changes: 9 additions & 9 deletions .github/workflows/ci-pipeline.yml
@@ -38,10 +38,10 @@ jobs:
if: github.event_name == 'push' || github.event_name == 'workflow_dispatch' || github.event_name == 'schedule' || github.event.pull_request.head.repo.full_name == github.repository
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Setup Helm
uses: azure/setup-helm@v4
uses: azure/setup-helm@v5
with:
version: 'latest'

@@ -67,7 +67,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Check out repository code
uses: actions/checkout@v4
uses: actions/checkout@v5

- uses: actions/setup-python@v3
- uses: pre-commit/action@v3.0.1
@@ -79,7 +79,7 @@ jobs:
image: python:3.12-slim
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Install system dependencies
run: |
@@ -112,7 +112,7 @@ jobs:
runs-on: ubuntu-latest
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Setup Node.js
uses: actions/setup-node@v4
@@ -152,7 +152,7 @@ jobs:
pnpm test:coverage

- name: Upload coverage artifacts
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v5
if: always()
with:
name: frontend-coverage-${{ steps.sanitize.outputs.ref_name }}-${{ github.sha }}
@@ -166,7 +166,7 @@ jobs:
image: python:3.12-slim
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Install required packages
run: |
@@ -192,7 +192,7 @@ jobs:
ENABLE_NRL_INTEGRATION_TESTS: "false"
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Install NGC CLI
env:
@@ -1264,7 +1264,7 @@ jobs:
echo "ref_name=$SANITIZED_REF" >> $GITHUB_OUTPUT

- name: Upload all integration test logs
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v5
if: always()
with:
name: integration-tests-logs-${{ steps.sanitize.outputs.ref_name }}-${{ github.sha }}
26 changes: 13 additions & 13 deletions .github/workflows/publish-artifacts.yml
@@ -73,7 +73,7 @@ jobs:
image: python:3.10
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Set artifactory version
run: |
@@ -101,7 +101,7 @@ jobs:
ls -la dist/

- name: Upload wheel artifact
uses: actions/upload-artifact@v4
uses: actions/upload-artifact@v5
with:
name: wheel-${{ env.ARTIFACTORY_VERSION }}
path: dist/*.whl
@@ -156,10 +156,10 @@ jobs:
if: github.event_name != 'workflow_dispatch' || ((github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'containers-only') && github.event.inputs.PUBLISH_RAG_SERVER != 'false')
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v4

- name: Determine TAG
id: tag
@@ -177,7 +177,7 @@ jobs:
echo "Final TAG value: $TAG"

- name: Login to NGC Container Registry
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
registry: nvcr.io
username: '$oauthtoken'
@@ -216,10 +216,10 @@ jobs:
if: github.event_name != 'workflow_dispatch' || ((github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'containers-only') && github.event.inputs.PUBLISH_INGESTOR_SERVER != 'false')
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v4

- name: Determine TAG
id: tag
@@ -237,7 +237,7 @@ jobs:
echo "Final TAG value: $TAG"

- name: Login to NGC Container Registry
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
registry: nvcr.io
username: '$oauthtoken'
@@ -276,10 +276,10 @@ jobs:
if: github.event_name != 'workflow_dispatch' || ((github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'containers-only') && github.event.inputs.PUBLISH_RAG_FRONTEND != 'false')
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Set up Docker Buildx
uses: docker/setup-buildx-action@v3
uses: docker/setup-buildx-action@v4

- name: Determine TAG
id: tag
@@ -297,7 +297,7 @@ jobs:
echo "Final TAG value: $TAG"

- name: Login to NGC Container Registry
uses: docker/login-action@v3
uses: docker/login-action@v4
with:
registry: nvcr.io
username: '$oauthtoken'
@@ -336,10 +336,10 @@ jobs:
if: github.event_name != 'workflow_dispatch' || github.event.inputs.JOBS_TO_RUN == 'all' || github.event.inputs.JOBS_TO_RUN == 'helm-chart-only'
steps:
- name: Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v5

- name: Install Helm
uses: azure/setup-helm@v4
uses: azure/setup-helm@v5
with:
version: 'v3.17.0'
