| 1 | +<!-- |
| 2 | +SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved. |
| 3 | +SPDX-License-Identifier: Apache-2.0 |
| 4 | +
| 5 | +Licensed under the Apache License, Version 2.0 (the "License"); |
| 6 | +you may not use this file except in compliance with the License. |
| 7 | +You may obtain a copy of the License at |
| 8 | +
| 9 | +http://www.apache.org/licenses/LICENSE-2.0 |
| 10 | +
| 11 | +Unless required by applicable law or agreed to in writing, software |
| 12 | +distributed under the License is distributed on an "AS IS" BASIS, |
| 13 | +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| 14 | +See the License for the specific language governing permissions and |
| 15 | +limitations under the License. |
| 16 | +--> |
| 17 | + |
| 18 | +# OpenClaw NeMo-Flow Harbor Smoke |
| 19 | + |
| 20 | +This workflow runs **OpenClaw** inside a Harbor trial with the published **NeMo |
| 21 | +Flow OpenClaw native plugin** (`npm:nemo-flow-openclaw`, pinned inside the |
| 22 | +agent). The plugin provides a hooks backend and in-process observability; it is
| 23 | +not a Harbor-side “ATIF export” pipeline. You can drive it with **Terminal-Bench** (`sqlite-db-truncate`)
| 24 | +or with the same **SWE-bench** smoke instance as the OpenCode and Hermes smokes |
| 25 | +(`django__django-13741` under `swebench-opencode-smoke`). |
| 26 | + |
| 27 | +Because OpenClaw may not be registered in the Harbor build you have installed, |
| 28 | +this smoke loads the agent from **NeMo Agent Toolkit** via |
| 29 | +`--agent-import-path` (no `-a openclaw` required). |
| 30 | + |
| 31 | +## Pipeline |
| 32 | + |
| 33 | +```mermaid |
| 34 | +flowchart TD |
| 35 | + task[Terminal-Bench task<br/>sqlite-db-truncate] --> harbor[Harbor trial<br/>Docker environment] |
| 36 | +
| 37 | + harbor --> run[OpenClaw + optional NeMo Flow OpenClaw plugin] |
| 38 | + run --> nativeLog[agent/openclaw.txt<br/>CLI JSON / logs] |
| 39 | + run --> sessionJsonl[agent/openclaw.session.jsonl<br/>OpenClaw session log] |
| 40 | + sessionJsonl --> harborAtif[agent/trajectory.json<br/>Harbor-generated ATIF<br/>from session replay] |
| 41 | +
| 42 | + run --> nemoNative[NeMo Flow native plugin<br/>hooks backend + observability] |
| 43 | + nemoNative --> pluginOut[agent/nemo-flow-atif/<br/>plugin observability output] |
| 44 | +
| 45 | + harbor --> verifier[Verifier] |
| 46 | + verifier --> result[result.json] |
| 47 | +
| 48 | + harborAtif --> compare[Optional: compare_atif_tools<br/>if plugin writes comparable ATIF] |
| 49 | + pluginOut --> compare |
| 50 | +``` |
| 51 | + |
| 52 | +After a successful NeMo-Flow-enabled run, expect under `agent/`: |
| 53 | + |
| 54 | +<!-- path-check-skip-begin --> |
| 55 | + |
| 56 | +- `openclaw.txt` — CLI stdout (`openclaw agent --local --json`). |
| 57 | +- `openclaw.session.jsonl` — session log; Harbor derives ATIF (OpenClaw has no native ATIF). |
| 58 | +- `trajectory.json` — Harbor ATIF from the session log. |
| 59 | +- `nemo-flow-atif/` — NeMo Flow OpenClaw plugin observability output. |
| 60 | + |
| 61 | +At the trial root: `result.json` (outcome). |
| 62 | + |
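The list above can be turned into a quick presence check. The following is a minimal sketch, assuming the trial layout described here; `check_trial_artifacts` is a hypothetical helper name and is not part of Harbor or `nat_harbor`.

```bash
# Hypothetical helper (not shipped with Harbor or nat_harbor): report which of
# the artifacts listed above exist inside a trial directory.
# Returns 0 when everything is present, 1 when at least one path is missing.
check_trial_artifacts() {
  local trial="$1" missing=0 path
  for path in \
    agent/openclaw.txt \
    agent/openclaw.session.jsonl \
    agent/trajectory.json \
    agent/nemo-flow-atif \
    result.json
  do
    if [ -e "$trial/$path" ]; then
      echo "ok       $path"
    else
      echo "missing  $path"
      missing=1
    fi
  done
  return "$missing"
}
```

Call it as `check_trial_artifacts "$TRIAL"` once `TRIAL` points at a completed trial directory. With `enable_nemo_flow=false`, a missing `agent/nemo-flow-atif` is expected.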
| 63 | +<!-- path-check-skip-end --> |
| 64 | + |
| 65 | +## Prerequisites |
| 66 | + |
| 67 | +- Docker is running. |
| 68 | +- Python environment with **Harbor** and **`nvidia-nat-harbor`** installed so |
| 69 | + `harbor` and `nat_harbor` import cleanly (editable install from this repo is |
| 70 | + typical). |
| 71 | + |
| 72 | +From the NeMo Agent Toolkit repository root: |
| 73 | + |
| 74 | +<!-- path-check-skip-begin --> |
| 75 | + |
| 76 | +```bash |
| 77 | +uv venv --python 3.13 --seed .venv |
| 78 | +uv pip install -e packages/nvidia_nat_harbor |
| 79 | +uv pip install -e external/harbor |
| 80 | +``` |
| 81 | + |
| 82 | +Use a Harbor revision that ships the built-in **Terminal-Bench** benchmark |
| 83 | +dataset (`terminal-bench@2.0`). If Harbor lives somewhere other than
| 84 | +`external/harbor`, use that path in `uv pip install -e` instead.
| 85 | + |
| 86 | +<!-- path-check-skip-end --> |
| 87 | + |
| 88 | +Create a secrets file (do not commit it). **`NVIDIA_BASE_URL` is only exercised |
| 89 | +against the OpenAI-compatible base on `integrate.api.nvidia.com` for this |
| 90 | +workflow today** (for example `https://integrate.api.nvidia.com/v1`). Other |
| 91 | +NVIDIA inference hosts are not covered here yet. |
| 92 | + |
| 93 | +<!-- path-check-skip-begin --> |
| 94 | + |
| 95 | +```bash |
| 96 | +mkdir -p .tmp/harbor/secrets |
| 97 | +read -rsp 'NVIDIA_API_KEY: ' NVIDIA_API_KEY; echo |
| 98 | +read -rsp 'NVIDIA_BASE_URL (integrate.api.nvidia.com OpenAI-compatible base): ' NVIDIA_BASE_URL; echo |
| 99 | +cat > .tmp/harbor/secrets/nvidia.env <<EOF |
| 100 | +NVIDIA_API_KEY=${NVIDIA_API_KEY} |
| 101 | +NVIDIA_BASE_URL=${NVIDIA_BASE_URL} |
| 102 | +EOF |
| 103 | +``` |
| 104 | + |
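Before starting a run, it can help to confirm that the secrets file actually defines both variables. This is a sketch under the assumption that the file uses plain `KEY=value` lines as written above; `check_nvidia_env` is a hypothetical helper, and it never prints the values.

```bash
# Hypothetical helper: confirm the env file defines both variables with
# non-empty values, without echoing the secret itself.
# Returns 0 when both keys are set, 1 otherwise.
check_nvidia_env() {
  local file="$1" key status=0
  for key in NVIDIA_API_KEY NVIDIA_BASE_URL; do
    if grep -Eq "^${key}=.+" "$file"; then
      echo "ok       $key"
    else
      echo "missing  $key"
      status=1
    fi
  done
  return "$status"
}
```

For example, `check_nvidia_env .tmp/harbor/secrets/nvidia.env`. Restricting the file with `chmod 600` is a reasonable extra precaution on shared machines.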
| 105 | +<!-- path-check-skip-end --> |
| 106 | + |
| 107 | +## Run the smoke (Terminal-Bench) |
| 108 | + |
| 109 | +This matches a recent working invocation, except that **`--agent-import-path`**
| 110 | +replaces **`-a openclaw`** so you do not depend on Harbor registering the OpenClaw
| 111 | +agent name.
| 112 | + |
| 113 | +<!-- path-check-skip-begin --> |
| 114 | + |
| 115 | +```bash |
| 116 | +cd /external/NeMo-Agent-Toolkit |
| 117 | + |
| 118 | +set -a |
| 119 | +. .tmp/harbor/secrets/nvidia.env |
| 120 | +set +a |
| 121 | + |
| 122 | +.venv/bin/harbor run \ |
| 123 | + -d terminal-bench@2.0 \ |
| 124 | + -i sqlite-db-truncate \ |
| 125 | + --agent-import-path nat_harbor.agents.installed.openclaw:OpenClaw \ |
| 126 | + -m nvidia/qwen/qwen3.5-397b-a17b \ |
| 127 | + -e docker \ |
| 128 | + --env-file .tmp/harbor/secrets/nvidia.env \ |
| 129 | + --jobs-dir .tmp/harbor-openclaw-nemoflow \ |
| 130 | + --n-concurrent 1 \ |
| 131 | + --agent-kwarg enable_nemo_flow=true \ |
| 132 | + -q \ |
| 133 | + -y |
| 134 | +``` |
| 135 | + |
| 136 | +Disable NeMo Flow (faster setup, no native plugin): |
| 137 | + |
| 138 | +```bash |
| 139 | +.venv/bin/harbor run \ |
| 140 | + -d terminal-bench@2.0 \ |
| 141 | + -i sqlite-db-truncate \ |
| 142 | + --agent-import-path nat_harbor.agents.installed.openclaw:OpenClaw \ |
| 143 | + -m nvidia/qwen/qwen3.5-397b-a17b \ |
| 144 | + -e docker \ |
| 145 | + --env-file .tmp/harbor/secrets/nvidia.env \ |
| 146 | + --jobs-dir .tmp/harbor-openclaw-nemoflow \ |
| 147 | + --n-concurrent 1 \ |
| 148 | + --agent-kwarg enable_nemo_flow=false \ |
| 149 | + -q \ |
| 150 | + -y |
| 151 | +``` |
| 152 | + |
| 153 | +## Run the smoke (SWE-bench) |
| 154 | + |
| 155 | +This run uses the same prepared SWE-bench instance as **`opencode-nemoflow-smoke.md`** and
| 156 | +**`hermes-nemoflow-smoke.md`**: `django__django-13741` under |
| 157 | +`swebench-opencode-smoke`. The task directory should be: |
| 158 | + |
| 159 | +```text |
| 160 | +external/harbor/datasets/swebench-opencode-smoke/django__django-13741 |
| 161 | +``` |
| 162 | + |
| 163 | +If it is missing, create it from a Harbor checkout (same adapter invocation as |
| 164 | +in the OpenCode smoke prerequisites): |
| 165 | + |
| 166 | +```bash |
| 167 | +cd external/harbor/adapters/swebench |
| 168 | + |
| 169 | +uv run swebench \ |
| 170 | + --instance-id django__django-13741 \ |
| 171 | + --task-dir ../../datasets/swebench-opencode-smoke \ |
| 172 | + --overwrite |
| 173 | + |
| 174 | +cd ../../../.. |
| 175 | +``` |
| 176 | + |
| 177 | +NeMo-Flow-enabled run (OpenClaw import path, same model family as the OpenCode |
| 178 | +smoke): |
| 179 | + |
| 180 | +```bash |
| 181 | +cd /external/NeMo-Agent-Toolkit |
| 182 | + |
| 183 | +export HARBOR_JOBS_DIR=.tmp/harbor/openclaw-nemoflow-swebench |
| 184 | +export SWEBENCH_TASK=external/harbor/datasets/swebench-opencode-smoke/django__django-13741 |
| 185 | +export JOB_NAME=openclaw-nemoflow-swebench-smoke-1 |
| 186 | + |
| 187 | +set -a |
| 188 | +. .tmp/harbor/secrets/nvidia.env |
| 189 | +set +a |
| 190 | + |
| 191 | +.venv/bin/harbor run \ |
| 192 | + --path "$SWEBENCH_TASK" \ |
| 193 | + -l 1 \ |
| 194 | + --job-name "$JOB_NAME" \ |
| 195 | + --jobs-dir "$HARBOR_JOBS_DIR" \ |
| 196 | + --yes -n 1 --max-retries 0 \ |
| 197 | + --env-file .tmp/harbor/secrets/nvidia.env \ |
| 198 | + --agent-import-path nat_harbor.agents.installed.openclaw:OpenClaw \ |
| 199 | + --env docker \ |
| 200 | + --model nvidia/qwen/qwen3.5-397b-a17b \ |
| 201 | + --agent-kwarg enable_nemo_flow=true |
| 202 | +``` |
| 203 | + |
| 204 | + |
| 205 | +<!-- path-check-skip-end --> |
| 206 | + |
| 207 | +## Quick artifact check |
| 208 | + |
| 209 | +### Terminal-Bench |
| 210 | + |
| 211 | +Set `TRIAL` to the completed trial directory (layout varies slightly by Harbor |
| 212 | +version; adjust the `find` if needed): |
| 213 | + |
| 214 | +<!-- path-check-skip-begin --> |
| 215 | + |
| 216 | +```bash |
| 217 | +export HARBOR_JOBS_DIR=.tmp/harbor-openclaw-nemoflow |
| 218 | +TRIAL=$(find "$HARBOR_JOBS_DIR" -mindepth 2 -maxdepth 3 -type d -name 'sqlite-db-truncate__*' | head -n 1) |
| 219 | +test -n "$TRIAL" |
| 220 | +ls -la "$TRIAL/agent" |
| 221 | +``` |
| 222 | + |
| 223 | +### SWE-bench |
| 224 | + |
| 225 | +After a run with `JOB_NAME=openclaw-nemoflow-swebench-smoke-1`: |
| 226 | + |
| 227 | +```bash |
| 228 | +export HARBOR_JOBS_DIR=.tmp/harbor/openclaw-nemoflow-swebench |
| 229 | +export JOB_NAME=openclaw-nemoflow-swebench-smoke-1 |
| 230 | +TRIAL=$(find "$HARBOR_JOBS_DIR/$JOB_NAME" -maxdepth 1 -type d -name 'django__django-13741__*' | head -n 1) |
| 231 | +test -n "$TRIAL" |
| 232 | +ls -la "$TRIAL/agent" |
| 233 | +``` |
| 234 | + |
| 235 | +### Optional tool comparison |
| 236 | + |
| 237 | +Pick one NeMo Flow plugin observability JSON under `nemo-flow-atif` (if present): |
| 238 | + |
| 239 | +```bash |
| 240 | +PLUGIN_TRAJ=$(find "$TRIAL/agent/nemo-flow-atif" -maxdepth 1 -type f -name '*.json' | head -n 1) |
| 241 | +echo "PLUGIN_TRAJ=${PLUGIN_TRAJ}" |
| 242 | +``` |
| 243 | + |
| 244 | +Optional: compare Harbor `trajectory.json` to a plugin artifact **only when** that |
| 245 | +file is ATIF-shaped enough for the tool (native plugin output varies by version): |
| 246 | + |
| 247 | +```bash |
| 248 | +test -f "$TRIAL/agent/trajectory.json" |
| 249 | +test -n "$PLUGIN_TRAJ" |
| 250 | + |
| 251 | +.venv/bin/python -m nat_harbor.smoke.compare_atif_tools \ |
| 252 | + --native "$TRIAL/agent/trajectory.json" \ |
| 253 | + --candidate "$PLUGIN_TRAJ" |
| 254 | +``` |
| 255 | + |
| 256 | +<!-- path-check-skip-end --> |
| 257 | + |
| 258 | +## Known limitations |
| 259 | + |
| 260 | +<!-- path-check-skip-begin --> |
| 261 | + |
| 262 | +- **Setup time:** each trial installs Node, OpenClaw, and (when enabled) the |
| 263 | + NeMo Flow npm plugin inside the container. |
| 264 | +- **Network:** `npm install -g` and `openclaw plugins install` need reliable |
| 265 | + access to the npm registry. |
| 266 | +- **ATIF schema:** the vendored agent under `nat_harbor` may emit |
| 267 | + `ATIF-v1.6` in `trajectory.json` when used with Harbor releases whose |
| 268 | + `Trajectory` model does not yet list `ATIF-v1.7`; the upstream Harbor OpenClaw |
| 269 | + agent may use v1.7 once dependencies align. |
| 270 | + |
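To see which ATIF version a given run produced, you can read the version string out of `trajectory.json`. The sketch below assumes the version is stored as a `"schema_version"` string in the file, which may not hold for every Harbor release; `atif_schema_version` is a hypothetical helper name.

```bash
# Hypothetical helper: print the ATIF schema version recorded in a trajectory
# file. Assumption: the version is stored as a "schema_version" string; the
# first occurrence in the file is used. Prints "unknown" when none is found.
atif_schema_version() {
  local version
  version=$(grep -o '"schema_version"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | head -n 1 \
    | sed 's/.*"\([^"]*\)"$/\1/') || true
  echo "${version:-unknown}"
}
```

For example, `atif_schema_version "$TRIAL/agent/trajectory.json"` lets you check whether a run emitted `ATIF-v1.6` or `ATIF-v1.7`.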
| 271 | +<!-- path-check-skip-end --> |