Commit 5815699

feat: add openclaw integration to harbor, run smoke tests
Signed-off-by: Sam Oluwalana <soluwalana@nvidia.com>
1 parent c241516 commit 5815699

5 files changed

Lines changed: 1709 additions & 8 deletions

Lines changed: 271 additions & 0 deletions
@@ -0,0 +1,271 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2026, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# OpenClaw NeMo-Flow Harbor Smoke

This workflow runs **OpenClaw** inside a Harbor trial with the published **NeMo
Flow OpenClaw native plugin** (`npm:nemo-flow-openclaw`, pinned inside the
agent). The plugin provides a hooks backend and in-process observability; it is
not a Harbor-side “ATIF export” pipeline. You can drive the smoke with
**Terminal-Bench** (`sqlite-db-truncate`) or with the same **SWE-bench** smoke
instance used by the OpenCode and Hermes smokes (`django__django-13741` under
`swebench-opencode-smoke`).

Because OpenClaw may not be registered in the Harbor build you have installed,
this smoke loads the agent from **NeMo Agent Toolkit** via
`--agent-import-path` (no `-a openclaw` required).

## Pipeline

```mermaid
flowchart TD
task[Terminal-Bench task<br/>sqlite-db-truncate] --> harbor[Harbor trial<br/>Docker environment]

harbor --> run[OpenClaw + optional NeMo Flow OpenClaw plugin]
run --> nativeLog[agent/openclaw.txt<br/>CLI JSON / logs]
run --> sessionJsonl[agent/openclaw.session.jsonl<br/>OpenClaw session log]
sessionJsonl --> harborAtif[agent/trajectory.json<br/>Harbor-generated ATIF<br/>from session replay]

run --> nemoNative[NeMo Flow native plugin<br/>hooks backend + observability]
nemoNative --> pluginOut[agent/nemo-flow-atif/<br/>plugin observability output]

harbor --> verifier[Verifier]
verifier --> result[result.json]

harborAtif --> compare[Optional: compare_atif_tools<br/>if plugin writes comparable ATIF]
pluginOut --> compare
```

After a successful NeMo-Flow-enabled run, expect these files under `agent/`:

<!-- path-check-skip-begin -->

- `openclaw.txt`: CLI stdout from `openclaw agent --local --json`.
- `openclaw.session.jsonl`: OpenClaw session log. OpenClaw has no native ATIF
  output, so Harbor derives ATIF from this log.
- `trajectory.json`: Harbor-generated ATIF replayed from the session log.
- `nemo-flow-atif/`: NeMo Flow OpenClaw plugin observability output.

At the trial root: `result.json` (trial outcome).

<!-- path-check-skip-end -->

## Prerequisites

- Docker is running.
- A Python environment with **Harbor** and **`nvidia-nat-harbor`** installed, so
  that `harbor` and `nat_harbor` import cleanly (an editable install from this
  repository is typical).

From the NeMo Agent Toolkit repository root:

<!-- path-check-skip-begin -->

```bash
uv venv --python 3.13 --seed .venv
uv pip install -e packages/nvidia_nat_harbor
uv pip install -e external/harbor
```

Use a Harbor revision that ships the built-in **Terminal-Bench** dataset
(`terminal-bench@2.0`). If your Harbor checkout lives somewhere other than
`external/harbor`, pass that path to `uv pip install -e` instead.
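
Before launching a trial, you can confirm that the `--agent-import-path` value used in the commands below resolves inside `.venv`. This is a minimal sketch that assumes the `module:Attribute` form that value takes here; `check_import_path` is a hypothetical helper, not part of Harbor or `nat_harbor`:

```bash
# Hypothetical helper: confirm that a "module:Attribute" import path resolves
# under a given Python interpreter (for example .venv/bin/python).
check_import_path() {
  local python_bin="$1" import_path="$2"
  if [[ "$import_path" != *:* ]]; then
    echo "expected module:Attribute, got '$import_path'" >&2
    return 2
  fi
  local module="${import_path%%:*}" attr="${import_path#*:}"
  # Feed a short Python program on stdin; module and attribute arrive as argv.
  "$python_bin" - "$module" "$attr" <<'PY'
import importlib
import sys

module_name, attr_name = sys.argv[1], sys.argv[2]
module = importlib.import_module(module_name)  # raises ImportError if missing
getattr(module, attr_name)  # raises AttributeError if missing
print(f"ok: {module_name}:{attr_name}")
PY
}
```

For example, `check_import_path .venv/bin/python nat_harbor.agents.installed.openclaw:OpenClaw` should print an `ok:` line if the editable installs succeeded, and exit non-zero with a Python traceback otherwise.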

<!-- path-check-skip-end -->

Create a secrets file (do not commit it). **For this workflow, `NVIDIA_BASE_URL`
has only been exercised against the OpenAI-compatible endpoint on
`integrate.api.nvidia.com`** (for example `https://integrate.api.nvidia.com/v1`).
Other NVIDIA inference hosts are not covered here yet.

<!-- path-check-skip-begin -->

```bash
mkdir -p .tmp/harbor/secrets
read -rsp 'NVIDIA_API_KEY: ' NVIDIA_API_KEY; echo
read -rsp 'NVIDIA_BASE_URL (integrate.api.nvidia.com OpenAI-compatible base): ' NVIDIA_BASE_URL; echo
cat > .tmp/harbor/secrets/nvidia.env <<EOF
NVIDIA_API_KEY=${NVIDIA_API_KEY}
NVIDIA_BASE_URL=${NVIDIA_BASE_URL}
EOF
# Keep the secrets file readable only by you.
chmod 600 .tmp/harbor/secrets/nvidia.env
```
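
Because the workflow has only been exercised against `integrate.api.nvidia.com`, a quick sanity check on the secrets file can catch a typo before a long trial. This is a minimal sketch; `check_nvidia_env` is a hypothetical helper, and it assumes the plain `KEY=value` lines written above (no quoting, no `export`). It never prints the API key:

```bash
# Hypothetical helper: sanity-check the KEY=value secrets file.
check_nvidia_env() {
  local env_file="$1" key value have_key=0 base_url=""
  if [ ! -f "$env_file" ]; then
    echo "missing: $env_file" >&2
    return 1
  fi
  # Read KEY=value pairs; the extra test also handles a missing final newline.
  while IFS='=' read -r key value || [ -n "$key" ]; do
    case "$key" in
      NVIDIA_API_KEY) if [ -n "$value" ]; then have_key=1; fi ;;
      NVIDIA_BASE_URL) base_url="$value" ;;
    esac
  done < "$env_file"
  if [ "$have_key" -ne 1 ]; then
    echo "NVIDIA_API_KEY is empty or missing" >&2
    return 1
  fi
  case "$base_url" in
    https://integrate.api.nvidia.com | https://integrate.api.nvidia.com/*)
      echo "ok: NVIDIA_BASE_URL=$base_url" ;;
    *)
      echo "NVIDIA_BASE_URL is not on integrate.api.nvidia.com: '$base_url'" >&2
      return 1 ;;
  esac
}
```

Run it as `check_nvidia_env .tmp/harbor/secrets/nvidia.env` from the repository root.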
<!-- path-check-skip-end -->

## Run the smoke (Terminal-Bench)

This matches a recent working invocation, except that **`--agent-import-path`
replaces `-a openclaw`**, so you do not depend on Harbor registering the
OpenClaw agent name.

<!-- path-check-skip-begin -->

```bash
# NeMo Agent Toolkit repository root (adjust to your checkout).
cd /external/NeMo-Agent-Toolkit

# Export every variable defined in the secrets file.
set -a
. .tmp/harbor/secrets/nvidia.env
set +a

.venv/bin/harbor run \
  -d terminal-bench@2.0 \
  -i sqlite-db-truncate \
  --agent-import-path nat_harbor.agents.installed.openclaw:OpenClaw \
  -m nvidia/qwen/qwen3.5-397b-a17b \
  -e docker \
  --env-file .tmp/harbor/secrets/nvidia.env \
  --jobs-dir .tmp/harbor-openclaw-nemoflow \
  --n-concurrent 1 \
  --agent-kwarg enable_nemo_flow=true \
  -q \
  -y
```

Disable NeMo Flow (faster setup, no native plugin):

```bash
.venv/bin/harbor run \
  -d terminal-bench@2.0 \
  -i sqlite-db-truncate \
  --agent-import-path nat_harbor.agents.installed.openclaw:OpenClaw \
  -m nvidia/qwen/qwen3.5-397b-a17b \
  -e docker \
  --env-file .tmp/harbor/secrets/nvidia.env \
  --jobs-dir .tmp/harbor-openclaw-nemoflow \
  --n-concurrent 1 \
  --agent-kwarg enable_nemo_flow=false \
  -q \
  -y
```
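
The two Terminal-Bench invocations differ only in the `enable_nemo_flow` agent kwarg. A small wrapper can keep the shared flags in one place. This is a hypothetical convenience sketch: `run_openclaw_tb_smoke` and the `HARBOR_BIN` override are not part of Harbor, and the function passes only the flags shown in the commands for this smoke:

```bash
# Hypothetical wrapper around the Terminal-Bench smoke invocation.
# $1: "true" or "false" for the enable_nemo_flow agent kwarg.
# HARBOR_BIN may be overridden (for example HARBOR_BIN=echo) for a dry run.
run_openclaw_tb_smoke() {
  local enable_nemo_flow="${1:-true}"
  case "$enable_nemo_flow" in
    true | false) ;;
    *) echo "usage: run_openclaw_tb_smoke [true|false]" >&2; return 2 ;;
  esac
  local harbor_bin="${HARBOR_BIN:-.venv/bin/harbor}"
  # One array keeps the argument list readable and safely quoted.
  local args=(
    run
    -d terminal-bench@2.0
    -i sqlite-db-truncate
    --agent-import-path nat_harbor.agents.installed.openclaw:OpenClaw
    -m nvidia/qwen/qwen3.5-397b-a17b
    -e docker
    --env-file .tmp/harbor/secrets/nvidia.env
    --jobs-dir .tmp/harbor-openclaw-nemoflow
    --n-concurrent 1
    --agent-kwarg "enable_nemo_flow=${enable_nemo_flow}"
    -q
    -y
  )
  "$harbor_bin" "${args[@]}"
}
```

Call `run_openclaw_tb_smoke false` from the repository root after loading the secrets file, or `HARBOR_BIN=echo run_openclaw_tb_smoke true` to print the command without running it. Like the original commands, both variants share one `--jobs-dir`.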

## Run the smoke (SWE-bench)

This uses the same prepared SWE-bench instance as **`opencode-nemoflow-smoke.md`**
and **`hermes-nemoflow-smoke.md`**: `django__django-13741` under
`swebench-opencode-smoke`. The task directory should be:

```text
external/harbor/datasets/swebench-opencode-smoke/django__django-13741
```

If it is missing, create it from a Harbor checkout (the same adapter invocation
as in the OpenCode smoke prerequisites):

```bash
cd external/harbor/adapters/swebench

uv run swebench \
  --instance-id django__django-13741 \
  --task-dir ../../datasets/swebench-opencode-smoke \
  --overwrite

# Return to the NeMo Agent Toolkit repository root.
cd ../../../..
```
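
A variant of that step runs the adapter only when the task directory is missing, and runs it in a subshell so the working directory is restored without counting `../` segments. This sketch reuses the adapter invocation shown above; `ensure_swebench_task` is a hypothetical helper and assumes it is called from the NeMo Agent Toolkit repository root:

```bash
# Hypothetical helper: create the SWE-bench smoke task only if it is missing.
# Assumes the current directory is the NeMo Agent Toolkit repository root.
ensure_swebench_task() {
  local instance_id="${1:-django__django-13741}"
  local dataset_dir="external/harbor/datasets/swebench-opencode-smoke"
  if [ -d "$dataset_dir/$instance_id" ]; then
    echo "present: $dataset_dir/$instance_id"
    return 0
  fi
  # The subshell keeps the cd local, so the caller's directory is unchanged.
  (
    cd external/harbor/adapters/swebench &&
      uv run swebench \
        --instance-id "$instance_id" \
        --task-dir ../../datasets/swebench-opencode-smoke \
        --overwrite
  )
}
```

If the adapter directory does not exist, the `cd` fails and the function returns non-zero without calling `uv`.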

NeMo-Flow-enabled run (OpenClaw import path, same model family as the OpenCode
smoke):

```bash
# NeMo Agent Toolkit repository root (adjust to your checkout).
cd /external/NeMo-Agent-Toolkit

export HARBOR_JOBS_DIR=.tmp/harbor/openclaw-nemoflow-swebench
export SWEBENCH_TASK=external/harbor/datasets/swebench-opencode-smoke/django__django-13741
export JOB_NAME=openclaw-nemoflow-swebench-smoke-1

# Export every variable defined in the secrets file.
set -a
. .tmp/harbor/secrets/nvidia.env
set +a

.venv/bin/harbor run \
  --path "$SWEBENCH_TASK" \
  -l 1 \
  --job-name "$JOB_NAME" \
  --jobs-dir "$HARBOR_JOBS_DIR" \
  --yes -n 1 --max-retries 0 \
  --env-file .tmp/harbor/secrets/nvidia.env \
  --agent-import-path nat_harbor.agents.installed.openclaw:OpenClaw \
  --env docker \
  --model nvidia/qwen/qwen3.5-397b-a17b \
  --agent-kwarg enable_nemo_flow=true
```

<!-- path-check-skip-end -->

## Quick artifact check

### Terminal-Bench

Set `TRIAL` to the completed trial directory (the layout varies slightly by
Harbor version; adjust the `find` if needed). Both Terminal-Bench variants above
write to the same jobs directory, so if you ran both, `find` may return either
trial; set `TRIAL` explicitly in that case:

<!-- path-check-skip-begin -->

```bash
export HARBOR_JOBS_DIR=.tmp/harbor-openclaw-nemoflow
TRIAL=$(find "$HARBOR_JOBS_DIR" -mindepth 2 -maxdepth 3 -type d -name 'sqlite-db-truncate__*' | head -n 1)
test -n "$TRIAL"
ls -la "$TRIAL/agent"
```

### SWE-bench

After a run with `JOB_NAME=openclaw-nemoflow-swebench-smoke-1`:

```bash
export HARBOR_JOBS_DIR=.tmp/harbor/openclaw-nemoflow-swebench
export JOB_NAME=openclaw-nemoflow-swebench-smoke-1
TRIAL=$(find "$HARBOR_JOBS_DIR/$JOB_NAME" -maxdepth 1 -type d -name 'django__django-13741__*' | head -n 1)
test -n "$TRIAL"
ls -la "$TRIAL/agent"
```
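
The `ls -la` checks only list `agent/`. A stricter sketch asserts that the artifacts this workflow expects for the selected `TRIAL` are actually present. `check_openclaw_artifacts` is a hypothetical helper; it requires `nemo-flow-atif/` only when you tell it the run used `enable_nemo_flow=true`:

```bash
# Hypothetical helper: verify the expected OpenClaw smoke artifacts in a trial.
# $1: trial directory; $2: "true" if the run used enable_nemo_flow=true.
check_openclaw_artifacts() {
  local trial="$1" nemo_flow="${2:-true}" missing=0 path
  local expected=(
    "$trial/result.json"
    "$trial/agent/openclaw.txt"
    "$trial/agent/openclaw.session.jsonl"
    "$trial/agent/trajectory.json"
  )
  for path in "${expected[@]}"; do
    if [ ! -f "$path" ]; then
      echo "missing file: $path" >&2
      missing=1
    fi
  done
  # The plugin output directory only exists when NeMo Flow was enabled.
  if [ "$nemo_flow" = "true" ] && [ ! -d "$trial/agent/nemo-flow-atif" ]; then
    echo "missing dir: $trial/agent/nemo-flow-atif" >&2
    missing=1
  fi
  return "$missing"
}
```

For example, `check_openclaw_artifacts "$TRIAL" true && echo "artifacts ok"`; every missing path is reported on stderr before the non-zero return.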

### Optional tool comparison

Pick one NeMo Flow plugin observability JSON file under `nemo-flow-atif` (if present):

```bash
# Empty if the plugin wrote no JSON or the directory does not exist.
PLUGIN_TRAJ=$(find "$TRIAL/agent/nemo-flow-atif" -maxdepth 1 -type f -name '*.json' 2>/dev/null | head -n 1)
echo "PLUGIN_TRAJ=${PLUGIN_TRAJ}"
```

Optionally, compare Harbor's `trajectory.json` with a plugin artifact, but **only
when** that file is close enough to ATIF for the tool to parse (native plugin
output varies by version):

```bash
test -f "$TRIAL/agent/trajectory.json"
test -n "$PLUGIN_TRAJ"

.venv/bin/python -m nat_harbor.smoke.compare_atif_tools \
  --native "$TRIAL/agent/trajectory.json" \
  --candidate "$PLUGIN_TRAJ"
```
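
Since plugin output varies by version, a cheap pre-check can skip the comparison when the candidate clearly is not ATIF. This is a rough heuristic, not schema validation: it assumes an ATIF file is valid JSON containing an `ATIF-v1.` version string (such as the `ATIF-v1.6` and `ATIF-v1.7` versions noted under Known limitations). `looks_like_atif` and the `PYTHON_BIN` override are hypothetical; Python is used only for `json.tool` parsing:

```bash
# Hypothetical heuristic: does a file parse as JSON and mention an ATIF version?
# This does not validate the ATIF schema; it only filters out obvious mismatches.
looks_like_atif() {
  local file="$1" python_bin="${PYTHON_BIN:-python3}"
  [ -f "$file" ] || return 1
  # json.tool exits non-zero if the file is not valid JSON.
  "$python_bin" -m json.tool "$file" >/dev/null 2>&1 || return 1
  grep -q '"ATIF-v1\.' "$file"
}
```

You could then wrap the comparison in `if looks_like_atif "$PLUGIN_TRAJ"; then ... fi` and print a skip message otherwise.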

<!-- path-check-skip-end -->

## Known limitations

<!-- path-check-skip-begin -->

- **Setup time:** each trial installs Node, OpenClaw, and (when enabled) the
  NeMo Flow npm plugin inside the container.
- **Network:** `npm install -g` and `openclaw plugins install` need reliable
  access to the npm registry.
- **ATIF schema:** the vendored agent under `nat_harbor` may emit `ATIF-v1.6` in
  `trajectory.json` when used with Harbor releases whose `Trajectory` model does
  not yet list `ATIF-v1.7`; the upstream Harbor OpenClaw agent may use v1.7 once
  dependencies align.
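
To see which version a given `trajectory.json` actually declares (for example `ATIF-v1.6` versus `ATIF-v1.7`), a grep-based sketch is enough. `atif_version` is a hypothetical helper that reports the first `ATIF-v1.N` string it finds and fails if there is none:

```bash
# Hypothetical helper: print the first ATIF version string found in a file.
atif_version() {
  local file="$1" matches
  # grep -o prints each match on its own line; it exits 1 when nothing matches.
  matches=$(grep -o 'ATIF-v1\.[0-9][0-9]*' "$file" 2>/dev/null) || return 1
  # Keep only the first line of the matches.
  printf '%s\n' "${matches%%$'\n'*}"
}
```

For example, `atif_version "$TRIAL/agent/trajectory.json"`.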
<!-- path-check-skip-end -->

packages/nvidia_nat_harbor/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -47,7 +47,7 @@ source = "https://github.com/NVIDIA/NeMo-Agent-Toolkit"

[tool.setuptools_dynamic_dependencies]
dependencies = [
-"harbor==0.5.0",
+"harbor==0.6.6",
"nvidia-nat-atif[full] == {version}",
"nvidia-nat-core == {version}",
"nvidia-nat-eval == {version}",
