Reduce moco notebook training iterations in CI via FAST_CI_MODE (#1542)

svc-bionemo · web-flow · commit 91869ec4fa0e · 2026-04-01T18:40:32.000Z
## Summary

Fixes [BIO-403](https://linear.app/nvidia/issue/BIO-403/investigate-flaky-tests-and-timeout-errors-in-bionemo-moco-notebook): Investigate flaky tests and timeout errors in bionemo-moco notebook.

The `discrete_data_interpolant_tutorial.ipynb` notebook has 3 training loops that hard-code `range(50000)` iterations. When executed in CI via `nbval`, these loops cause timeouts.

## Changes

1. Added a configuration cell that detects the `FAST_CI_MODE` environment variable (already set by `unit-tests-framework.yml` when running notebook tests)
2. Sets `NUM_TRAINING_STEPS = 500` when in CI mode, `50000` otherwise
3. Replaced all 3 `range(50000)` occurrences with `range(NUM_TRAINING_STEPS)` in the DFM, D3PM, and MDLM training loops

This follows the existing pattern used in `bionemo-recipes/recipes/evo2_megatron/examples/` notebooks.

## Testing

- The notebook behavior is unchanged when run outside CI (`FAST_CI_MODE` not set)
- When `FAST_CI_MODE=true` (as in framework CI), iterations drop from 50,000 to 500, preventing timeouts

Signed-off-by: svc-bionemo <267129667+svc-bionemo@users.noreply.github.com>
Co-authored-by: svc-bionemo <267129667+svc-bionemo@users.noreply.github.com>
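
The two claims in the Testing section can be checked without running the notebook. The sketch below reuses the truthiness test from the new configuration cell, wrapped in a hypothetical helper `resolve_training_steps` that takes the environment as a plain mapping. The helper name and its `fast`/`full` parameters are illustrative assumptions and are not part of this commit.

```python
from typing import Mapping

# Same truthy spellings accepted by the notebook's configuration cell.
TRUTHY = ("1", "true", "yes")


def resolve_training_steps(env: Mapping[str, str], fast: int = 500, full: int = 50000) -> int:
    """Hypothetical helper: return the step count the config cell would pick for `env`."""
    # Missing variable falls back to "", which is not truthy, so the full run is used.
    fast_ci_mode = env.get("FAST_CI_MODE", "").lower() in TRUTHY
    return fast if fast_ci_mode else full
```

Passing `os.environ` reproduces the notebook's behavior for the current process. Note that the comparison is case-insensitive but exact, so a value such as `on`, or one with surrounding whitespace, is not treated as enabling fast mode and leaves the loops at 50,000 iterations.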
diff --git a/sub-packages/bionemo-moco/examples/discrete_data_interpolant_tutorial.ipynb b/sub-packages/bionemo-moco/examples/discrete_data_interpolant_tutorial.ipynb
@@ -22,6 +22,19 @@
     "torch.cuda.manual_seed(42)"
    ]
   },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "\n",
+    "\n",
+    "FAST_CI_MODE: bool = os.environ.get(\"FAST_CI_MODE\", \"\").lower() in (\"1\", \"true\", \"yes\")\n",
+    "NUM_TRAINING_STEPS = 500 if FAST_CI_MODE else 50000"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -144,7 +157,7 @@
    "source": [
     "model = model.to(DEVICE)\n",
     "losses = []\n",
-    "for _ in tqdm(range(50000)):\n",
+    "for _ in tqdm(range(NUM_TRAINING_STEPS)):\n",
     "    num_ones = torch.randint(0, D + 1, (B,))\n",
     "    x1 = (torch.arange(D)[None, :] < num_ones[:, None]).long().to(DEVICE)\n",
     "    # x1 e.g. [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] or [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]\n",
@@ -659,7 +672,7 @@
     "# NBVAL_SKIP\n",
     "model = model.to(DEVICE)\n",
     "losses = []\n",
-    "for _ in tqdm(range(50000)):\n",
+    "for _ in tqdm(range(NUM_TRAINING_STEPS)):\n",
     "    num_ones = torch.randint(0, D + 1, (B,))\n",
     "    x1 = (torch.arange(D)[None, :] < num_ones[:, None]).long().to(DEVICE)\n",
     "    # x1 e.g. [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] or [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]\n",
@@ -892,7 +905,7 @@
     "\n",
     "model = model.to(DEVICE)\n",
     "losses = []\n",
-    "for _ in tqdm(range(50000)):\n",
+    "for _ in tqdm(range(NUM_TRAINING_STEPS)):\n",
     "    num_ones = torch.randint(0, D + 1, (B,))\n",
     "    x1 = (torch.arange(D)[None, :] < num_ones[:, None]).long().to(DEVICE)\n",
     "    # x1 e.g. [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] or [1, 1, 1, 1, 1, 1, 1, 1, 1, 0]\n",