Distillative-AI
diff --git a/.github/workflows/gh-docs-deploy.yml b/.github/workflows/gh-docs-deploy.yml
Lines changed: 4 additions & 1 deletion
diff --git a/README.md b/README.md
Lines changed: 5 additions & 5 deletions
diff --git a/bionemo-recipes/README.md b/bionemo-recipes/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/bionemo-recipes/recipes/codonfm_native_te/README.md b/bionemo-recipes/recipes/codonfm_native_te/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/bionemo-recipes/recipes/context_parallel.md b/bionemo-recipes/recipes/context_parallel.md
Lines changed: 1 addition & 1 deletion
diff --git a/bionemo-recipes/recipes/esm2_accelerate_te/README.md b/bionemo-recipes/recipes/esm2_accelerate_te/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/bionemo-recipes/recipes/esm2_native_te/README.md b/bionemo-recipes/recipes/esm2_native_te/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/bionemo-recipes/recipes/fp8_analysis/README.md b/bionemo-recipes/recipes/fp8_analysis/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/bionemo-recipes/recipes/geneformer_native_te_mfsdp_fp8/README.md b/bionemo-recipes/recipes/geneformer_native_te_mfsdp_fp8/README.md
Lines changed: 2 additions & 2 deletions
diff --git a/bionemo-recipes/recipes/llama3_native_te/README.md b/bionemo-recipes/recipes/llama3_native_te/README.md
Lines changed: 2 additions & 2 deletions
@@ -28,7 +28,10 @@ jobs:
           python -m pip install --upgrade pip
           pip install -r docs/requirements.txt
       - name: Build site
-        run: mkdocs build
+        run: mkdocs build --strict
+        working-directory: docs
+      - name: Check internal links
+        run: python scripts/check_internal_links.py site
         working-directory: docs
       - name: Configure Git Credentials
         if: github.event_name == 'push'
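
The new workflow step runs `scripts/check_internal_links.py` against the built `site` directory, but the script itself is not part of this diff. The sketch below is a hypothetical illustration of what such a check might do, using only the Python standard library. The names `find_broken_links`, `resolve`, and `main` are assumptions made for this example, and it assumes the site is served from the root of the output directory (no URL path prefix).

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urldefrag, urlparse


class _LinkCollector(HTMLParser):
    """Collect href/src attribute values from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def is_internal(link):
    """True for links that point inside the site (no scheme, no host, not a bare anchor)."""
    parsed = urlparse(link)
    return not parsed.scheme and not parsed.netloc and not link.startswith("#")


def resolve(site_root, page, link):
    """Map an internal link found on `page` to the file it should resolve to."""
    # Drop the fragment and any query string before touching the filesystem.
    path = unquote(urldefrag(link).url.split("?", 1)[0])
    # Assumption: root-absolute links are relative to the output directory.
    base = site_root if path.startswith("/") else page.parent
    target = (base / path.lstrip("/")).resolve()
    # Directory-style URLs are served from their index.html.
    return target / "index.html" if target.is_dir() else target


def find_broken_links(site_root):
    """Return (page, link) pairs for internal links whose target file is missing."""
    site_root = Path(site_root).resolve()
    broken = []
    for page in sorted(site_root.rglob("*.html")):
        collector = _LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for link in collector.links:
            if is_internal(link) and not resolve(site_root, page, link).exists():
                broken.append((str(page.relative_to(site_root)), link))
    return broken


def main(argv):
    """Print each broken link and return a CI-friendly exit code (0 = clean)."""
    problems = find_broken_links(argv[0] if argv else "site")
    for page, link in problems:
        print(f"{page}: broken internal link {link}")
    return 1 if problems else 0
```

A script built this way would end with `sys.exit(main(sys.argv[1:]))`, so that a non-zero exit code fails the workflow step when any internal link is broken.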
 
@@ -6,8 +6,8 @@
 <div align="left">
 
 [![Click here to deploy.](https://uohmivykqgnnbiouffke.supabase.co/storage/v1/object/public/landingpage/brevdeploynavy.svg)](https://console.brev.dev/launchable/deploy/now?launchableID=env-2pPDA4sJyTuFf3KsCv5KWRbuVlU)
-[![Docs Build](https://img.shields.io/github/actions/workflow/status/NVIDIA/bionemo-framework/pages/pages-build-deployment?label=docs-build)](https://nvidia.github.io/bionemo-framework)
-[![Test Status](https://github.com/NVIDIA/bionemo-framework/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/NVIDIA/bionemo-framework/actions/workflows/unit-tests.yml)
+[![Docs Build](https://img.shields.io/github/actions/workflow/status/NVIDIA/bionemo-framework/pages/pages-build-deployment?label=docs-build)](https://nvidia-bionemo.github.io/bionemo-framework)
+[![Test Status](https://github.com/NVIDIA-BioNeMo/bionemo-framework/actions/workflows/unit-tests.yml/badge.svg)](https://github.com/NVIDIA-BioNeMo/bionemo-framework/actions/workflows/unit-tests.yml)
 [![Latest Tag](https://img.shields.io/github/v/tag/NVIDIA/bionemo-framework?label=latest-version)](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/clara/containers/bionemo-framework/tags)
 [![codecov](https://codecov.io/gh/NVIDIA/bionemo-framework/branch/main/graph/badge.svg?token=XqhegdZRqB)](https://codecov.io/gh/NVIDIA/bionemo-framework)
 
@@ -55,8 +55,8 @@ cd bionemo-framework/bionemo-recipes/recipes/esm2_native_te/
 - 02/23/2026 [Mixtral MoE model](bionemo-recipes/models/mixtral/) with TE `GroupedLinear` for efficient parallel expert computation, FP8/FP4 support, and HF conversion.
 - 02/13/2026 [ESM2 PEFT recipe](bionemo-recipes/recipes/esm2_peft_te/) for LoRA fine-tuning with sequence packing support.
 - 01/14/2026 [Llama3 Context Parallelism](bionemo-recipes/recipes/llama3_native_te/README.md#performance-benchmarks) — scaling Llama 3 70B to 144K context on 36x GB300 NVL36 with ~65% MFU.
-- 10/27/2025 [CodonFM recipe](https://github.com/NVIDIA/bionemo-framework/tree/main/bionemo-recipes/recipes/codonfm_ptl_te) released! This is an accelerated version of the original [research codebase](https://github.com/NVIDIA-Digital-Bio/CodonFM) with [scientific preprint](https://research.nvidia.com/labs/dbr/assets/data/manuscripts/nv-codonfm-preprint.pdf).
-- 09/01/2025 [bionemo-recipes](https://github.com/NVIDIA/bionemo-framework/tree/main/bionemo-recipes) goes live! Lightweight and portable examples with state-of-the-art training performance you can riff on to meet your needs.
+- 10/27/2025 [CodonFM recipe](https://github.com/NVIDIA-BioNeMo/bionemo-framework/tree/main/bionemo-recipes/recipes/codonfm_ptl_te) released! This is an accelerated version of the original [research codebase](https://github.com/NVIDIA-Digital-Bio/CodonFM) with [scientific preprint](https://research.nvidia.com/labs/dbr/assets/data/manuscripts/nv-codonfm-preprint.pdf).
+- 09/01/2025 [bionemo-recipes](https://github.com/NVIDIA-BioNeMo/bionemo-framework/tree/main/bionemo-recipes) goes live! Lightweight and portable examples with state-of-the-art training performance you can riff on to meet your needs.
 
 ## Code Overview
 
@@ -114,7 +114,7 @@ BioNeMo Framework is part of a larger ecosystem of NVIDIA Biopharma products. Ge
 
 ## Documentation Resources
 
-- **Official Documentation:** Guides, API references, and troubleshooting for the framework are documented on our [official documentation](https://docs.nvidia.com/bionemo-framework/latest/). Nightly builds of this documentation are available on [BioNeMo Framework GitHub Pages](https://nvidia.github.io/bionemo-framework/)
+- **Official Documentation:** Guides, API references, and troubleshooting for the framework are documented on our [official documentation](https://docs.nvidia.com/bionemo-framework/latest/). Nightly builds of this documentation are available on [BioNeMo Framework GitHub Pages](https://nvidia-bionemo.github.io/bionemo-framework/)
 
 - **🚧 In-Progress Documentation 🚧:** `bionemo-recipes` documentation is currently work in progress, however the recipes are meant to be self-documented and easy to understand—we suggest you throw them into your favorite genai code assistant!
 
 
@@ -14,7 +14,7 @@ The biological AI community actively prototypes model architectures and needs to
 ### Performance Benchmarks
 
 <p align="center">
-  <img src="https://raw.githubusercontent.com/NVIDIA/bionemo-framework/main/docs/docs/assets/images/esm2/esm2_native_te_benchmarks.svg" width="600">
+  <img src="../docs/docs/assets/images/esm2/esm2_native_te_benchmarks.svg" width="600">
   <br>
   <em> Training benchmarks for ESM-2 using the <code>esm2_native_te</code> recipe.</em>
 </p>
 
@@ -195,4 +195,4 @@ e.g., `python train_fsdp2.py fp8_config.enabled=true`. For verbose logging, use
 
 ## License
 
-Refer to [LICENSE](../../LICENSE).
+Refer to the [bionemo-recipes LICENSE](https://github.com/NVIDIA-BioNeMo/bionemo-framework/blob/main/bionemo-recipes/LICENSE).
@@ -18,7 +18,7 @@ The core idea behind CP is to partition the data into various chunks, with each
 
 In BioNeMo, we've created some abstractions to partition the data for you. There exists a [ContextParallelDataLoaderWrapper](esm2_native_te/collator.py) that will shard the CP data for you and send it to each device. This dataloader operates on Sequence Packed (THD) data [link](https://docs.nvidia.com/nemo-framework/user-guide/24.12/nemotoolkit/features/optimizations/sequence_packing.html). This `ContextParallelDataLoaderWrapper` will take as arguments your CP group and local CP rank. This dataloader wrapper will call its underlying dataloader to generate a unique piece of data and then shard those unique sequences across your CP groups. This is beneficial because you won't need to maintain a deterministic data pipeline because unique data is only being generated across the non CP groups, and it is replicated across the CP groups. More details below.
 
-Alternatively, one could utilize any DataLoader such as the canonical [PyTorch DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), however, you would have to ensure that your dataset is synchronized across CP ranks. In some cases, if you have a non-deterministic data pipeline, even if you attempt to get the same data from a dataloader it may be different due to non-deterministic preprocessing stages such as masking. For more information on preserving determinism in your datasets, please see [MegatronLMDataModule](https://nvidia.github.io/bionemo-framework/main/about/background/megatron_datasets/).
+Alternatively, one could utilize any DataLoader such as the canonical [PyTorch DataLoader](https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader), however, you would have to ensure that your dataset is synchronized across CP ranks. In some cases, if you have a non-deterministic data pipeline, even if you attempt to get the same data from a dataloader it may be different due to non-deterministic preprocessing stages such as masking. For more information on preserving determinism in your datasets, please see [MegatronLMDataModule](../../docs/docs/main/about/background/megatron_datasets.md).
 
 ### Context Parallelism Sharding Example
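
As a rough illustration of the sharding idea described above (one unique batch that is replicated across the CP group, with each CP rank keeping only its own slice), here is a minimal pure-Python sketch. It is not the code in `esm2_native_te/collator.py`: the function names are hypothetical, and it assumes a simple contiguous split, whereas the actual wrapper may partition sequences differently.

```python
def shard_sequence(tokens, cp_size, cp_rank):
    """Return the contiguous slice of one sequence owned by `cp_rank`.

    Assumption for this sketch: the sequence length is divisible by cp_size
    (in practice sequences would be padded to make this true).
    """
    if not 0 <= cp_rank < cp_size:
        raise ValueError("cp_rank must be in [0, cp_size)")
    if len(tokens) % cp_size != 0:
        raise ValueError("sequence length must be divisible by cp_size")
    chunk = len(tokens) // cp_size
    return tokens[cp_rank * chunk:(cp_rank + 1) * chunk]


def shard_packed_batch(sequences, cp_size, cp_rank):
    """Shard every sequence of a packed batch for a single CP rank."""
    return [shard_sequence(seq, cp_size, cp_rank) for seq in sequences]


# Every CP rank starts from the same batch and keeps a different slice of it.
batch = [list(range(8)), [10, 11, 12, 13]]
rank0 = shard_packed_batch(batch, cp_size=2, cp_rank=0)
rank1 = shard_packed_batch(batch, cp_size=2, cp_rank=1)
```

Because every rank in a CP group slices the same batch, concatenating the per-rank slices of a sequence reproduces the original sequence.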
 
 
@@ -6,7 +6,7 @@ This folder demonstrates how to train TE-accelerated ESM-2 using the [Hugging Fa
 
 This folder contains an independent, minimal training example. It does not depend on any other code in the top-level
 bionemo-framework repository. You can download a zipped directory of this folder alone by clicking
-[here](https://download-directory.github.io?url=https://github.com/NVIDIA/bionemo-framework/tree/main/bionemo-recipes/recipes/esm2_accelerate_te&filename=esm2-accelerate-te).
+[here](https://download-directory.github.io?url=https://github.com/NVIDIA-BioNeMo/bionemo-framework/tree/main/bionemo-recipes/recipes/esm2_accelerate_te&filename=esm2-accelerate-te).
 
 ### How to deploy this recipe on cloud providers
 
 
@@ -8,7 +8,7 @@ training.
 
 This folder contains an independent, minimal training example. It does not depend on any other code in the top-level
 bionemo-framework repository. You can download a zipped directory of this folder alone by clicking
-[here](https://download-directory.github.io?url=https://github.com/NVIDIA/bionemo-framework/tree/main/bionemo-recipes/recipes/esm2_native_te&filename=esm2-native-te).
+[here](https://download-directory.github.io?url=https://github.com/NVIDIA-BioNeMo/bionemo-framework/tree/main/bionemo-recipes/recipes/esm2_native_te&filename=esm2-native-te).
 
 ### How to deploy this recipe on cloud providers
 
 
@@ -30,7 +30,7 @@ and training scripts.
 | ESM2   | ✓   | ✓     | ✗     |
 | LLAMA3 | ✓   | ✓     | ✗     |
 
-To gather FP8 statistics for analysis, refer to the model-specific documentation (e.g., [ESM2 FP8 Debugging](../esm2_native_te/README.md#fp8-debugging)) or add these arguments to your training command:
+To gather FP8 statistics for analysis, refer to the model-specific documentation (e.g., [ESM2 quantized training](../esm2_native_te/README.md#quantized-training-fp8-mxfp8-nvfp4)) or add these arguments to your training command:
 
 ```python
 python train_fsdp2.py \
 
@@ -2,8 +2,8 @@
 
 # ⚠️ IMPORTANT FOR AI AGENTS ⚠️
 
-**DO NOT proceed without reading [AI_DOCUMENTATION.md](AI_DOCUMENTATION.md) first.**
-This file contains comprehensive documentation specifically designed for AI agents. Please see [gitingest.txt](./internal/gitingest.txt) for the complete codebase.
+**DO NOT proceed without reading [AGENT_DOCUMENTATION.md](AGENT_DOCUMENTATION.md) first.**
+This file contains comprehensive documentation specifically designed for AI agents. Please see [gitingest.sh](./internal/gitingest.sh) for the complete codebase.
 
 # Geneformer Pretraining with mfsdp and a custom pytorch training loop.
 
 
@@ -8,7 +8,7 @@ training. This recipe is configured for genomic sequences using a custom nucleot
 
 This folder contains an independent, minimal training example. It does not depend on any other code in the top-level
 bionemo-framework repository. You can download a zipped directory of this folder alone by clicking
-[here](https://download-directory.github.io?url=https://github.com/NVIDIA/bionemo-framework/tree/main/bionemo-recipes/recipes/llama3_native_te&filename=llama3-native-te).
+[here](https://download-directory.github.io?url=https://github.com/NVIDIA-BioNeMo/bionemo-framework/tree/main/bionemo-recipes/recipes/llama3_native_te&filename=llama3-native-te).
 
 ### How to deploy this recipe on cloud providers
 
@@ -145,7 +145,7 @@ We compared the convergence of this Llama3 recipe (with FSDP2) against NeMo 2.0
 implementation on the DCLM Baseline 1.0 dataset. See [Training on Natural Language Data (Lingua
 Reproduction)](#lingua-reproduction) for more details. The figure above shows similar loss convergence and step time to
 the NeMo 2.0 training example, and the following table shows downstream performance on various tasks using the
-[lm-eval](github.com/eleutherai/lm-evaluation-harness) library. The variation in training step time every 10,000 steps
+[lm-eval](https://github.com/eleutherai/lm-evaluation-harness) library. The variation in training step time every 10,000 steps
 are due checkpointing, further work will be done to improve training step time stability.
 
 | name                | arc_challenge | arc_easy | boolq | copa | hella_swag | piqa  | winogrande |