NVIDIA-NeMo · oliverholworthy · May 22, 2026 · May 22, 2026 · May 22, 2026
@@ -27,6 +27,12 @@ nemo_automodel = [
     "components/datasets/llm/megatron/Makefile",
 ]
 
+[tool.setuptools.data-files]
+"skills/retrieval-models" = [
+    "skills/retrieval-models/SKILL.md",
+    "skills/retrieval-models/PITFALLS.md",
+]
+
 [tool.setuptools.dynamic]
 version = { attr = "nemo_automodel.package_info.__version__" } # any module attribute compatible with ast.literal_eval
 readme = { file = "README.md", content-type = "text/markdown" }

@@ -26,4 +26,5 @@ To invoke a skill manually, use `/<skill-name>` in your Claude Code session.
 | `cicd` | Commit/PR workflow, CI trigger mechanism, failure investigation |
 | `build-and-dependency` | Container setup, uv package management, environment variables, CLI usage |
 | `testing` | Unit and functional test layout, tier semantics (L0/L1/L2), adding tests |
-| `fern-docs` | Maintain the Fern docs site under `fern/` — pages, slugs, redirects, version aliases, library reference |
+| `retrieval-models` | Work on bi-encoder and cross-encoder retrieval model support |
+| `fern-docs` | Maintain the Fern docs site under `fern/` — pages, slugs, redirects, version aliases, library reference |
diff --git a/skills/retrieval-models/PITFALLS.md b/skills/retrieval-models/PITFALLS.md
@@ -0,0 +1,77 @@
+# Retrieval Model Pitfalls
+
+## Incomplete Registration
+
+Adding a class to `MODEL_ARCH_MAPPING` is not enough for retrieval. Custom
+retrieval classes also need the `{"retrieval"}` tag and a matching
+`SUPPORTED_BACKBONES` entry for each supported task. Without the tag, saved
+checkpoints may miss the retrieval `auto_map` metadata. Without
+`SUPPORTED_BACKBONES`, `build_encoder_backbone` may silently fall back to HF
+Auto classes or reject the task.
+
+## Causal Mask Still Active
+
+For a bidirectional causal-decoder backbone, setting `config.is_causal = False`
+is not a sufficient proof. Verify every attention layer has `is_causal = False`
+and the forward path uses `create_bidirectional_mask`. Add a tiny test where
+changing a later token changes an earlier hidden state.
+
+## Cross-Encoder Labels Look Wrong
+
+Cross-encoder labels are one label per query group, not one label per flattened
+query-passage row. The collator emits `[B]` zero labels from `num_labels`, and
+the recipe reshapes logits from `[B * P, 1]` to `[B, P]`.
+
+## Positive Passage Order Changed
+
+Both bi-encoder and cross-encoder losses assume the positive passage is first.
+If preprocessing changes document order, labels of all zeros become wrong even
+though shapes still pass.
+
+## `n_passages` Mismatch
+
+The dataset controls how many positive-plus-negative passages are produced. The
+recipes reshape or score using `train_n_passages` and `val_n_passages`. If train
+or validation config changes `n_passages` without preserving grouped or
+flattened shape, losses and metrics can be incorrect or fail at `view`.
+
+## Wrong Dataset Or Collator Pair
+
+Use `model_type: bi_encoder` with `BiEncoderCollator`; use
+`model_type: cross_encoder` with `CrossEncoderCollator`. Mixing them usually
+shows up as missing `q_`/`d_` keys, missing labels, or invalid logits reshape.
+
+## Missing Inline Dataset Path
+
+There are two dataset loaders. `retrieval_dataset.py` handles corpus-id JSON and
+`hf://` sources. `retrieval_dataset_inline.py` handles inline JSON/JSONL text and
+rejects corpus-id format. Functional tests often use the inline loader.
+
+## Pooling Passed To Generic HF Models
+
+Pooling is a retrieval-wrapper or custom-backbone concept. Generic HF
+`AutoModel` fallback paths should not receive unsupported pooling kwargs. Let
+`build_encoder_backbone` decide which kwargs are safe for supported custom
+backbones versus HF fallback classes.
+
+## Nested Model Extraction
+
+When using `extract_submodel`, the dotted path must resolve to an object with a
+`.config`. For supported text backbones, the loader rebuilds the registered
+retrieval class from the extracted state dict and moves it to the extracted
+dtype. Test extraction with a tiny fake or local checkpoint before relying on a
+large VLM.
+
+## Save And Reload Metadata
+
+Retrieval wrappers save the inner backbone. `configure_encoder_metadata` sets
+`config.architectures` for all backbones and `config.auto_map` only for classes
+registered as retrieval architectures. If a saved custom retrieval checkpoint
+cannot reload through Auto classes, inspect the registry tag and config
+registration first.
+
+## Distributed In-Batch Negatives
+
+Distributed in-batch negatives gather passages across ranks. Keep
+`passage_doc_ids` from `BiEncoderCollator` so positives with the same corpus
+document id can be masked. This path is not implemented for ColBERT pooling.
diff --git a/skills/retrieval-models/SKILL.md b/skills/retrieval-models/SKILL.md
@@ -0,0 +1,260 @@
+---
+name: retrieval-models
+version: "1.0.0"
+author: NeMo AutoModel maintainers
+description: "NeMo AutoModel retrieval internals: bi/cross-encoder wrappers, bidirectional backbones, recipes, collators, and metadata. Not for LoRA or causal LM generation."
+when_to_use: Adding, modifying, or debugging retrieval model support; working with NeMoAutoModelBiEncoder, NeMoAutoModelCrossEncoder, bidirectional causal-decoder backbones, retrieval recipe configs, retrieval dataset/collator shape issues, or encoder save/reload metadata. Do not use for standard LLM generation or PEFT/LoRA tasks unless they also touch retrieval model wrappers, datasets, collators, or recipes.
+tags:
+  - nemo-automodel
+  - retrieval
+  - bi-encoder
+  - cross-encoder
+tools:
+  - shell
+  - read
+  - edit
+---
+
+# Retrieval Models
+
+## Purpose
+
+Use this skill when a task touches retrieval model behavior, not ordinary LLM
+generation. Retrieval support has three layers that are easy to mix up:
+
+1. Public entry points: `nemo_automodel.NeMoAutoModelBiEncoder.from_pretrained`
+   and `nemo_automodel.NeMoAutoModelCrossEncoder.from_pretrained`.
+2. Retrieval wrappers in `nemo_automodel/_transformers/retrieval.py`:
+   `BiEncoderModel`, `CrossEncoderModel`, `build_encoder_backbone`, and
+   `SUPPORTED_BACKBONES`.
+3. Concrete backbone classes under `nemo_automodel/components/models/`, such as
+   `llama_bidirectional` and `ministral_bidirectional`.
+
+If the prompt is about standard causal generation, instruction tuning, LoRA,
+PEFT, or launcher setup without retrieval model wrappers, datasets, collators,
+or recipes, stop using this skill and choose the relevant LLM training skill.
+
+## Prerequisites
+
+- Work from a NeMo AutoModel checkout and read `AGENTS.md` before editing.
+- Use `uv` for validation commands; do not introduce `pip install` steps.
+- Use the repo's Python, pytest, and ruff configuration rather than ad hoc
+  formatter or test settings.
+
+## References
+
+- `PITFALLS.md`: read when tests fail, save/reload metadata looks wrong, or a
+  recipe shape error appears.
+- `skills/model-onboarding/SKILL.md`: read before creating a new architecture
+  directory or registry entry.
+- `skills/recipe-development/SKILL.md`: read before changing retrieval recipe
+  flow or YAML config shape.
+- `skills/testing/SKILL.md`: read before adding or moving tests.
+
+## First Files
+
+Start with the narrowest surface that matches the task:
+
+- Model construction: `nemo_automodel/_transformers/retrieval.py`
+- Public AutoModel wrapper: `nemo_automodel/_transformers/auto_model.py`
+- Registry: `nemo_automodel/_transformers/registry.py`
+- Existing bidirectional examples:
+  `nemo_automodel/components/models/llama_bidirectional/model.py` and
+  `nemo_automodel/components/models/ministral_bidirectional/model.py`
+- Recipes: `nemo_automodel/recipes/retrieval/train_bi_encoder.py` and
+  `nemo_automodel/recipes/retrieval/train_cross_encoder.py`
+- Dataset/collator: `nemo_automodel/components/datasets/llm/retrieval_dataset.py`,
+  `retrieval_dataset_inline.py`, and `retrieval_collator.py`
+- Example YAMLs: `examples/retrieval/bi_encoder/` and
+  `examples/retrieval/cross_encoder/`
+
+## Work Checklist
+
+1. Classify the change as backbone, wrapper, recipe/config, or dataset/collator.
+2. Read the matching files from the first-files list before planning edits.
+3. For failures, shape mismatches, save/reload metadata issues, or unexpected
+   recipe behavior, read `PITFALLS.md` before proposing a fix.
+4. Preserve the bi-encoder or cross-encoder shape contract while making the
+   smallest code change.
+5. Add or update the focused unit test that proves the contract changed or still
+   holds.
+6. Run the smallest validation command from this skill, then broaden only if the
+   change touches distributed training, checkpointing, or full recipe execution.
+
+## Choose The Implementation Path
+
+Before editing, decide which path applies:
+
+- Generic encoder or scorer already supported by HuggingFace Auto classes:
+  leave `SUPPORTED_BACKBONES` alone unless a custom non-causal backbone is
+  required. `build_encoder_backbone` falls back to `AutoModel` for embedding and
+  `AutoModelForSequenceClassification` for scoring.
+- Causal decoder used for embeddings or reranking:
+  add a bidirectional backbone class that disables causal attention and uses a
+  bidirectional attention mask.
+- Nested model such as a VLM with a text tower:
+  use the `extract_submodel` config knob and verify the extracted object has a
+  `.config`; the loader preserves the extracted dtype when rebuilding the
+  retrieval target class.
+- Cross-encoder with custom non-causal behavior:
+  provide a sequence-classification retrieval class for the `"score"` task.
+  Otherwise the HF sequence-classification fallback may be enough.
+
+## Registration Handshake
+
+Custom retrieval backbones need all of these pieces:
+
+1. Export the model class from the model module with `ModelClass = [...]`.
+2. Register every custom retrieval architecture in
+   `MODEL_ARCH_MAPPING` in `nemo_automodel/_transformers/registry.py`.
+3. Add the optional `{"retrieval"}` tag in `MODEL_ARCH_MAPPING`. This is what
+   lets `configure_encoder_metadata` write retrieval `auto_map` metadata for
+   saved checkpoints.
+4. Add `model_type -> task -> architecture name` entries to
+   `SUPPORTED_BACKBONES` in `nemo_automodel/_transformers/retrieval.py`.
+   Use `"embedding"` for `BiEncoderModel`; use `"score"` for
+   `CrossEncoderModel`.
+5. If the config has a new `model_type`, make sure HuggingFace Auto config/model
+   reload works. Existing retrieval examples register their bidirectional config
+   with `AutoConfig` and `AutoModel`.
+
+## Backbone Rules
+
+For bidirectional causal-decoder backbones, do not stop at setting a config
+field. The forward path must actually be non-causal:
+
+- Set each attention layer's `is_causal` flag to `False`.
+- Replace the causal mask with `transformers.masking_utils.create_bidirectional_mask`.
+- Keep pooling and temperature fields on the retrieval config when the backbone
+  needs them.
+- Preserve HuggingFace return types such as `BaseModelOutputWithPast` or
+  `SequenceClassifierOutputWithPast`.
+
+Use the existing Llama and Ministral bidirectional models as patterns, but copy
+only the behavior the target architecture needs.
+
+## Bi-Encoder Contract
+
+Bi-encoder training keeps query and passage encoding separate.
+
+- YAML model target:
+  `nemo_automodel.NeMoAutoModelBiEncoder.from_pretrained`
+- Dataset: `make_retrieval_dataset(model_type="bi_encoder")`
+- Collator: `BiEncoderCollator`
+- Dataset example shape: one `question` and `doc_text` as
+  `[positive, negative_1, ...]`
+- Collated batch:
+  - `q_input_ids`, `q_attention_mask`: `[B, Lq]`
+  - `d_input_ids`, `d_attention_mask`: `[B * P, Ld]`
+  - `labels`: `[B]` zeros for compatibility
+- The recipe computes scores `[B, P]` and real CE labels internally. The
+  positive passage must be at column 0.
+
+When `do_distributed_inbatch_negative` is enabled, keep `passage_doc_ids` from
+the collator so duplicate positives can be masked across gathered passages.
+ColBERT pooling does not support distributed in-batch negatives.
+
+## Existing Bi-Encoder Migration
+
+For an existing fine-tuned encoder loaded with
+`NeMoAutoModelBiEncoder.from_pretrained`, verify the loader path and embedding
+contract before editing model code:
+
+- Read `auto_model.py` for the public entry point, then `retrieval.py` for
+  `BiEncoderModel.build`, pooling, normalization, and `SUPPORTED_BACKBONES`.
+- Decide whether the checkpoint needs a custom bidirectional backbone or the
+  HuggingFace `AutoModel` fallback.
+- Run a tiny forward pass and confirm embeddings are `[batch, hidden]`, finite,
+  correctly typed, stable under padding, and normalized when expected.
+- For migrations, compare a fixed query/document pair for deterministic shape,
+  finite values, and ranking direction before chasing numerical drift.
+
+## Cross-Encoder Contract
+
+Cross-encoder training jointly encodes a query-passage pair and reshapes scores
+back to query groups.
+
+- YAML model target:
+  `nemo_automodel.NeMoAutoModelCrossEncoder.from_pretrained`
+- Dataset: `make_retrieval_dataset(model_type="cross_encoder")`
+- Collator: `CrossEncoderCollator`
+- Dataset transform flattens grouped passages into one row per query-passage
+  pair and carries `num_labels`.
+- Collated batch:
+  - `input_ids`, `attention_mask`: `[B * P, L]`
+  - `labels`: `[B]` zeros, created from `num_labels`
+- The recipe runs the scorer, reshapes `outputs.logits.view(-1, n_passages)`,
+  and applies CE with the positive at column 0.
+
+Any change to `n_passages`, `eval_negative_size`, or flattening must preserve
+the invariant that flattened rows are divisible by the recipe's
+`train_n_passages` or `val_n_passages`.
+
+## Validation
+
+Prefer focused CPU tests first. Use functional or GPU tests only when changing
+distributed training, checkpointing, or real recipe execution.
+
+For model/backbone changes, run the relevant subset:
+
+```bash
+uv run pytest tests/unit_tests/_transformers/test_retrieval.py -q
+uv run pytest tests/unit_tests/models/bi_encoder/test_bi_encoder_model.py -q
+uv run pytest tests/unit_tests/models/bi_encoder/test_llama_bidirectional_model.py -q
+uv run pytest tests/unit_tests/models/bi_encoder/test_ministral_bidirectional_model.py -q
+```
+
+For dataset, recipe, or shape changes:
+
+```bash
+uv run pytest tests/unit_tests/datasets/llm/test_bi_encoder_collator.py tests/unit_tests/datasets/llm/test_cross_encoder_collator.py -q
+uv run pytest tests/unit_tests/datasets/llm/test_retrieval_dataset.py -q
+uv run pytest tests/unit_tests/recipes/test_train_cross_encoder.py -q
+```
+
+For a new custom retrieval backbone, add tiny tests that cover:
+
+- config fields and model type,
+- all attention layers are non-causal,
+- changing a later token affects an earlier token,
+- `BiEncoderModel.build` resolves through `SUPPORTED_BACKBONES`,
+- `CrossEncoderModel.build` resolves the custom scorer or intentionally falls
+  back to HF sequence classification,
+- `extract_submodel` rebuilds the retrieval target and preserves dtype,
+- saved metadata contains `architectures` and retrieval `auto_map` when the
+  architecture has the `{"retrieval"}` tag.
+
+## Trigger Checks
+
+Use this skill for prompts about retrieval encoders, rerankers, bi-encoder
+training, cross-encoder scoring, bidirectional retrieval backbones, retrieval
+recipe shape errors, and retrieval checkpoint save/reload metadata.
+
+Do not use this skill for unrelated RAG application code, generic causal LM
+generation, VLM chunk retrieval, or hard-negative mining unless the task also
+touches the model wrapper, dataset/collator contract, or retrieval recipe.
+
+## Limitations
+
+- This skill covers NeMo AutoModel retrieval model internals. It is not a guide
+  for generic RAG application wiring, vector databases, or embedding service
+  deployment.
+- It gives CPU-first validation commands. Broaden to GPU, distributed, or full
+  recipe tests only when the changed surface needs that coverage.
+- It assumes the existing HuggingFace fallback path is preferred unless a custom
+  retrieval backbone is explicitly required.
+
+## Troubleshooting
+
+- If saved checkpoints reload without retrieval metadata, check the registry
+  `{"retrieval"}` tag and `configure_encoder_metadata` path first.
+- If cross-encoder logits cannot be reshaped, verify flattened dataset rows are
+  divisible by the configured `train_n_passages` or `val_n_passages`.
+- If a causal decoder appears bidirectional only in config, inspect the forward
+  mask and each attention layer's `is_causal` flag.
+- Read `PITFALLS.md` for deeper failure patterns before widening the change.
+
+## Evaluation
+
+Live evaluation scenarios live in `evals/evals.json`. Validate them with
+`astra-skill-eval validate skills/retrieval-models` before running agent evals.