| 1 | +--- |
| 2 | +name: nemo-curator-docs |
| 3 | +description: Maintain the NeMo Curator Fern docs site — add, update, move, or remove pages under fern/. Use for any documentation changes. |
| 4 | +--- |
| 5 | + |
| 6 | +# NeMo Curator Docs Maintenance |
| 7 | + |
| 8 | +Unified skill for adding, updating, moving, and removing pages on the NeMo Curator Fern documentation site. |
| 9 | + |
| 10 | +## Scope Rule |
| 11 | + |
| 12 | +**ALL docs edits happen under `fern/`.** The legacy `docs/` directory is deprecated — do not add or move content into it. Release notes, migration guides, and every new page belong under `fern/`. |
| 13 | + |
| 14 | +## Layout at a Glance |
| 15 | + |
| 16 | +``` |
| 17 | +fern/ |
| 18 | +├── fern.config.json # Minimal Fern config (org + CLI version) |
| 19 | +├── docs.yml # Site config: versions, tabs, redirects, libraries |
| 20 | +├── versions/ |
| 21 | +│ ├── latest.yml # Symlink → v26.02.yml (do not edit directly) |
| 22 | +│ ├── v26.02.yml # Nav tree for current train |
| 23 | +│ ├── v26.02/pages/ # MDX content for current train |
| 24 | +│ ├── v25.09.yml |
| 25 | +│ └── v25.09/pages/ |
| 26 | +├── components/ # Custom TSX components (footer, etc.) |
| 27 | +├── assets/ # Images, SVGs, favicon |
| 28 | +├── substitute_variables.py # CI: resolves {{ variables }} in MDX |
| 29 | +└── AUTODOCS_GUIDE.md # Library reference generation guide |
| 30 | +``` |
| 31 | + |
| 32 | +**Current train:** `v26.02`. Default all new pages there unless the user specifies a version. |
| 33 | + |
| 34 | +``` |
| 35 | +File system Published URL |
| 36 | +─────────────────────────────────────── ──────────────────────────────────────── |
| 37 | +fern/versions/v26.02/pages/ docs.nvidia.com/nemo/curator/latest/ |
| 38 | + └─ get-started/text.mdx └─ get-started/text |
| 39 | +fern/versions/v26.02.yml ── nav for ──┐ docs.nvidia.com/nemo/curator/v26.02/ |
| 40 | +fern/versions/latest.yml ─ symlink ───┘ └─ get-started/text |
| 41 | +fern/versions/v25.09/pages/ docs.nvidia.com/nemo/curator/v25.09/ |
| 42 | + └─ get-started/text.mdx └─ get-started/text |
| 43 | +``` |
| 44 | + |
| 45 | +## Operations |
| 46 | + |
| 47 | +### Add a Page |
| 48 | + |
| 49 | +1. Gather: page title, target section, filename (kebab-case `.mdx`), subdirectory under `fern/versions/v26.02/pages/`. |
| 50 | +2. Create `fern/versions/v26.02/pages/<subdirectory>/<filename>.mdx`: |
| 51 | + |
| 52 | +```mdx |
| 53 | +--- |
| 54 | +description: "One-line SEO description" |
| 55 | +categories: ["<category>"] |
| 56 | +tags: ["<tag-1>", "<tag-2>"] |
| 57 | +personas: ["<persona>"] |
| 58 | +difficulty: "beginner" # beginner | intermediate | advanced |
| 59 | +content_type: "tutorial" # tutorial | how-to | reference | concept | index |
| 60 | +modality: "text-only" # text-only | image-only | video-only | audio-only | universal |
| 61 | +--- |
| 62 | + |
| 63 | +# <Page Title> |
| 64 | + |
| 65 | +<content> |
| 66 | +``` |
| 67 | + |
| 68 | +3. Add a nav entry in `fern/versions/v26.02.yml` under the correct section: |
| 69 | + |
| 70 | +```yaml |
| 71 | +- page: <Page Title> |
| 72 | + path: ./v26.02/pages/<subdirectory>/<filename>.mdx |
| 73 | + slug: <filename> |
| 74 | +``` |
| 75 | + |
| 76 | +4. If this also applies to `latest`, no action needed — `latest.yml` is a symlink to `v26.02.yml`. |
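The steps above can be sketched as a small helper that derives the MDX path, a minimal skeleton, and the nav entry from the gathered inputs. This is only an illustration: `scaffold_page` and `KEBAB_CASE` are hypothetical names, not part of the Fern CLI or this repo, and the skeleton carries only the required `description` field.

```python
import re

# Kebab-case: lowercase alphanumeric words joined by single hyphens.
KEBAB_CASE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def scaffold_page(title: str, subdirectory: str, filename: str,
                  version: str = "v26.02") -> tuple[str, str, str]:
    """Return (mdx_path, mdx_skeleton, nav_entry) for a new page.

    Hypothetical helper for illustration; `filename` is given without
    the .mdx extension.
    """
    if not KEBAB_CASE.match(filename):
        raise ValueError(f"filename must be kebab-case: {filename!r}")

    mdx_path = f"fern/versions/{version}/pages/{subdirectory}/{filename}.mdx"
    mdx_skeleton = (
        "---\n"
        'description: "One-line SEO description"\n'
        "---\n"
        "\n"
        f"# {title}\n"
    )
    # The nav path is relative to the version YAML file, not the repo root.
    nav_entry = (
        f"- page: {title}\n"
        f"  path: ./{version}/pages/{subdirectory}/{filename}.mdx\n"
        f"  slug: {filename}\n"
    )
    return mdx_path, mdx_skeleton, nav_entry
```

The returned strings still need to be written to disk and pasted under the correct section by hand; the helper only keeps the three pieces consistent with each other.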
| 77 | + |
| 78 | +### Update a Page |
| 79 | + |
| 80 | +1. Locate by path, title, or keyword (`grep -rn` in `fern/versions/v26.02/pages/`). |
| 81 | +2. **Content only** — edit the MDX directly. |
| 82 | +3. **Title change** — update the `# H1` heading in the MDX and the `- page:` name in `fern/versions/v26.02.yml`. |
| 83 | +4. **Section move** — `git mv` the file, update its `path:` in the nav, and fix all incoming links. |
| 84 | +5. **Slug change** — update `slug:` in the nav and add a redirect in `fern/docs.yml` so old URLs keep working. |
| 85 | + |
| 86 | +### Remove a Page |
| 87 | + |
| 88 | +1. Find incoming links: `grep -r "<filename>" fern/versions/v26.02/pages/ --include="*.mdx"`. |
| 89 | +2. `git rm fern/versions/v26.02/pages/<subdirectory>/<filename>.mdx`. |
| 90 | +3. Remove the `- page:` block from `fern/versions/v26.02.yml`. If it was the last page in a section, remove the `- section:` block. |
| 91 | +4. Fix or remove all incoming links found in step 1. |
| 92 | +5. Add a redirect in `fern/docs.yml` if the URL was public. |
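Plain `grep` on a filename also matches prose and code samples. As a rough sketch, the incoming-link search in step 1 could be narrowed to actual link targets, assuming links appear either as Markdown `[text](target)` or as `href="target"` attributes. `links_to` and `LINK_TARGET` are hypothetical names introduced here.

```python
import re

# Captures the target of a Markdown link or of an href attribute.
LINK_TARGET = re.compile(r'\]\(([^)\s]+)\)|href="([^"]+)"')


def links_to(mdx_text: str, target: str) -> list[int]:
    """Return 1-based line numbers whose links point at `target`.

    Anchors (#...) and trailing slashes are ignored when comparing.
    """
    wanted = target.rstrip("/")
    hits = []
    for lineno, line in enumerate(mdx_text.splitlines(), start=1):
        for md_target, href_target in LINK_TARGET.findall(line):
            url = (md_target or href_target).split("#", 1)[0].rstrip("/")
            if url == wanted:
                hits.append(lineno)
                break  # one hit per line is enough
    return hits
```

Running this over each `.mdx` file under `fern/versions/v26.02/pages/` would give a file-and-line list to work through in step 4.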
| 93 | + |
| 94 | +### Back-port to an Older Version |
| 95 | + |
| 96 | +Only when explicitly asked. Repeat the operation in the corresponding `fern/versions/vXX.YY/` tree and `vXX.YY.yml` nav. MDX content often diverges between trains — do not blindly copy. |
| 97 | + |
| 98 | +### Worked Example: Adding a Page |
| 99 | + |
| 100 | +Request: *"Add a how-to for benchmarking text pipelines under Curate Text."* |
| 101 | + |
| 102 | +1. Create `fern/versions/v26.02/pages/curate-text/benchmarking.mdx`: |
| 103 | + |
| 104 | + ```mdx |
| 105 | + --- |
| 106 | + description: "Benchmark text curation pipelines and interpret throughput and memory metrics" |
| 107 | + categories: ["how-to"] |
| 108 | + tags: ["text-curation", "benchmarking", "performance"] |
| 109 | + personas: ["mle-focused"] |
| 110 | + difficulty: "intermediate" |
| 111 | + content_type: "how-to" |
| 112 | + modality: "text-only" |
| 113 | + --- |
| 114 | + |
| 115 | + # Benchmark Text Pipelines |
| 116 | + |
| 117 | + <content> |
| 118 | + ``` |
| 119 | + |
| 120 | +2. Add nav entry in `fern/versions/v26.02.yml` under the existing `Curate Text` section: |
| 121 | + |
| 122 | + ```yaml |
| 123 | + - page: Benchmark Text Pipelines |
| 124 | + path: ./v26.02/pages/curate-text/benchmarking.mdx |
| 125 | + slug: benchmarking |
| 126 | + ``` |
| 127 | + |
| 128 | +3. `cd fern && fern check` then `fern docs dev` and verify the page renders at `/curate-text/benchmarking`. |
| 129 | + |
| 130 | +### Worked Example: Renaming a Slug (with Redirect) |
| 131 | + |
| 132 | +Request: *"Rename `/curate-text/benchmarking` to `/curate-text/performance`."* |
| 133 | + |
| 134 | +1. Update `slug:` in `fern/versions/v26.02.yml`: `slug: performance`. |
| 135 | +2. (Optional) `git mv` the MDX file if you want the filename to match the slug. |
| 136 | +3. Add a redirect to `fern/docs.yml` so old links keep working: |
| 137 | + |
| 138 | + ```yaml |
| 139 | + redirects: |
| 140 | + - source: "/nemo/curator/latest/curate-text/benchmarking" |
| 141 | + destination: "/nemo/curator/latest/curate-text/performance" |
| 142 | + - source: "/nemo/curator/v26.02/curate-text/benchmarking" |
| 143 | + destination: "/nemo/curator/v26.02/curate-text/performance" |
| 144 | + ``` |
| 145 | + |
| 146 | +4. `grep -rn "/curate-text/benchmarking" fern/versions/v26.02/pages/` and update any incoming links. |
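The paired redirect entries in step 3 follow a regular pattern, so they can be generated rather than typed. The sketch below assumes the `/nemo/curator` base path and the `latest` plus `v26.02` pair shown above; `slug_redirects` is a hypothetical helper, not an existing script.

```python
def slug_redirects(section: str, old_slug: str, new_slug: str,
                   versions: tuple[str, ...] = ("latest", "v26.02"),
                   base: str = "/nemo/curator") -> str:
    """Build a YAML `redirects:` fragment for a renamed slug.

    One source/destination pair is emitted per published version prefix.
    """
    lines = ["redirects:"]
    for version in versions:
        lines.append(f'  - source: "{base}/{version}/{section}/{old_slug}"')
        lines.append(f'    destination: "{base}/{version}/{section}/{new_slug}"')
    return "\n".join(lines) + "\n"


print(slug_redirects("curate-text", "benchmarking", "performance"))
```

If `fern/docs.yml` already has a `redirects:` key, only the list items should be merged into it; a second top-level key would be invalid YAML.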
| 147 | + |
| 148 | +--- |
| 149 | + |
| 150 | +## Content Guidelines |
| 151 | + |
| 152 | +NeMo Curator uses **Fern-native MDX components directly** (unlike Dynamo, which converts GitHub callouts in CI). Do not use `> [!NOTE]` syntax — it will not render. |
| 153 | + |
| 154 | +| Purpose | Component | |
| 155 | +|---|---| |
| 156 | +| Neutral aside | `<Note>...</Note>` | |
| 157 | +| Helpful tip | `<Tip>...</Tip>` | |
| 158 | +| Informational callout | `<Info>...</Info>` | |
| 159 | +| Warning | `<Warning>...</Warning>` | |
| 160 | +| Error / danger | `<Error>...</Error>` | |
| 161 | +| Card grid on index pages | `<Cards>` with `<Card title="..." href="...">` children | |
| 162 | + |
| 163 | +Images live in `fern/assets/` (shared) or `fern/versions/vXX.YY/pages/_images/` (version-scoped). Reference with root-relative paths. |
| 164 | + |
| 165 | +Component examples: |
| 166 | + |
| 167 | +```mdx |
| 168 | +<Tip> |
| 169 | +If `uv` is not installed, see the [Installation Guide](/admin/installation). |
| 170 | +</Tip> |
| 171 | + |
| 172 | +<Warning> |
| 173 | +GPU-accelerated dedup requires CUDA {{ recommended_cuda }} or later. |
| 174 | +</Warning> |
| 175 | + |
| 176 | +<Cards> |
| 177 | + <Card title="Text Curation" href="/get-started/text"> |
| 178 | + Set up and run text curation workflows. |
| 179 | + </Card> |
| 180 | + <Card title="Image Curation" href="/get-started/image"> |
| 181 | + Set up and run image curation workflows. |
| 182 | + </Card> |
| 183 | +</Cards> |
| 184 | +``` |
| 185 | + |
| 186 | +## Frontmatter Fields |
| 187 | + |
| 188 | +Required: `description`. |
| 189 | +Optional but strongly preferred: `categories`, `tags`, `personas`, `difficulty`, `content_type`, `modality`. Existing pages in the same section are the best reference for valid values. |
| 190 | + |
| 191 | +`title` is taken from the `- page:` entry in the nav file; the MDX file itself uses an `# H1` heading matching the page name. |
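A quick pre-commit sanity check on frontmatter might look like the sketch below. It assumes the allowed values are exactly those listed in the template comments under "Add a Page", which may not be exhaustive, and it uses a deliberately naive `key: value` parser instead of a YAML library. `ALLOWED`, `parse_frontmatter`, and `frontmatter_problems` are hypothetical names.

```python
# Values taken from the template comments in this guide; treat as an assumption.
ALLOWED = {
    "difficulty": {"beginner", "intermediate", "advanced"},
    "content_type": {"tutorial", "how-to", "reference", "concept", "index"},
    "modality": {"text-only", "image-only", "video-only", "audio-only", "universal"},
}


def parse_frontmatter(mdx_text: str) -> dict[str, str]:
    """Naively parse single-line `key: value` pairs between the --- fences."""
    lines = mdx_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            # Drop a trailing " # comment" and surrounding quotes (naive).
            value = value.split(" #", 1)[0].strip().strip('"')
            fields[key.strip()] = value
    return fields


def frontmatter_problems(mdx_text: str) -> list[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    fields = parse_frontmatter(mdx_text)
    problems = []
    if not fields.get("description"):
        problems.append("missing required field: description")
    for key, allowed in ALLOWED.items():
        if key in fields and fields[key] not in allowed:
            problems.append(f"{key}: unexpected value {fields[key]!r}")
    return problems
```

This does not replace `fern check`; it only catches the two mistakes this section describes, a missing `description` and a value outside the documented sets.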
| 192 | + |
| 193 | +## Variable Substitution |
| 194 | + |
| 195 | +Tokens like `{{ product_name }}`, `{{ container_version }}`, `{{ current_release }}`, `{{ github_repo }}`, `{{ min_python_version }}` are resolved by `fern/substitute_variables.py` at CI time. Use them instead of hard-coding versions or URLs. Canonical list in `DEFAULT_VARIABLES` at the top of that file. |
| 196 | + |
| 197 | +Example in MDX: |
| 198 | + |
| 199 | +```mdx |
| 200 | +Install {{ product_name }} {{ current_release }} from {{ github_repo }}. |
| 201 | +Requires Python {{ min_python_version }}+ and CUDA {{ recommended_cuda }}. |
| 202 | +``` |
| 203 | + |
| 204 | +After substitution at CI time (values shown are illustrative; the real ones come from `DEFAULT_VARIABLES`): |
| 205 | + |
| 206 | +``` |
| 207 | +Install NeMo Curator 25.09 from https://github.com/NVIDIA-NeMo/Curator. |
| 208 | +Requires Python 3.10+ and CUDA 12.0+. |
| 209 | +``` |
| 210 | + |
| 211 | +To preview substitution locally: |
| 212 | + |
| 213 | +```bash |
| 214 | +python fern/substitute_variables.py versions/v26.02 --version 26.02 --dry-run |
| 215 | +``` |
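For intuition, token replacement of this kind can be sketched in a few lines. This is not the code in `fern/substitute_variables.py`, whose behavior may differ; `TOKEN` and `substitute` are hypothetical names, and the variable values in the usage lines are placeholders.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
TOKEN = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def substitute(text: str, variables: dict[str, str]) -> tuple[str, set[str]]:
    """Replace known tokens and report the names that stayed unresolved."""
    unresolved: set[str] = set()

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in variables:
            return variables[name]
        unresolved.add(name)
        return match.group(0)  # leave unknown tokens untouched

    return TOKEN.sub(replace, text), unresolved


rendered, missing = substitute(
    "Install {{ product_name }} with CUDA {{ recommended_cuda }}.",
    {"product_name": "NeMo Curator"},  # placeholder value
)
print(rendered)
print(sorted(missing))
```

Unknown tokens are left in place rather than dropped, which matches the symptom described under Debugging: a `{{ variable }}` that shows literally on the site is one that had no definition.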
| 216 | + |
| 217 | +## Validate |
| 218 | + |
| 219 | +```bash |
| 220 | +cd fern |
| 221 | +fern check # YAML + frontmatter validation |
| 222 | +fern docs broken-links # link check |
| 223 | +fern docs dev # localhost:3000 hot-reload preview |
| 224 | +``` |
| 225 | + |
| 226 | +`fern check` must pass before commit. Broken-link check can be deferred but must pass in CI. |
| 227 | + |
| 228 | +## Commit & Preview |
| 229 | + |
| 230 | +```bash |
| 231 | +git add fern/ |
| 232 | +git commit -s -m "docs: <add|update|remove> <page-title>" |
| 233 | +``` |
| 234 | + |
| 235 | +PRs that touch `fern/**` get an automatic Fern preview URL posted as a comment by `.github/workflows/fern-docs-preview.yml`. No manual step needed. |
| 236 | + |
| 237 | +``` |
| 238 | + ┌─ fern-docs-ci.yml → fern check + autodocs |
| 239 | +PR (touches fern/) ─┼─ fern-docs-preview.yml → preview build |
| 240 | + └─ fern-docs-preview-*.yml → 🌿 preview URL comment |
| 241 | + |
| 242 | +Merge to main → NO publish. Site is unchanged. |
| 243 | + |
| 244 | +Tag push (docs/v*) → publish-fern-docs.yml → docs.nvidia.com/nemo/curator |
| 245 | +``` |
| 246 | + |
| 247 | +## Publishing to Production |
| 248 | + |
| 249 | +**Merging to `main` does NOT publish.** Production only updates when a tag matching `docs/v*` is pushed (or the workflow is manually dispatched from the **Actions** tab). Do not push tags unless the user asks. |
| 250 | + |
| 251 | +Tag must be `docs/v<MAJOR>.<MINOR>.<PATCH>` — the `docs/v` prefix is required by the workflow trigger and the semver suffix should match the docs release in `CHANGELOG.md`. |
| 252 | + |
| 253 | +```bash |
| 254 | +# Correct — triggers publish |
| 255 | +git tag docs/v1.1.0 |
| 256 | +git push origin docs/v1.1.0 |
| 257 | + |
| 258 | +git tag docs/v1.2.0-rc1 # pre-release suffix is fine, still matches docs/v* |
| 259 | +git push origin docs/v1.2.0-rc1 |
| 260 | + |
| 261 | +# Wrong — these will NOT trigger publish |
| 262 | +git tag v1.1.0 # missing docs/ prefix |
| 263 | +git tag docs/1.1.0 # missing v |
| 264 | +git tag docs-v1.1.0 # wrong separator |
| 265 | +``` |
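The two tag rules (the `docs/v*` trigger and the semver convention) can be checked before pushing. The sketch below approximates the workflow's tag filter with Python's `fnmatch`, which is an assumption: GitHub's glob semantics are not identical, although they agree for tags of this shape. `triggers_publish`, `follows_convention`, and `SEMVER_TAG` are hypothetical names.

```python
import re
from fnmatch import fnmatchcase

# docs/v<MAJOR>.<MINOR>.<PATCH> with an optional pre-release suffix such as -rc1.
SEMVER_TAG = re.compile(r"^docs/v\d+\.\d+\.\d+(?:-[0-9A-Za-z.]+)?$")


def triggers_publish(tag: str) -> bool:
    """Approximate the docs/v* tag filter of the publish workflow."""
    return fnmatchcase(tag, "docs/v*")


def follows_convention(tag: str) -> bool:
    """Check the stricter docs/v<MAJOR>.<MINOR>.<PATCH> naming convention."""
    return bool(SEMVER_TAG.match(tag))


for tag in ["docs/v1.1.0", "docs/v1.2.0-rc1", "v1.1.0", "docs/1.1.0", "docs-v1.1.0"]:
    print(f"{tag:18} publish={triggers_publish(tag)} semver={follows_convention(tag)}")
```

A tag such as `docs/vnext` would pass the first check but fail the second, which is the case worth catching: it would trigger a publish without matching any release in `CHANGELOG.md`.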
| 266 | + |
| 267 | +URL → version mapping after publish: |
| 268 | + |
| 269 | +``` |
| 270 | +docs.nvidia.com/nemo/curator/latest/... → symlink to current train (v26.02 today) |
| 271 | +docs.nvidia.com/nemo/curator/v26.02/... → 26.02 train |
| 272 | +docs.nvidia.com/nemo/curator/v25.09/... → 25.09 train |
| 273 | +``` |
| 274 | + |
| 275 | +## Version Ship Checklist (when cutting a new train) |
| 276 | + |
| 277 | +When the user ships a new version (e.g. `v26.04`): |
| 278 | + |
| 279 | +1. Copy `fern/versions/v26.02/pages/` → `fern/versions/v26.04/pages/` and edit content. |
| 280 | +2. Copy `fern/versions/v26.02.yml` → `fern/versions/v26.04.yml` and update all `./v26.02/` path prefixes. |
| 281 | +3. Repoint the symlink: `ln -sf v26.04.yml fern/versions/latest.yml`. |
| 282 | +4. Update `fern/docs.yml` `versions:` list — add the new display-name, mark older trains stable. |
| 283 | +5. Add redirect rules in `fern/docs.yml` for `/nemo/curator/26.04/:path*` → `/nemo/curator/v26.04/:path*` (see existing patterns). |
| 284 | +6. Align `display-name` strings with `CHANGELOG.md` and `nemo_curator/package_info.py`. |
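Step 2 is a mechanical prefix rewrite, sketched below on the nav file's text. `retarget_nav` is a hypothetical helper; it assumes every page path in the nav starts with `./<version>/` as in the examples in this guide.

```python
def retarget_nav(nav_text: str, old: str, new: str) -> tuple[str, int]:
    """Rewrite ./<old>/ path prefixes to ./<new>/ and count the rewrites."""
    old_prefix, new_prefix = f"./{old}/", f"./{new}/"
    count = nav_text.count(old_prefix)
    return nav_text.replace(old_prefix, new_prefix), count


nav = (
    "- page: Benchmark Text Pipelines\n"
    "  path: ./v26.02/pages/curate-text/benchmarking.mdx\n"
    "  slug: benchmarking\n"
)
updated, changed = retarget_nav(nav, "v26.02", "v26.04")
print(updated)
print(changed)
```

A count of zero after copying the file is a useful warning sign that the paths use a different prefix than assumed here.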
| 285 | + |
| 286 | +## Debugging |
| 287 | + |
| 288 | +| Symptom | Fix | |
| 289 | +|---|---| |
| 290 | +| `fern check` YAML error | 2-space indent; `- page:` inside `contents:`; `path:` is relative to the version YAML file | |
| 291 | +| Page 404 in preview | `slug:` missing or duplicated in the same section; confirm in `vXX.YY.yml` | |
| 292 | +| `{{ variable }}` shows literally on site | Not in `DEFAULT_VARIABLES` in `substitute_variables.py` — add it there | |
| 293 | +| MDX parse error | Replace bare `<https://...>` with `[text](https://...)`; escape `<` in prose with `&lt;` or backticks | |
| 294 | +| Old Sphinx URL breaks | Add a `redirects:` entry in `fern/docs.yml` | |
| 295 | +| Library reference missing | Run `fern docs md generate` in `fern/` (see `fern/AUTODOCS_GUIDE.md`) | |
| 296 | +| Broken image | Path is relative to the MDX file; check `fern/assets/` or `pages/_images/` exists | |
| 297 | + |
| 298 | +## Key References |
| 299 | + |
| 300 | +| File | Purpose | |
| 301 | +|---|---| |
| 302 | +| `fern/docs.yml` | Site config, versions, redirects, libraries | |
| 303 | +| `fern/versions/vXX.YY.yml` | Navigation tree for a version | |
| 304 | +| `fern/versions/vXX.YY/pages/` | MDX content for a version | |
| 305 | +| `fern/versions/latest.yml` | Symlink → current train's nav (do not edit) | |
| 306 | +| `fern/components/` | Custom TSX (footer, release banner) | |
| 307 | +| `fern/assets/` | Shared images, SVGs, favicon | |
| 308 | +| `fern/substitute_variables.py` | Variable definitions + CI replacement | |
| 309 | +| `fern/AUTODOCS_GUIDE.md` | Generating library reference MDX from source | |
| 310 | +| `fern/README.md` | Full docs architecture guide | |
| 311 | +| `.github/workflows/fern-docs-*.yml` | CI: validation, preview, publish | |
| 312 | + |
| 313 | +--- |