Merged
scripts/knowledgebase-nav/Architecture.md: 18 changes (10 additions, 8 deletions)
@@ -130,11 +130,11 @@ Functions are grouped below the way they appear in the source file. Names refer

### Article structure and footers

-- **`parse_frontmatter`**, **`_extract_body`** split YAML front matter and main body. `_extract_body` uses `_BADGE_START` as the boundary and trims a trailing `---` line cosmetically.
+- **`parse_frontmatter`**, **`_extract_body`** split YAML front matter and main body. `_extract_body` uses `_BADGE_START_RE` to locate the boundary and trims a trailing `---` line cosmetically.
- **`_split_frontmatter_raw`** splits the raw MDX into the front matter block and the remainder for footer rewriting.
- **`_normalize_keywords`** coerces `keywords` front matter to a list of strings (YAML list; a single string becomes one tag with a warning; other types warn and become an empty list).
- **`_keywords_list_for_footer`** returns normalized `keywords` for footer generation (delegates to **`_normalize_keywords`**).
-- **`_tab_badge_pattern`**, **`build_tab_badges_mdx`**, **`build_keyword_footer_mdx`**, **`_replace_tab_badges_in_body`** implement surgical tab-Badge sync. Managed Badges are enclosed in `_BADGE_START` / `_BADGE_END` marker comments; the function matches markers when present and falls back to regex for pre-marker articles. New footers append a blank line, markers, and Badges.
+- **`_tab_badge_pattern`**, **`build_tab_badges_mdx`**, **`build_keyword_footer_mdx`**, **`_replace_tab_badges_in_body`** implement surgical tab-Badge sync. Managed Badges are located via `_BADGE_START_RE` / `_BADGE_END_RE`; the function falls back to regex for pre-marker articles. New footers append a blank line, canonical markers, and Badges.
- **`sync_support_article_footer`**, **`sync_all_support_article_footers`** write article files when tab Badges are out of date with `keywords`.
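The `keywords` coercion rules listed above can be illustrated with a short Python sketch. It is not the script's `_normalize_keywords`: the name `coerce_keywords`, the use of the `logging` module for warnings, and the handling of a missing field and of non-string list items are assumptions made for illustration.

```python
import logging
from typing import Any

logger = logging.getLogger("kb_nav_sketch")


def coerce_keywords(raw: Any, source: str = "<article>") -> list[str]:
    """Hypothetical sketch of the documented `keywords` coercion rules."""
    if raw is None:
        # Assumption: a missing `keywords` field means "no tags" and does not warn.
        return []
    if isinstance(raw, str):
        # A single string becomes a one-item list, with a warning.
        logger.warning("%s: `keywords` is a string; treating it as one tag", source)
        return [raw]
    if isinstance(raw, list):
        tags: list[str] = []
        for item in raw:
            if isinstance(item, str):
                tags.append(item)
            else:
                # Assumption: non-string list items are skipped with a warning.
                logger.warning("%s: skipping non-string keyword %r", source, item)
        return tags
    # Any other YAML shape (number, mapping, ...) warns and yields no tags.
    logger.warning("%s: unsupported `keywords` type %s", source, type(raw).__name__)
    return []


print(coerce_keywords(["sweeps", "artifacts"]))  # ['sweeps', 'artifacts']
print(coerce_keywords("sweeps"))                  # ['sweeps'] (plus a warning)
print(coerce_keywords({"a": 1}))                  # [] (plus a warning)
```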

### Body previews (Card snippets)
@@ -162,8 +162,8 @@ Functions are grouped below the way they appear in the source file. Names refer
### Site-wide updates

- **`update_docs_json`** updates or creates hidden `Support: <display_name>` tabs under `navigation.languages` where `language` is `en`, setting `pages` to the product index plus sorted tag paths.
-- **`update_support_index`** updates count lines on product Cards in root `support.mdx`. Prefers `{/* auto-generated counts */}` markers; falls back to regex for migration.
-- **`update_support_featured`** regenerates the featured-articles section between `_FEATURED_START` / `_FEATURED_END` markers in root `support.mdx`.
+- **`update_support_index`** updates count lines on product Cards in root `support.mdx`. Locates markers via `_COUNTS_START_RE` / `_COUNTS_END_RE`; falls back to a bare count-line pattern for migration.
+- **`update_support_featured`** regenerates the featured-articles section in root `support.mdx`, locating the block via `_FEATURED_START_RE` / `_FEATURED_END_RE`.
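A minimal Python sketch of the `docs.json` update for one product, assuming Mintlify-style tab objects with `tab`, `hidden`, and `pages` keys. The helper name `upsert_support_tab`, the sample slug `models`, and the display name are hypothetical; only the lookup order (English entry, then the `Support: <display_name>` tab) and the page ordering come from the description of `update_docs_json`.

```python
import json
from typing import Any


def upsert_support_tab(
    docs: dict[str, Any], slug: str, display_name: str, tag_slugs: list[str]
) -> None:
    """Hypothetical sketch of the documented `update_docs_json` behavior for one product."""
    # Only the English navigation tree is touched; other languages are left alone.
    english = next(
        entry for entry in docs["navigation"]["languages"] if entry.get("language") == "en"
    )
    tabs = english.setdefault("tabs", [])
    name = f"Support: {display_name}"
    tab = next((t for t in tabs if t.get("tab") == name), None)
    if tab is None:
        # Assumption: Mintlify-style keys ("tab", "hidden", "pages") for a new hidden tab.
        tab = {"tab": name, "hidden": True, "pages": []}
        tabs.append(tab)
    # Product index first, then sorted tag page paths.
    tab["pages"] = [f"support/{slug}"] + sorted(f"support/{slug}/tags/{t}" for t in tag_slugs)


docs = {
    "navigation": {
        "languages": [
            {"language": "en", "tabs": [{"tab": "Guides", "pages": ["index"]}]},
            {"language": "ja", "tabs": [{"tab": "Guides", "pages": ["ja/index"]}]},
        ]
    }
}
upsert_support_tab(docs, "models", "W&B Models", ["sweeps", "artifacts"])
print(json.dumps(docs["navigation"]["languages"][0]["tabs"][-1], indent=2))
```

Running the helper a second time finds the existing tab by name and only replaces its `pages`, so repeated runs do not duplicate tabs.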

### CLI

@@ -173,16 +173,18 @@ Functions are grouped below the way they appear in the source file. Names refer

- **`BODY_PREVIEW_MAX_LENGTH`** and **`BODY_PREVIEW_SUFFIX`** control Card preview length and ellipsis.
- **`DOCS_JSON_NAV_LANGUAGE`** is `"en"` and scopes navigation edits to the English tree only.
-- **`_BADGE_START`** / **`_BADGE_END`** are the MDX comment markers that wrap managed tab Badges on each article page.
-- **`_FEATURED_START`** / **`_FEATURED_END`** are the MDX comment markers that wrap the featured-articles section in root `support.mdx`.
+- **`_make_markers(keyword)`** generates the four constants below for each managed section: canonical start/end strings for writing and compiled `re.Pattern` objects for reading.
+- **`_BADGE_START`** / **`_BADGE_END`** are the canonical `{/* AUTO-GENERATED: tab badges */}` / `{/* END AUTO-GENERATED: tab badges */}` strings written to article files; **`_BADGE_START_RE`** / **`_BADGE_END_RE`** are the patterns used to locate the block (case-insensitive, colon optional, keyword anywhere in the comment).
+- **`_COUNTS_START`** / **`_COUNTS_END`** are the canonical `{/* AUTO-GENERATED: counts */}` / `{/* END AUTO-GENERATED: counts */}` strings written to `support.mdx`; **`_COUNTS_START_RE`** / **`_COUNTS_END_RE`** are the patterns used inside the Card-anchored structural pattern that locates and replaces count lines.
+- **`_FEATURED_START`** / **`_FEATURED_END`** are the canonical `{/* AUTO-GENERATED: featured articles */}` / `{/* END AUTO-GENERATED: featured articles */}` strings written to `support.mdx`; **`_FEATURED_START_RE`** / **`_FEATURED_END_RE`** are the patterns used to locate the featured-articles block.
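The marker rules above (canonical strings for writing, tolerant patterns for reading) can be sketched as a small factory. This is a hypothetical reconstruction, not the source of `_make_markers`; the `MarkerSet` tuple and the regex details are assumptions. One detail such a factory has to handle explicitly: every end comment also contains the start keyword, so the start pattern rejects comments that contain `END AUTO-GENERATED`.

```python
import re
from typing import NamedTuple

_OPEN = r"\{/\*"                  # literal "{/*"
_CLOSE = r"\*/\}"                 # literal "*/}"
_IN_COMMENT = r"(?:(?!\*/).)*?"   # any text that stays inside one comment


class MarkerSet(NamedTuple):
    start: str            # canonical start comment (written)
    end: str              # canonical end comment (written)
    start_re: re.Pattern  # tolerant start pattern (read)
    end_re: re.Pattern    # tolerant end pattern (read)


def make_marker_set(keyword: str) -> MarkerSet:
    """Hypothetical sketch of a `_make_markers`-style factory."""
    kw = r"\s+".join(re.escape(word) for word in keyword.split())
    tag = rf"AUTO-GENERATED:?\s*{kw}"  # colon after "generated" is optional
    flags = re.IGNORECASE | re.DOTALL  # case-insensitive; notes may span lines
    # The start pattern must reject END comments, which also contain the tag.
    not_end = rf"(?!{_IN_COMMENT}\bEND\s+AUTO-GENERATED)"
    return MarkerSet(
        start=f"{{/* AUTO-GENERATED: {keyword} */}}",
        end=f"{{/* END AUTO-GENERATED: {keyword} */}}",
        start_re=re.compile(rf"{_OPEN}{not_end}{_IN_COMMENT}{tag}{_IN_COMMENT}{_CLOSE}", flags),
        end_re=re.compile(rf"{_OPEN}{_IN_COMMENT}\bEND\s+{tag}{_IN_COMMENT}{_CLOSE}", flags),
    )


badges = make_marker_set("tab badges")
article = (
    "Body text.\n"
    "{/* AUTO-GENERATED: tab badges (edit keywords instead) */}\n"
    "<Badge>...</Badge>\n"
    "{/* END AUTO-GENERATED: tab badges */}\n"
)
start = badges.start_re.search(article)
end = badges.end_re.search(article, start.end())
print(repr(article[start.end():end.start()]))  # '\n<Badge>...</Badge>\n'
```

Because each pattern stays inside a single `{/* ... */}` comment, searching for the end marker from the start match's end position isolates the managed block even when authors annotate the comments.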

## Design choices

- **Monolithic script**: one file holds all logic so the workflow and contributors have a single place to read and change behavior.
- **Allowed keywords**: `config.yaml` lists valid tags per product; unknown tags still generate pages but emit warnings so content is never dropped silently.
-- **Tab Badge ownership**: only `<Badge>` elements linking to `/support/<product>/tags/...` are derived from `keywords`. These are wrapped in marker comments so the generator does not need regex matching after migration. The `---` line between body and badges is cosmetic; `_extract_body` uses `_BADGE_START` as the boundary and trims a trailing `---` only as cleanup.
+- **Tab Badge ownership**: only `<Badge>` elements linking to `/support/<product>/tags/...` are derived from `keywords`. These are wrapped in marker comments located by `_BADGE_START_RE` / `_BADGE_END_RE`. The `---` line between body and badges is cosmetic; `_extract_body` uses `_BADGE_START_RE` as the boundary and trims a trailing `---` only as cleanup.
- **Stale tag cleanup**: tag pages that no longer correspond to any article keyword are deleted after generation, before `docs.json` is updated. This keeps the tags directory and navigation free of orphaned entries.
-- **Marker-based editing**: all auto-generated sections (article tab Badges, `support.mdx` count lines, and featured articles) use MDX comment markers. This makes managed regions visible to writers and lets the generator replace content precisely without fragile regex anchors. Each marker pair has a migration path that wraps bare content on first run.
+- **Marker-based editing**: all auto-generated sections (article tab Badges, `support.mdx` count lines, and featured articles) use MDX comment markers generated by `_make_markers`. Matching is case-insensitive with an optional colon, and the keyword can appear anywhere inside the comment, so authors can freely annotate markers without breaking the generator. Each marker pair has a migration path that wraps bare content on first run.
- **Golden tests**: compare generated tag pages, product index pages, article files (including footer markers), support tabs in `docs.json`, and root `support.mdx` to the committed tree so output drift is visible as a unified diff.
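The stale-tag cleanup amounts to a set difference between the tag pages on disk and the tag slugs still used by articles. A hypothetical sketch, assuming tag pages are named `<tag-slug>.mdx` and that the caller already has the set of used slugs; the function name `remove_stale_tag_pages` is illustrative and is not the script's `cleanup_stale_tag_pages`.

```python
import tempfile
from pathlib import Path


def remove_stale_tag_pages(tags_dir: Path, used_tag_slugs: set[str]) -> list[str]:
    """Hypothetical sketch: delete tag pages that no article keyword maps to anymore."""
    removed: list[str] = []
    # Only .mdx files are considered tag pages; anything else is left alone.
    for page in sorted(tags_dir.glob("*.mdx")):
        if page.stem not in used_tag_slugs:
            page.unlink()
            removed.append(page.name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    tags_dir = Path(tmp)
    for slug in ("artifacts", "legacy-ui", "sweeps"):
        (tags_dir / f"{slug}.mdx").write_text("---\ntitle: placeholder\n---\n")
    print(remove_stale_tag_pages(tags_dir, {"artifacts", "sweeps"}))  # ['legacy-ui.mdx']
    print(sorted(p.name for p in tags_dir.glob("*.mdx")))  # ['artifacts.mdx', 'sweeps.mdx']
```

Running this step before the `docs.json` update means the navigation is rebuilt from a directory that already matches the current keywords.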

## Related reading
scripts/knowledgebase-nav/README.md: 16 changes (9 additions, 7 deletions)
@@ -4,11 +4,11 @@ A standalone script that regenerates knowledgebase nav pages and updates the `do

The generator reads MDX article files from `support/<product>/articles/`, aggregates them by keyword tags, and:

-- **Updates tab-page Badges on articles.** Only `<Badge>` components whose link goes to `/support/<product>/tags/<tag-slug>` are rewritten from `keywords` (order preserved). Managed Badges are wrapped in MDX comment markers (`{/* AUTO-GENERATED: tab badges */}` and `{/* END AUTO-GENERATED: tab badges */}`) so the generator can locate them without regex matching on subsequent runs. Other Badges, prose, and anything outside the markers stay as you wrote them. If a new article has no tab Badges yet, the generator will insert them for you when `keywords` is non-empty.
+- **Updates tab-page Badges on articles.** Only `<Badge>` components whose link goes to `/support/<product>/tags/<tag-slug>` are rewritten from `keywords` (order preserved). Managed Badges are wrapped in MDX comment markers: any `{/* ... */}` comment that contains `AUTO-GENERATED: tab badges` anywhere inside it is the start marker; any comment that contains `END AUTO-GENERATED: tab badges` anywhere inside it is the end marker. You can add notes anywhere inside these comments without breaking the generator. Other Badges, prose, and anything outside the markers stay as you wrote them. If a new article has no tab Badges yet, the generator will insert them for you when `keywords` is non-empty.
- **Produces tag pages** at `support/<product>/tags/<tag-slug>.mdx`. Each lists the articles tagged with that keyword as Mintlify Card components.
- **Produces product index pages** at `support/<product>.mdx`. Each shows a "Featured articles" section (if any) and a "Browse by category" listing of all tags with article counts.
- **Updates `docs.json` navigation.** Hidden support tabs are updated to reflect the current set of tag pages.
-- **Updated root `support.mdx`.** The generator replaces the article and tag count lines inside each product `<Card>` (matched by `href="/support/<slug>"`) so the landing page stays in sync with the crawl. Count lines are wrapped in `{/* auto-generated counts */}` and `{/* end auto-generated counts */}` markers so writers can add other content in the Card body. The featured-articles section is also managed between its own markers.
+- **Updates root `support.mdx`.** The generator replaces the article and tag count lines inside each product `<Card>` (matched by `href="/support/<slug>"`) so the landing page stays in sync with the crawl. Count lines are wrapped between `{/* AUTO-GENERATED: counts */}` and `{/* END AUTO-GENERATED: counts */}` markers. The featured-articles section is managed between `{/* AUTO-GENERATED: featured articles */}` and `{/* END AUTO-GENERATED: featured articles */}` markers. For all markers, matching is case-insensitive, the colon after "generated" is optional, and the keyword can appear anywhere inside the comment, so you can add notes without breaking the generator.
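The tab-Badge behavior in the first bullet (rewrite only the managed block, insert it when missing, drop it when `keywords` is empty) can be sketched as follows. For brevity the sketch matches only the canonical marker text; it does not reproduce the tolerant marker matching or the pre-marker regex fallback. The slug rule in `tag_slug` and the `<a><Badge>` markup are assumptions, not the generator's actual output.

```python
import re

START = "{/* AUTO-GENERATED: tab badges */}"
END = "{/* END AUTO-GENERATED: tab badges */}"


def tag_slug(tag: str) -> str:
    """Hypothetical slug rule: lowercase, runs of non-alphanumerics become '-'."""
    return re.sub(r"[^a-z0-9]+", "-", tag.lower()).strip("-")


def badge_block(product: str, keywords: list[str]) -> str:
    """One Badge per keyword, in list order, between canonical markers."""
    # Assumption: this Badge/link markup is illustrative, not the generator's exact output.
    badges = [
        f'<a href="/support/{product}/tags/{tag_slug(k)}"><Badge>{k}</Badge></a>'
        for k in keywords
    ]
    return "\n".join([START, *badges, END])


def sync_tab_badges(body: str, product: str, keywords: list[str]) -> str:
    """Rewrite only the managed block; append it if missing; drop it if keywords is empty."""
    start = body.find(START)
    end = body.find(END, start) if start != -1 else -1
    if start != -1 and end != -1:
        before, after = body[:start], body[end + len(END):]
        if not keywords:
            return before.rstrip("\n") + "\n" + after.lstrip("\n")
        return before + badge_block(product, keywords) + after
    if keywords:
        # No managed block yet: blank line, markers, Badges (no "---" is added).
        return body.rstrip("\n") + "\n\n" + badge_block(product, keywords) + "\n"
    return body


article = "Restart the agent to clear the error.\n"
synced = sync_tab_badges(article, "models", ["Sweeps", "Run Crashes"])
print(synced)
print(sync_tab_badges(synced, "models", []) == article)  # True
```

Everything before the start marker and after the end marker is passed through unchanged, which is what keeps hand-written Badges and prose intact.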

The generator runs automatically through GitHub Actions (workflow file `.github/workflows/knowledgebase-nav.yml`) when a pull request is opened, updated with new commits, or reopened, and at least one changed file matches `support/**` or `scripts/knowledgebase-nav/**`. You can also run that workflow manually from the Actions tab for previews.

@@ -43,6 +43,8 @@ If you want to run the generator locally (for example to preview footers and tag

3. Write the article body content after the front matter. You can stop after the last paragraph. When the workflow runs, it updates only the tab-page `<Badge>` links (targets under `/support/<product>/tags/`) to match `keywords`, wrapped in MDX comment markers. You may add a `---` line or other text yourself; anything outside the markers is left alone. The first time tab Badges are needed, the generator appends a blank line, markers, and the Badges at the end of the body (no `---` is added automatically).

+The generator recognizes any `{/* ... */}` comment that contains `AUTO-GENERATED: tab badges` anywhere inside it as the start marker, and any comment that contains `END AUTO-GENERATED: tab badges` anywhere inside it as the end marker. You can add notes anywhere inside these comments (before the keyword, after it, or both) without breaking the generator. When it rewrites the block, the generator always writes the canonical marker text, regardless of what the original comment contained.

4. Open a pull request. The workflow checks out your branch, runs the generator, and commits any updates to article footers, tag pages, product index pages, `docs.json`, and `support.mdx` when those files change. You do not need to edit generated files by hand. **Pull requests from forks** still run the generator (so logs show problems), but GitHub cannot push commits back to your fork. Run the generator locally and push the regenerated files, or ask a maintainer to regenerate after merge.

If you remove every keyword from front matter (`keywords: []` or omit the field), the generator removes tab-page Badges only. Other Badges are unchanged.
@@ -177,17 +179,17 @@ scripts/knowledgebase-nav/

The script runs one pipeline after loading `config.yaml` and Jinja2 templates. The template environment registers a `tojson_unicode` filter for YAML front matter in the MDX templates (Jinja's default `tojson` uses HTML-oriented escapes that this project avoids).
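A hypothetical version of a `tojson_unicode` filter, contrasted with the escaping that Jinja's built-in `tojson` applies (it replaces `<`, `>`, `&`, and `'` with `\uXXXX` escapes so the output is safe inside HTML, and `json.dumps` by default also escapes non-ASCII characters). The implementation here (`json.dumps` with `ensure_ascii=False`) is an assumption; in Jinja2 a filter like this would be registered with `env.filters["tojson_unicode"] = tojson_unicode`.

```python
import json


def tojson_unicode(value: object) -> str:
    """Hypothetical filter: plain JSON, non-ASCII kept as-is, no HTML-oriented escapes."""
    return json.dumps(value, ensure_ascii=False)


def html_style_tojson(value: object) -> str:
    """Approximates Jinja's `tojson` escaping of <, >, & and ' for comparison."""
    return (
        json.dumps(value)
        .replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
        .replace("'", "\\u0027")
    )


title = "W&B <Sweeps>: don't panic"
print(f"title: {tojson_unicode(title)}")     # title: "W&B <Sweeps>: don't panic"
print(f"title: {html_style_tojson(title)}")  # title: "W\u0026B \u003cSweeps\u003e: don\u0027t panic"
```

A JSON-encoded string generally also parses as a YAML double-quoted scalar, so the unescaped form stays readable in front matter.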

-1. **Crawl and parse** (`crawl_articles`, `parse_frontmatter`, `build_tag_index`, `get_featured_articles`): For each product, reads every `.mdx` file in `support/<product>/articles/`, parses YAML front matter (`title`, `keywords`, `featured`), and extracts the article body (everything before the `_BADGE_START` marker). The `keywords` field is normalized with `_normalize_keywords` (YAML list of strings; a single string is coerced to a one-item list with a warning; other shapes warn and become an empty list). Body text is turned into a Card preview with `plain_text` and `extract_body_preview`. The `plain_text` step removes fenced code, horizontal rules, links and image syntax (keeping link labels), autolinks, bare `http(s)` URLs, HTML and MDX or JSX tags and simple `{...}` expressions, emphasis markers, common list or heading prefixes, decodes HTML entities, replaces non-breaking spaces (U+00A0) with a normal space, maps typographic quotes and apostrophes to ASCII, then applies an allowlist of safe characters (including `_` and `=` for identifiers) and collapses whitespace. `extract_body_preview` truncates to 120 characters and appends ` ...` when longer. Unknown `keywords` values warn once per keyword but still get tag pages.
+1. **Crawl and parse** (`crawl_articles`, `parse_frontmatter`, `build_tag_index`, `get_featured_articles`): For each product, reads every `.mdx` file in `support/<product>/articles/`, parses YAML front matter (`title`, `keywords`, `featured`), and extracts the article body (everything before the first `_BADGE_START_RE` match). The `keywords` field is normalized with `_normalize_keywords` (YAML list of strings; a single string is coerced to a one-item list with a warning; other shapes warn and become an empty list). Body text is turned into a Card preview with `plain_text` and `extract_body_preview`. The `plain_text` step removes fenced code, horizontal rules, links and image syntax (keeping link labels), autolinks, bare `http(s)` URLs, HTML and MDX or JSX tags and simple `{...}` expressions, emphasis markers, common list or heading prefixes, decodes HTML entities, replaces non-breaking spaces (U+00A0) with a normal space, maps typographic quotes and apostrophes to ASCII, then applies an allowlist of safe characters (including `_` and `=` for identifiers) and collapses whitespace. `extract_body_preview` truncates to 120 characters and appends ` ...` when longer. Unknown `keywords` values warn once per keyword but still get tag pages.

2. **Generate tag pages** (`render_tag_pages`, `cleanup_stale_tag_pages`): For each tag that appears in at least one article, renders `support/<product>/tags/<tag-slug>.mdx` from `support_tag.mdx.j2`. Tags present only in `config.yaml` and not used by any article do not get a file. After writing current pages, `cleanup_stale_tag_pages` deletes any `.mdx` files in the tags directory that no longer correspond to a keyword used by any article, keeping the tags directory and `docs.json` free of stale entries.

3. **Generate product index pages** (`render_product_index`): Renders `support/<product>.mdx` with optional "Featured articles" and a "Browse by category" section from `support_product_index.mdx.j2`.

-4. **Sync tab Badges** (`sync_all_support_article_footers`, `sync_support_article_footer`, `build_tab_badges_mdx`, `build_keyword_footer_mdx`): For each `support/<product>/articles/*.mdx` file, replaces managed `<Badge>` links with one Badge per `keywords` entry (in list order). Managed Badges are enclosed in MDX comment markers (`_BADGE_START` / `_BADGE_END`); the generator matches markers when present and falls back to regex matching for articles that predate markers. Other Badges and the rest of the body are not edited. If there are no such Badges yet and `keywords` is non-empty, appends a blank line, markers, and the tab Badges (no `---`). If `keywords` is empty, removes the marker block (or bare tab-page Badges). Runs after tag pages are generated so articles are not modified if earlier phases fail.
+4. **Sync tab Badges** (`sync_all_support_article_footers`, `sync_support_article_footer`, `build_tab_badges_mdx`, `build_keyword_footer_mdx`): For each `support/<product>/articles/*.mdx` file, replaces managed `<Badge>` links with one Badge per `keywords` entry (in list order). Managed Badges are located via `_BADGE_START_RE` and `_BADGE_END_RE`: any `{/* ... */}` comment containing `AUTO-GENERATED: tab badges` anywhere inside it is the start; any comment containing `END AUTO-GENERATED: tab badges` anywhere inside it is the end. Authors can add notes anywhere in these comments without breaking the generator. The generator falls back to regex matching for articles that predate markers. Other Badges and the rest of the body are not edited. If there are no such Badges yet and `keywords` is non-empty, appends a blank line, canonical markers, and the tab Badges (no `---`). If `keywords` is empty, removes the marker block (or bare tab-page Badges). Runs after tag pages are generated so articles are not modified if earlier phases fail.

5. **Update docs.json** (`update_docs_json`): Reads `docs.json`, finds the English entry (`navigation.languages[]` where `language == "en"`), then finds or creates hidden tabs named `Support: <display_name>`. Each tab's `pages` list is `support/<slug>` followed by sorted tag page paths. Other language entries and non-support tabs are left unchanged.

-6. **Update support landing page** (`update_support_index`, `update_support_featured`): Edits the repository root `support.mdx` in place. Count lines inside each product `<Card>` are matched by `{/* auto-generated counts */}` markers (falling back to regex for migration), and replaced with current article and tag counts (including singular or plural labels). The featured-articles section between its own markers is regenerated from articles with `featured: true`.
+6. **Update support landing page** (`update_support_index`, `update_support_featured`): Edits the repository root `support.mdx` in place. Count lines inside each product `<Card>` are located via `_COUNTS_START_RE` / `_COUNTS_END_RE` (any `{/* ... */}` comment containing `AUTO-GENERATED: counts` / `END AUTO-GENERATED: counts`, falling back to a bare count-line pattern for migration) and replaced with current article and tag counts (including singular or plural labels). The featured-articles section is regenerated between markers located via `_FEATURED_START_RE` / `_FEATURED_END_RE` (any comment containing `AUTO-GENERATED: featured articles` / `END AUTO-GENERATED: featured articles`). All marker matching is case-insensitive with an optional colon after "generated".
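The Card preview path in step 1 can be approximated with the Python standard library. This sketch implements only a subset of the `plain_text` steps (it omits the character allowlist, for example), and it assumes the preview makes a hard cut at 120 characters before appending ` ...`; whether `extract_body_preview` respects word boundaries is not stated above. The function names are hypothetical.

```python
import html
import re

PREVIEW_MAX_LENGTH = 120  # documented limit
PREVIEW_SUFFIX = " ..."   # documented suffix

_QUOTES = str.maketrans({"\u2018": "'", "\u2019": "'", "\u201c": '"', "\u201d": '"'})


def plain_text_sketch(mdx: str) -> str:
    """Hypothetical subset of the documented plain_text steps (no character allowlist)."""
    text = re.sub(r"`{3}.*?`{3}", " ", mdx, flags=re.DOTALL)             # fenced code
    text = re.sub(r"^\s*(?:-{3,}|\*{3,})\s*$", " ", text, flags=re.M)    # horizontal rules
    text = re.sub(r"!?\[([^\]]*)\]\([^)]*\)", r"\1", text)               # links/images -> label
    text = re.sub(r"</?[A-Za-z][^>]*>", " ", text)                       # HTML/JSX tags, autolinks
    text = re.sub(r"https?://\S+", " ", text)                            # bare URLs
    text = re.sub(r"\{[^{}]*\}", " ", text)                              # simple {...} expressions
    text = re.sub(r"^\s*(?:#{1,6}|[-*+]|\d+\.)\s+", "", text, flags=re.M)  # heading/list prefixes
    text = re.sub(r"[*`]+", "", text)                                    # emphasis/code markers
    text = html.unescape(text).replace("\u00a0", " ").translate(_QUOTES)
    return re.sub(r"\s+", " ", text).strip()                             # collapse whitespace


def body_preview(text: str) -> str:
    """Cut at the limit and append the suffix only when the text is longer."""
    if len(text) <= PREVIEW_MAX_LENGTH:
        return text
    return text[:PREVIEW_MAX_LENGTH].rstrip() + PREVIEW_SUFFIX


fence = "`" * 3
sample = (
    "# Fix **login** errors\n\n"
    "See [the docs](https://example.com) &amp; retry.\n\n"
    f"{fence}python\nprint('x')\n{fence}\n"
    "<Note>Use \u201cAPI keys\u201d.</Note>\n"
)
print(body_preview(plain_text_sketch(sample)))
```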

### Running locally

@@ -232,9 +234,9 @@ If `docs.json` is not found at the repository root the tests resolve to (the par

```
<Card title="W&B NewProduct" href="/support/<slug>" arrow="true" icon="/icons/cropped-newproduct.svg">
-{/* auto-generated counts */}
+{/* AUTO-GENERATED: counts */}
0 articles &middot; 0 tags
-{/* end auto-generated counts */}
+{/* END AUTO-GENERATED: counts */}
</Card>
```
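The Card-anchored count update from step 6 might look like the following sketch. It assumes the canonical counts markers and the count-line format shown in the Card example above, with the `<Card ... href="/support/<slug>">` opening tag on one line. The helper names are hypothetical, and the generator's tolerant marker matching and migration fallback are not reproduced.

```python
import re

COUNTS_START = "{/* AUTO-GENERATED: counts */}"
COUNTS_END = "{/* END AUTO-GENERATED: counts */}"


def pluralize(n: int, singular: str, plural: str) -> str:
    return f"{n} {singular if n == 1 else plural}"


def count_line(n_articles: int, n_tags: int) -> str:
    """Count line in the format shown in the Card example."""
    return f"{pluralize(n_articles, 'article', 'articles')} &middot; {pluralize(n_tags, 'tag', 'tags')}"


def update_card_counts(mdx: str, slug: str, n_articles: int, n_tags: int) -> str:
    """Replace the counts block inside the Card whose href is /support/<slug>."""
    card_open = r'<Card\b[^>]*\bhref="/support/' + re.escape(slug) + r'"[^>]*>'
    inside_card = r"(?:(?!</Card>).)*?"  # lazily scan without leaving this Card
    pattern = re.compile(
        "(" + card_open + inside_card + re.escape(COUNTS_START) + r"\n)"
        + r"(.*?)"
        + r"(\n" + re.escape(COUNTS_END) + ")",
        re.DOTALL,
    )
    return pattern.sub(
        lambda m: m.group(1) + count_line(n_articles, n_tags) + m.group(3), mdx, count=1
    )


doc = (
    '<Card title="W&B Models" href="/support/models" arrow="true">\n'
    f"{COUNTS_START}\n0 articles &middot; 0 tags\n{COUNTS_END}\n</Card>\n"
    '<Card title="W&B Weave" href="/support/weave" arrow="true">\n'
    f"{COUNTS_START}\n0 articles &middot; 0 tags\n{COUNTS_END}\n</Card>\n"
)
print(update_card_counts(doc, "weave", 1, 3))
```

Anchoring on the exact `href` value, including its closing quote, keeps a slug such as `models` from matching a longer one, and the `</Card>` guard stops the scan from borrowing markers from the next Card.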
