Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion scripts/knowledgebase-nav/Architecture.md
Original file line number Diff line number Diff line change
Expand Up @@ -141,11 +141,13 @@ Functions are grouped below the way they appear in the source file. Names refer

- **`plain_text`** strips Markdown (including horizontal rules), links, URLs, HTML or MDX tags, and similar so previews stay plain text (U+00A0 to space after entity decode, typographic quotes mapped to ASCII, allowlist keeps `_` and `=` for identifiers).
- **`extract_body_preview`** applies `plain_text`, truncates to `BODY_PREVIEW_MAX_LENGTH`, and adds `BODY_PREVIEW_SUFFIX` when needed.
- **`_card_text_from_frontmatter_field`** extracts a usable string from a single front matter key (`docengineDescription` or `description`): returns `None` when the field is missing, not a string, or empty after processing. Processing strips one outer pair of wrapping quotes and collapses internal newlines to a single space.
- **`resolve_body_preview`** resolves the Card preview text using a three-level hierarchy: `docengineDescription` first, then `description`, then `extract_body_preview(body)`. Frontmatter overrides are not passed through `plain_text` or truncation.

### Slugs and crawling

- **`tag_slug`** maps a display keyword to a filename or URL segment (lowercase, hyphenated).
- **`crawl_articles`** walks `support/<slug>/articles/*.mdx` and builds article dicts (`title`, `keywords`, `featured`, `body_preview`, `page_path`, `tag_links`, and others).
- **`crawl_articles`** walks `support/<slug>/articles/*.mdx` and builds article dicts (`title`, `keywords`, `featured`, `body_preview`, `page_path`, `tag_links`, and others). The `body_preview` field is resolved by `resolve_body_preview` from `docengineDescription`, `description`, or the article body.

### Tag aggregation and featured content

Expand Down
49 changes: 44 additions & 5 deletions scripts/knowledgebase-nav/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -98,13 +98,52 @@ Featured articles appear in the "Featured articles" section at the top of the pr

3. There is no hard limit on how many articles can be featured per product, but we recommend keeping it to 3-5 for a clean layout.

### Customizing Card preview text

By default, the generator creates Card preview text by stripping Markdown and MDX from the article body and truncating to 120 characters. You can override this with front matter fields so the Card shows exactly the text you want.

The generator resolves preview text using a three-level hierarchy:

1. **`docengineDescription`** (highest priority). Use this when you want to control the Card preview independently of SEO. This field is not used by Mintlify for anything else.
2. **`description`**. If `docengineDescription` is not set, the generator uses this field. Note that Mintlify also renders `description` as the page's `<meta name="description">` tag for search engines. Setting it affects both the Card preview and the SEO metadata. Use `docengineDescription` instead when you want the Card text to differ from the SEO description.
3. **Auto-generated body snippet**. If neither field is set, the generator falls back to the existing behavior: convert the article body to plain text and truncate to 120 characters.

The Card preview appears in three places:

- Tag page Cards (for example, `support/models/tags/experiments.mdx`).
- Featured article Cards on the product index page (for example, `support/models.mdx`).
- Featured article Cards on the root support landing page (`support.mdx`).

**Processing rules.** When using `docengineDescription` or `description`, the generator applies only minimal processing:

- Outer wrapping quotes (`"` or `'`) are stripped. YAML sometimes preserves them depending on how you quote the value.
- Internal newlines are collapsed to a single space. YAML block scalars (`|`, `>`) can produce multiline strings, but Card bodies must be single-line. If you need precise control over the text, use a single-line quoted string in front matter.
- No other processing is applied. The value is not passed through Markdown stripping, HTML entity decoding, or truncation.

**MDX safety.** Override text is emitted directly inside `<Card>` components without sanitization. Avoid characters or strings that break MDX parsing, such as unmatched `<`, raw `</Card>`, or unescaped `{`.

**Example:**

```yaml
---
title: "How do I reset my API key?"
keywords: ["Security", "Administrator"]
docengineDescription: "Step-by-step instructions for resetting your W&B API key from the user settings page."
description: "Reset your W&B API key."
---
```

In this example, the Card preview shows the `docengineDescription` value. The `description` value is used only by Mintlify for SEO. If `docengineDescription` were removed, the Card preview would show the `description` value instead.

### Front matter quick reference

| Field | Required | Default | Description |
|------------|----------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `title` | Expected | `""` if omitted | Article title. Used in Cards and listings. The generator does not fail if it is missing. |
| `keywords` | Expected | `[]` if omitted | YAML list of tag names. Each should match an entry in `config.yaml` (case-sensitive). Controls which tag pages the article appears on. If the list is empty or missing, the article is not listed under any tag. If you accidentally use a single string (for example `keywords: "Security"`), the generator treats it as one tag and emits a warning. Other non-list types are ignored with a warning. |
| `featured` | No | `false` | Set to `true` to feature the article on the product index page. |
| Field | Required | Default | Description |
|------------------------|----------|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `title` | Expected | `""` if omitted | Article title. Used in Cards and listings. The generator does not fail if it is missing. |
| `keywords` | Expected | `[]` if omitted | YAML list of tag names. Each should match an entry in `config.yaml` (case-sensitive). Controls which tag pages the article appears on. If the list is empty or missing, the article is not listed under any tag. If you accidentally use a single string (for example `keywords: "Security"`), the generator treats it as one tag and emits a warning. Other non-list types are ignored with a warning. |
| `featured` | No | `false` | Set to `true` to feature the article on the product index page. |
| `docengineDescription` | No | `""` if omitted | Card preview text for tag pages, featured sections, and the support landing page. Takes priority over `description` and the auto-generated body snippet. Use this when you want to set Card text independently of the SEO description. Outer wrapping quotes are stripped and newlines are collapsed to a single space. |
| `description` | No | `""` if omitted | Card preview text if `docengineDescription` is not set. Mintlify also uses this field for the page's `<meta name="description">` SEO tag, so setting it affects both the Card preview and search engine metadata. Use `docengineDescription` to decouple the two. Same processing rules: outer quotes stripped, newlines collapsed. |

### Running the generator locally

Expand Down
112 changes: 110 additions & 2 deletions scripts/knowledgebase-nav/generate_tags.py
Original file line number Diff line number Diff line change
Expand Up @@ -482,6 +482,108 @@ def extract_body_preview(body: str, max_len: int = BODY_PREVIEW_MAX_LENGTH) -> s
return text


def _card_text_from_frontmatter_field(
frontmatter: Dict[str, Any],
key: str,
) -> Optional[str]:
"""
Extract a usable Card preview string from a single front matter field.

Returns ``None`` when the field is missing, not a string, or empty
after processing so the caller can fall through to the next candidate.

Processing (applied only to ``str`` values):

1. Remove one outer pair of wrapping quotes when the first and last
characters are both ``"`` or both ``'``.
2. Collapse internal newlines (and surrounding whitespace) to a
single space so the value is safe for single-line Card bodies.

No other transformation (no ``plain_text``, no truncation) is applied.

Parameters
----------
frontmatter : dict
Parsed YAML front matter for the article.
key : str
The front matter key to read (for example ``"docengineDescription"``
or ``"description"``).

Returns
-------
str or None
The processed string, or ``None`` if the field is absent, not a
string, or empty after processing.
"""
value = frontmatter.get(key)
if value is None:
return None
if not isinstance(value, str):
return None

# Strip one outer pair of matching quotes.
if len(value) >= 2 and value[0] == value[-1] and value[0] in ('"', "'"):
value = value[1:-1]

# Collapse newlines (and surrounding whitespace) to a single space.
value = re.sub(r"\s*\n\s*", " ", value)

return value if value else None


def resolve_body_preview(
frontmatter: Dict[str, Any],
body: str,
) -> str:
"""
Resolve the Card preview text for an article.

Uses a three-level hierarchy:

1. ``docengineDescription`` front matter field (highest priority).
Lets writers control the Card preview independently of SEO.
2. ``description`` front matter field. Mintlify also renders this as
the page's ``<meta name="description">`` tag, so setting it
affects both the Card text and SEO metadata.
3. Auto-generated body snippet via ``extract_body_preview`` (existing
behavior: ``plain_text`` conversion and 120-character truncation).

For levels 1 and 2, only outer wrapping quotes are stripped and
internal newlines are collapsed to a single space. No other
processing is applied.

Parameters
----------
frontmatter : dict
Parsed YAML front matter for the article.
body : str
The raw article body text (Markdown/MDX content).

Returns
-------
str
The Card preview string.

Example
-------
>>> resolve_body_preview({"docengineDescription": "Custom text."}, "Body.")
'Custom text.'
>>> resolve_body_preview({"description": "SEO and card."}, "Body.")
'SEO and card.'
>>> resolve_body_preview({}, "Body content here.")
'Body content here.'
"""
text = _card_text_from_frontmatter_field(frontmatter, "docengineDescription")
if text is not None:
return text

text = _card_text_from_frontmatter_field(frontmatter, "description")
if text is not None:
return text

return extract_body_preview(body)


# ---------------------------------------------------------------------------
# Slug generation
# ---------------------------------------------------------------------------
Expand Down Expand Up @@ -817,7 +919,13 @@ def crawl_articles(repo_root: Path, product_slug: str) -> List[Dict[str, Any]]:
``_normalize_keywords`` (see that function for string and type
coercion).
- ``featured`` (bool): Whether the article has ``featured: true``.
- ``body_preview`` (str): Truncated plain-text preview of the body.
- ``body_preview`` (str): Card preview text, resolved by
``resolve_body_preview``: uses ``docengineDescription`` front
matter if present, then ``description``, then falls back to
``extract_body_preview(body)`` (plain-text conversion and
120-character truncation). Frontmatter overrides are not
passed through ``plain_text`` or truncation; only outer
wrapping quotes are stripped and newlines are collapsed.
- ``page_path`` (str): The URL path without leading slash
(for example ``support/models/articles/my-article``).
- ``mdx_path`` (str): Repo-relative path to the MDX file using forward
Expand Down Expand Up @@ -854,7 +962,7 @@ def crawl_articles(repo_root: Path, product_slug: str) -> List[Dict[str, Any]]:
raw_keywords = []
keywords = _normalize_keywords(raw_keywords, mdx_file)
featured = frontmatter.get("featured", False)
body_preview = extract_body_preview(body)
body_preview = resolve_body_preview(frontmatter, body)
file_stem = mdx_file.stem

# Build Badge link data for each keyword so templates can render
Expand Down
Loading
Loading