Commit 0d3d6b0

Remove docs.json update logic (#2566)
## Summary

Stops the knowledgebase-nav pipeline from reading, writing, or parsing `docs.json`. Tag pages, product indexes, article tab-badge sync, and root `support.mdx` behavior are unchanged.

When tag pages are added, removed, or renamed (treated as delete + add), the PR report (`pr_report.py`) now emits a **docs.json update required** section that lists the exact Mintlify page ids to add or remove, grouped by product (`Support: <display_name>`), using display names from `scripts/knowledgebase-nav/config.yaml`.

## Motivation

Mintlify navigation in `docs.json` is intended to be edited by humans. The generator should not mutate navigation automatically.

## Changes

### `scripts/knowledgebase-nav/generate_tags.py`

- Removed `update_docs_json`, `DOCS_JSON_NAV_LANGUAGE`, and the post-loop `docs.json` phase.
- Pipeline is now five phases ending at `support.mdx` updates; docstrings and CLI help updated accordingly.

### `scripts/knowledgebase-nav/pr_report.py`

- Removed the `docs_json` bucket from categorization and the `- docs.json updated.` bullet.
- Added `collect_tag_page_changes()` from `git diff --name-status HEAD` lines: maps `support/<product>/tags/<slug>.mdx` to page ids (strip `.mdx`); handles `A`, `D`, and rename/copy (`R`/`C`) as add + remove.
- Added `load_product_display_names()` from `config.yaml`; optional `--config` (default `<repo-root>/scripts/knowledgebase-nav/config.yaml`).
- Added `build_docs_json_section()` and wired into `build_report_markdown`; fallback body is skipped when only tag-page add/remove lists are non-empty.

### `.github/workflows/knowledgebase-nav.yml`

- Step renamed to **Generate tag pages and product index**; comments describe human `docs.json` edits via PR comment.
- Auto-commit: message **`chore: regenerate support tag pages`** (matches `CHORE_SUBJECT` for chore-only detection); **`file_pattern` no longer includes `docs.json`**.

### Tests

- **`test_generate_tags.py`**: Removed `TestUpdateDocsJson`; fixture no longer seeds `docs.json`; full pipeline asserts `docs.json` is not created; light regression checks that `update_docs_json` is gone and the module does not path-open `docs.json`.
- **`test_golden_output.py`**: Removed golden assertions against `docs.json`; fixture copies only `support/` + `support.mdx`; asserts no `docs.json` appears after the run.
- **`test_pr_report.py`**: New coverage for tag-page collection, the docs.json section, display-name fallback, YAML loading edge cases, and updated categorization (e.g. `M docs.json` no longer increments a `docs_json` bucket).

### Docs

- **`README.md`** and **`Architecture.md`**: Describe human-managed `docs.json`, PR comment workflow, new-product tab JSON snippet, CI file patterns, troubleshooting, and updated mermaid diagrams (no `GEN --> docs.json`).

## Breaking change / migration

**Workflow behavior:** Commits from the Knowledgebase Nav workflow **no longer include `docs.json`**. Authors and reviewers must apply navigation edits manually when tag pages change.

**Operational:** PRs that previously relied on the bot to refresh support tabs must instead use the **docs.json update required** block in the PR comment (or edit `docs.json` locally).

**Rename an auto-commit subject string:** anything that matched `chore: regenerate support tag pages and docs.json navigation` should use **`chore: regenerate support tag pages`** for chore-only / skip-comment logic.

## How to verify

- Run unit + integration tests from repo root:
  - `python -m pytest scripts/knowledgebase-nav/tests/ -v`
- Run the generator locally:
  - `python scripts/knowledgebase-nav/generate_tags.py --repo-root .`
  - Confirm `git status` does not show `docs.json` modified.
- Smoke-test the report (example):
  - `python scripts/knowledgebase-nav/pr_report.py --repo-root . --diff-text $'A\tsupport/models/tags/foo.mdx\n' --warnings-file /dev/null`
  - Confirm **`### docs.json update required`** and grouped **`Support: …`** headings.

## Risk / review focus

- **Fork PRs:** Generator still runs; auto-commit remains skipped. Fork contributors must run the generator locally **and** edit `docs.json` when tag pages change (README calls this out).
- **Display names:** If `config.yaml` is missing or malformed, headings fall back to **`Support: <slug>`**; confirm that is acceptable for edge cases.

## Commit

- `e5067feb6`: Remove docs.json update logic
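The `pr_report.py` behavior described above (turning `git diff --name-status` lines into page ids, then grouping them under `Support: …` headings) can be illustrated with a minimal sketch. This is not the code from this commit: the function names `collect_changes` and `render_section`, the bullet wording, and the sample display name are hypothetical, and only the path shape, the `.mdx` stripping, the `A`/`D`/`R`/`C` handling, the heading text, and the slug fallback come from the description.

```python
import re

# Path shape taken from the commit description: support/<product>/tags/<slug>.mdx
TAG_PAGE_RE = re.compile(r"^support/(?P<product>[^/]+)/tags/[^/]+\.mdx$")


def _record(bucket, path):
    """Store the page id (path without .mdx) under its product, if path is a tag page."""
    match = TAG_PAGE_RE.match(path)
    if match:
        bucket.setdefault(match.group("product"), set()).add(path[: -len(".mdx")])


def collect_changes(diff_text):
    """Hypothetical stand-in: parse tab-separated name-status lines into add/remove maps."""
    added, removed = {}, {}
    for line in diff_text.splitlines():
        fields = line.split("\t")
        status = fields[0][:1]  # "R100" and "C75" carry a similarity score after the letter
        if status == "A" and len(fields) == 2:
            _record(added, fields[1])
        elif status == "D" and len(fields) == 2:
            _record(removed, fields[1])
        elif status in ("R", "C") and len(fields) == 3:
            # The description treats rename/copy as remove (old path) + add (new path).
            _record(removed, fields[1])
            _record(added, fields[2])
    return added, removed


def render_section(added, removed, display_names):
    """Hypothetical stand-in: group page ids by product; fall back to the slug as the name."""
    products = sorted(set(added) | set(removed))
    if not products:
        return ""
    lines = ["### docs.json update required", ""]
    for product in products:
        lines.append(f"**Support: {display_names.get(product, product)}**")
        lines.extend(f"- Add `{p}`" for p in sorted(added.get(product, ())))
        lines.extend(f"- Remove `{p}`" for p in sorted(removed.get(product, ())))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "A\tsupport/models/tags/foo.mdx\nM\tdocs.json\n"
    print(render_section(*collect_changes(sample), {"models": "Example Models"}))
```

Lines that are not tag pages (for example `M docs.json` or changed articles) are ignored, which matches the description's statement that `docs.json` modifications no longer feed a report bucket.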
1 parent ebd7b2f commit 0d3d6b0

8 files changed

Lines changed: 696 additions & 491 deletions

.github/workflows/knowledgebase-nav.yml

Lines changed: 17 additions & 11 deletions
@@ -6,9 +6,13 @@
 # -----------------------
 # Runs the Python script scripts/knowledgebase-nav/generate_tags.py against
 # the repo root. That script syncs keyword footers on support articles,
-# rebuilds tag pages and product index MDX, and updates docs.json and the
-# root support.mdx counts. See scripts/knowledgebase-nav/README.md and
-# Architecture.md for the full pipeline.
+# rebuilds tag pages and product index MDX, and updates the root support.mdx
+# counts. The generator never edits docs.json. When the set of tag pages
+# changes, the PR comment posted by scripts/knowledgebase-nav/pr_report.py
+# lists the page ids a human must add to or remove from each
+# "Support: <display_name>" tab in docs.json by hand. See
+# scripts/knowledgebase-nav/README.md and Architecture.md for the full
+# pipeline.
 #
 # When it runs
 # ------------
@@ -101,7 +105,7 @@ jobs:
 id: nav_commit
 if: github.event_name == 'pull_request'
 run: |
-CHORE_SUBJECT="chore: regenerate support tag pages and docs.json navigation"
+CHORE_SUBJECT="chore: regenerate support tag pages"
 FIRST="$(git log -1 --pretty=%B | head -n1)"
 if [ "${FIRST}" = "${CHORE_SUBJECT}" ]; then
 echo "is_auto_nav_commit=true" >> "${GITHUB_OUTPUT}"
@@ -128,9 +132,10 @@ jobs:
 - name: Install dependencies
 run: pip install -r scripts/knowledgebase-nav/requirements.txt
 
-# Writes into the working tree (support/, docs.json, support.mdx, etc.).
-# stderr (Python warnings) is captured for the PR comment steps below.
-- name: Generate tag pages and update docs.json
+# Writes into the working tree (support/, support.mdx, etc.). Never
+# edits docs.json. stderr (Python warnings) is captured for the PR
+# comment steps below.
+- name: Generate tag pages and product index
 run: python scripts/knowledgebase-nav/generate_tags.py --repo-root . 2>generator-warnings.log
 
 - name: Show generator warnings in log
@@ -195,11 +200,12 @@ jobs:
 echo "::notice::Fork PR: auto-commit is skipped. Push regenerated files from scripts/knowledgebase-nav (see README) or a maintainer can run the generator after merge."
 
 # Commits only if the generator changed files; file_pattern limits
-# which paths are staged. Skipped for fork PRs and unnecessary for
-# pushes with no diff (action is a no-op).
+# which paths are staged. docs.json is excluded so generator runs never
+# mutate it. Skipped for fork PRs and unnecessary for pushes with no
+# diff (action is a no-op).
 - name: Commit generated changes
 if: ${{ github.event_name == 'workflow_dispatch' || (github.event_name == 'pull_request' && github.event.pull_request.head.repo.fork == false) }}
 uses: stefanzweifel/git-auto-commit-action@v7
 with:
-commit_message: "chore: regenerate support tag pages and docs.json navigation"
-file_pattern: "support.mdx support/*/articles/*.mdx support/*/tags/*.mdx support/*.mdx docs.json"
+commit_message: "chore: regenerate support tag pages"
+file_pattern: "support.mdx support/*/articles/*.mdx support/*/tags/*.mdx support/*.mdx"

scripts/knowledgebase-nav/Architecture.md

Lines changed: 21 additions & 19 deletions
@@ -4,7 +4,7 @@ This document describes the **Knowledgebase Nav** system in the `wandb-docs` rep
 
 ## Purpose
 
-The generator keeps support (knowledgebase) navigation consistent with article content. It runs over configured products (for example models, weave, inference), reads MDX articles under `support/<product>/articles/`, and updates generated MDX pages, root `support.mdx` counts, and English support tabs in `docs.json`.
+The generator keeps support (knowledgebase) navigation consistent with article content. It runs over configured products (for example models, weave, inference), reads MDX articles under `support/<product>/articles/`, and updates generated MDX pages and root `support.mdx` counts. The generator never reads or writes `docs.json`; humans edit that file by hand based on the workflow's PR comment.
 
 ## High-level context
 
@@ -19,40 +19,41 @@ flowchart LR
 GEN["generate_tags.py"]
 OUT1["support/*/tags/*.mdx"]
 OUT2["support/<product>.mdx"]
-DJ["docs.json"]
 SM["support.mdx"]
 end
 CFG --> GEN
 TPL --> GEN
 ART --> GEN
 GEN --> OUT1
 GEN --> OUT2
-GEN --> DJ
 GEN --> SM
 GEN --> ART
 ```
 
 The arrow back to **articles** means phase 4 updates only `<Badge>` links that point at tag pages under `/support/<product>/tags/`, wrapped in MDX comment markers. Other content (including `---`, other Badges, and text outside the markers) is not rewritten.
 
+`docs.json` is intentionally absent from this diagram. When tag pages are added or removed, the workflow's PR comment (built by `pr_report.py`) lists the page ids that a human must add to or remove from the matching `Support: <display_name>` tab in `docs.json` by hand.
+
 ## Automation workflow
 
-Pull requests trigger the **Knowledgebase Nav** workflow when files under `support/**` or `scripts/knowledgebase-nav/**` change (including new pushes to an open PR). It installs Python dependencies, runs the generator, and commits matching paths when there are diffs. Pull requests from **forks** check out the fork head commit and still run the generator, but the auto-commit step is skipped because the default token cannot push to forks.
+Pull requests trigger the **Knowledgebase Nav** workflow when files under `support/**` or `scripts/knowledgebase-nav/**` change (including new pushes to an open PR). It installs Python dependencies, runs the generator, posts a PR comment with any "docs.json update required" instructions, and commits matching paths when there are diffs. Pull requests from **forks** check out the fork head commit and still run the generator, but the auto-commit step is skipped because the default token cannot push to forks.
 
 ```mermaid
 flowchart TD
 A[PR or manual workflow_dispatch] --> B[Checkout ref]
 B --> C[Python 3.11 + pip install requirements.txt]
 C --> D["generate_tags.py --repo-root ."]
-D --> E{Files changed?}
+D --> R["pr_report.py (lists tag-page adds/removes)"]
+R --> E{Files changed?}
 E -->|yes| F[git-auto-commit selected paths]
 E -->|no| G[No commit]
 ```
 
-Committed path patterns include `support.mdx`, `support/*/articles/*.mdx`, `support/*/tags/*.mdx`, `support/*.mdx` (product indexes), and `docs.json`.
+Committed path patterns include `support.mdx`, `support/*/articles/*.mdx`, `support/*/tags/*.mdx`, and `support/*.mdx` (product indexes). `docs.json` is intentionally excluded; humans update it manually.
 
 ## Pipeline orchestration
 
-`run_pipeline(repo_root, config_path)` is the single entry point used by the CLI and tests. It loads `config.yaml`, builds one Jinja2 environment for all products, then loops each product. After the loop it updates `docs.json` once and `support.mdx` once.
+`run_pipeline(repo_root, config_path)` is the single entry point used by the CLI and tests. It loads `config.yaml`, builds one Jinja2 environment for all products, then loops each product. After the loop it updates `support.mdx` once. It does not touch `docs.json`.
 
 ```mermaid
 flowchart TD
@@ -67,10 +68,9 @@ flowchart TD
 P4 --> P5[sync_all_support_article_footers]
 P5 --> P6[Record product_stats]
 P6 --> LOOP
-LOOP -->|done| P7[update_docs_json]
-P7 --> P8[update_support_index]
-P8 --> P9[update_support_featured]
-P9 --> DONE([Done])
+LOOP -->|done| P7[update_support_index]
+P7 --> P8[update_support_featured]
+P8 --> DONE([Done])
 ```
 
 ## Per-product data flow
@@ -102,18 +102,19 @@ flowchart LR
 PATHS --> TAGS
 ```
 
-`render_tag_pages` returns sorted page id strings (for example `support/models/tags/security`) that `update_docs_json` merges into the English navigation tab for that product.
+`render_tag_pages` returns sorted page id strings (for example `support/models/tags/security`). `pr_report.py` consumes the same ids when it builds the "docs.json update required" section in the workflow's PR comment so a human can update the matching `Support: <display_name>` tab in `docs.json`.
 
 ## Components and files
 
 | Component | Path | Role |
 |-----------|------|------|
-| CLI and logic | `generate_tags.py` | All phases, parsing, slug rules, previews, JSON and MDX rewrites |
+| CLI and logic | `generate_tags.py` | All phases, parsing, slug rules, previews, MDX rewrites (does not touch `docs.json`) |
+| PR report | `pr_report.py` | Markdown report from `git diff`; lists added/removed tag pages so a human can update `docs.json` |
 | Product and tag registry | `config.yaml` | `slug`, `display_name`, `allowed_keywords` per product |
 | Tag listing template | `templates/support_tag.mdx.j2` | One Card per article on a tag page |
 | Product hub template | `templates/support_product_index.mdx.j2` | Featured section and browse-by-category Cards |
 | Dependencies | `requirements.txt` | PyYAML, Jinja2 |
-| Unit tests | `tests/test_generate_tags.py` | Mocked filesystem and `docs.json` |
+| Unit tests | `tests/test_generate_tags.py` | Mocked filesystem |
 | Integration tests | `tests/test_golden_output.py` | Full pipeline on a temp copy of the real repo |
 | Pytest markers | `tests/conftest.py` | Registers the `integration` marker for the golden suite |
 | CI | `.github/workflows/knowledgebase-nav.yml` | Triggers, run script, auto-commit |
@@ -158,23 +159,23 @@ Functions are grouped below the way they appear in the source file. Names refer
 
 - **`tojson_unicode`**, **`create_template_env`** configure Jinja2 for MDX (templates use the `tojson_unicode` filter for YAML front matter values).
 - **`render_tag_pages`** writes `support/<product>/tags/<tag-slug>.mdx`.
-- **`cleanup_stale_tag_pages`** deletes `.mdx` files in the tags directory that were not just generated, keeping the directory and `docs.json` free of stale entries.
+- **`cleanup_stale_tag_pages`** deletes `.mdx` files in the tags directory that were not just generated, keeping the tags directory free of stale entries.
 - **`render_product_index`** writes `support/<product>.mdx`.
 
 ### Site-wide updates
 
-- **`update_docs_json`** updates or creates hidden `Support: <display_name>` tabs under `navigation.languages` where `language` is `en`, setting `pages` to the product index plus sorted tag paths.
 - **`update_support_index`** updates count lines on product Cards in root `support.mdx`. Locates markers via `_COUNTS_START_RE` / `_COUNTS_END_RE`; falls back to a bare count-line pattern for migration.
 - **`update_support_featured`** regenerates the featured-articles section in root `support.mdx`, locating the block via `_FEATURED_START_RE` / `_FEATURED_END_RE`.
 
+The pipeline does not edit `docs.json`. Tag-page additions and removals are surfaced to humans through `pr_report.py`, which lists the affected page ids in the workflow's PR comment.
+
 ### CLI
 
 - **`main`** parses `--repo-root` and optional `--config`, then calls **`run_pipeline`**.
 
 ## Constants
 
 - **`BODY_PREVIEW_MAX_LENGTH`** and **`BODY_PREVIEW_SUFFIX`** control Card preview length and ellipsis.
-- **`DOCS_JSON_NAV_LANGUAGE`** is `"en"` and scopes navigation edits to the English tree only.
 - **`_make_markers(keyword)`** generates the four constants below for each managed section: canonical start/end strings for writing and compiled `re.Pattern` objects for reading.
 - **`_BADGE_START`** / **`_BADGE_END`** — canonical `{/* AUTO-GENERATED: tab badges */}` strings written to article files. **`_BADGE_START_RE`** / **`_BADGE_END_RE`** — patterns used to locate the block (case-insensitive, colon optional, keyword anywhere in the comment).
 - **`_COUNTS_START`** / **`_COUNTS_END`** — canonical `{/* AUTO-GENERATED: counts */}` strings written to `support.mdx`. **`_COUNTS_START_RE`** / **`_COUNTS_END_RE`** — patterns used inside the Card-anchored structural pattern that locates and replaces count lines.
@@ -185,9 +186,10 @@ Functions are grouped below the way they appear in the source file. Names refer
 - **Monolithic script**: one file holds all logic so the workflow and contributors have a single place to read and change behavior.
 - **Allowed keywords**: `config.yaml` lists valid tags per product; unknown tags still generate pages but emit warnings so content is never dropped silently.
 - **Tab Badge ownership**: only `<Badge>` elements linking to `/support/<product>/tags/...` are derived from `keywords`. These are wrapped in marker comments located by `_BADGE_START_RE` / `_BADGE_END_RE`. The `---` line between body and badges is cosmetic; `_extract_body` uses `_BADGE_START_RE` as the boundary and trims a trailing `---` only as cleanup.
-- **Stale tag cleanup**: tag pages that no longer correspond to any article keyword are deleted after generation, before `docs.json` is updated. This keeps the tags directory and navigation free of orphaned entries.
+- **Stale tag cleanup**: tag pages that no longer correspond to any article keyword are deleted after generation. This keeps the tags directory free of orphaned entries; the workflow's PR comment then asks a human to remove the matching entries from `docs.json`.
 - **Marker-based editing**: all auto-generated sections (article tab Badges, `support.mdx` count lines, and featured articles) use MDX comment markers generated by `_make_markers`. Matching is case-insensitive with an optional colon, and the keyword can appear anywhere inside the comment, so authors can freely annotate markers without breaking the generator. Each marker pair has a migration path that wraps bare content on first run.
-- **Golden tests**: compare generated tag pages, product index pages, article files (including footer markers), support tabs in `docs.json`, and root `support.mdx` to the committed tree so output drift is visible as a unified diff.
+- **`docs.json` is human-edited**: the generator never reads or writes `docs.json`. Tag page additions and removals are surfaced through `pr_report.py`, which lists page ids grouped by `Support: <display_name>` so a human can update the matching tab by hand.
+- **Golden tests**: compare generated tag pages, product index pages, article files (including footer markers), and root `support.mdx` to the committed tree so output drift is visible as a unified diff. The golden suite also asserts that `docs.json` is never produced in the temp tree.
 
 ## Related reading
 