docs: restructure site, add changelogs, improve content#6098

Merged
junpuf merged 15 commits into main from docs/restructure-site-and-content
May 20, 2026
Contributor

@junpuf junpuf commented May 15, 2026

Summary

Major documentation site restructure for clarity, navigation, and content quality.

Site Structure

  • Nav bar: Home → User Guide → Blog Posts → Resources
  • User Guide: vLLM, vLLM-Omni, Ray, PyTorch, Base — each multi-page with sidebar nav (Overview, Supported Models, Deployment, Configuration, Changelog)
  • Resources: Reference (Image Access, Available Images, Region Availability, Support Policy, Release Notifications) + Security

Home Page

  • New landing page: DLC intro, quick start example, use-case cards (3 + 2 grid), footer links
  • Replaces previous README.md copy
  • New "Build Your Own Image" card pointing to Base guide

Framework Pages

  • vLLM, vLLM-Omni, Ray: split monolithic pages into focused sub-pages with changelogs
  • PyTorch: new guide for the AL2023 PyTorch training image (overview + EC2/SageMaker deployment + changelog)
  • Base: new guide for the lightweight CUDA + Python base images (overview + changelog, covers both v1 / CUDA 12.9 and v2 / CUDA 13.0)
  • Updated SageMaker deployment docs with standard-supervisor features (PR feat: wire vLLM SageMaker entrypoint to standard-supervisor #6044)
  • Added "What's Included" sections, default port columns, model coverage labels (Smoke / Benchmark / Smoke + Benchmark)
  • Documented SageMaker model resolution order, S3 model loading via runai-streamer, OpenAI-compatible API endpoints

Release-Notes Pipeline Removal

The auto-generated docs/releasenotes/ pages were not wired into the site nav and had no inbound links — effectively dead surface. The manually-maintained User Guide changelogs now own image history.

  • Deleted docs/releasenotes/ tree
  • Removed generate_release_notes() and helpers from docs/src/generate.py
  • Removed release-notes templates, table config, and tests
  • Stripped announcements: and packages: from 91 data YAMLs (no longer consumed)
  • Trimmed scripts/autocurrency/docs-pr.sh to drop docker-image introspection and the dead announcements:/packages: emit
  • Removed docs_packages: from .github/config/autocurrency-tracker.yml
  • Updated test_generate.py to drop the release-notes mock
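
A minimal sketch of how the `announcements:` / `packages:` strip could be done in bulk. It is not the script used in this PR: the helper name and the sample fields (`framework`, `platform`) are hypothetical, and it works on lines rather than parsing YAML, so it assumes the two keys sit at the top level of each data file.

```python
import re

# Top-level keys to drop; these two names come from the PR description.
STALE_KEYS = ("announcements", "packages")

# A line continues a stale block if it is blank, indented, or a "- item" entry.
CONTINUATION = re.compile(r"^(\s|-\s)")


def strip_top_level_keys(yaml_text, keys=STALE_KEYS):
    """Remove the named top-level keys together with their nested bodies."""
    key_line = re.compile(r"^(%s):" % "|".join(re.escape(k) for k in keys))
    kept = []
    skipping = False
    for line in yaml_text.splitlines(keepends=True):
        if key_line.match(line):
            skipping = True  # start of a stale block: drop the key line itself
            continue
        if skipping:
            if CONTINUATION.match(line):
                continue  # still inside the stale block
            skipping = False  # next top-level key reached
        kept.append(line)
    return "".join(kept)


# Hypothetical data file; the field names and values are illustrative only.
sample = (
    "framework: vllm\n"
    "announcements:\n"
    "  - Initial release\n"
    "packages:\n"
    "  python: '3.12'\n"
    "platform: ec2\n"
)
cleaned = strip_top_level_keys(sample)
print(cleaned)
```

A line-based pass keeps comments and key order untouched, which a parse-and-dump round trip usually would not. The trade-off is that blank lines directly after a removed block are dropped with it.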

Reference

  • Separated Region Availability into its own page
  • Trimmed Image Access to essentials
  • Simplified Release Notifications

Fact-Check Findings Fixed

  • RAYSERVE_NUM_GPUS was a phantom variable in Ray deployment docs (no script reads it) — removed
  • CodeArtifact (CA_REPOSITORY_ARN) was incorrectly implied to work on Ray EC2 image — corrected to SageMaker-only

CI Hardening

  • mkdocs build → mkdocs build --strict in docs-test.yml. PRs that introduce broken internal links, missing nav targets, broken anchors, or page conflicts will now fail CI (existing test_links.py covered .md link targets but not anchors).
  • Added /README.md, /tutorials/README.md, /DEVELOPMENT.md to mkdocs.yaml exclude_docs so contributor docs don't conflict with the published site.
  • Sidebar nav section titles darkened for readability.
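
To illustrate the class of error that strict mode now catches and test_links.py did not, here is a rough sketch of an anchor check. It is not how MkDocs implements the check. The `slugify` rule is a simplified assumption rather than the exact heading-ID algorithm, relative paths are not resolved, and headings inside code fences are not excluded.

```python
import re

HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$", re.MULTILINE)
# Captures (target page, anchor) from links such as [text](page.md#anchor).
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]*)#([^)\s]+)\)")


def slugify(title):
    """Simplified heading slug: lowercase, drop punctuation, spaces to hyphens."""
    slug = re.sub(r"[^\w\s-]", "", title.lower())
    return re.sub(r"\s+", "-", slug).strip("-")


def broken_anchors(pages):
    """pages maps a page path to its Markdown text.

    Returns (source page, target page, anchor) for every link whose anchor
    does not match a heading on the target page.
    """
    anchors = {
        path: {slugify(h) for h in HEADING.findall(text)}
        for path, text in pages.items()
    }
    problems = []
    for path, text in pages.items():
        for target, anchor in LINK.findall(text):
            target = target or path  # a bare "#anchor" points at the same page
            if target in anchors and anchor not in anchors[target]:
                problems.append((path, target, anchor))
    return problems


# Toy pages modelled on the stale home-page link this PR removes.
pages = {
    "index.md": (
        "# Home\n\n"
        "See [TensorFlow](available_images.md#tensorflow-training) "
        "and [vLLM](available_images.md#vllm).\n"
    ),
    "available_images.md": "# Available Images\n\n## vLLM\n",
}
print(broken_anchors(pages))
```

The page exists, so a file-level link test passes; only the anchor lookup fails, which is the gap strict mode closes.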

Test plan

  • mkdocs build --strict passes locally
  • All 74 docs tests pass (pytest test/docs/)
  • pre-commit run --all-files passes (except actionlint which needs network in the local env)
  • All pages return 200 (verified locally)
  • Autocurrency unit tests: same pre-existing baseline (no new regressions)

Redesign the documentation site structure and content for clarity:

Site structure:
- Top nav: Home, User Guide, Blog Posts, Resources
- User Guide: vLLM, vLLM-Omni, Ray (each multi-page with sidebar nav)
- Resources: Reference (Image Access, Available Images, Region Availability,
  Support Policy, Release Notifications) + Security

Home page:
- New landing page with DLC intro, quick start example, use-case cards
- Replaces the previous README copy

Framework pages (vLLM, vLLM-Omni, Ray):
- Split monolithic pages into Overview, Supported Models, Deployment
  (EC2/EKS/SageMaker), Configuration, and Changelog sub-pages
- Add changelogs with real release content from PRs
- Remove auto-generated vllm-server release notes (covered by changelog)
- Update SageMaker docs with standard-supervisor features

Reference:
- Separate Region Availability into its own page
- Trim Image Access to essentials
- Simplify Release Notifications

Tests:
- Update test_generate_available_images.py for removed Region Availability
  section from available_images template
junpuf added 8 commits May 15, 2026 07:26
User Guide additions:
- Add Base image guide (overview + changelog) under docs/base/
- Add PyTorch image guide (overview + EC2/SageMaker deployment + changelog) under docs/pytorch/
- Wire both into top-level User Guide nav
- Expand vLLM/vLLM-Omni/Ray content (What's Included, API endpoints,
  port columns, model coverage labels, fact-check fixes)

Release-notes pipeline removal:
- Delete docs/releasenotes/ tree (output had no nav entry, dead surface)
- Remove generate_release_notes() and helpers from docs/src/generate.py
- Remove release-notes templates, table config, tests
- Strip announcements/packages from 91 data YAMLs
- Trim scripts/autocurrency/docs-pr.sh to drop docker introspection
  and announcement/packages emission; update autocurrency-tracker.yml

Net: User Guide changelogs are the single source of truth; ~2100 lines removed.
- CI: switch docs-test.yml to `mkdocs build --strict`. Catches broken
  internal links, missing nav targets, broken anchors, and orphan-page
  warnings that the existing test_links.py tests don't cover (anchors
  in particular).
- Fix pre-existing strict-mode warnings:
  - Add /README.md, /tutorials/README.md, /DEVELOPMENT.md to mkdocs.yaml
    exclude_docs (these conflicted with index.md or were not in nav).
    Anchored with leading / so per-tutorial README files still build.
  - Remove broken `available_images.md#tensorflow-training` anchor link
    from the home page (TensorFlow Training section was previously
    removed from the available_images table).
- Sidebar: darken section titles ("User Guide", "vLLM", etc.) to pure
  black/white in light/dark mode for readability.
- Home page: switch use-case grid to 3 columns (3 + 2) to accommodate
  the new "Build Your Own Image" Base card.
- vLLM Inference → LLM Serving using vLLM DLC
- vLLM-Omni Inference → Multimodal Serving using vLLM-Omni DLC
- Ray Serve Inference → ML Serving using Ray DLC
- PyTorch Training → ML Training using PyTorch DLC
- Base Inference → Build Custom Images using Base DLC

Sidebar nav labels (vLLM / vLLM-Omni / Ray / PyTorch / Base) are
unchanged — only the H1 title on each guide's overview page is updated.
… rendering

- Each guide overview (vLLM, vLLM-Omni, Ray, PyTorch, Base) now points to
  its respective ECR Public Gallery page next to the existing Image Access
  reference.
- Ray Example Deployments table: drop inline-code wrapping on path links so
  they render as plain links (the code-block background made the text hard
  to read against the link color).
…-and-content

# Conflicts:
#	scripts/autocurrency/docs-pr.sh
Contributor

@Eren-Jeager123 Eren-Jeager123 left a comment

autocurrency/docspr part lgtm

Comment thread examples/ray/sagemaker/deploy_direct_app.py Outdated
Per review feedback: while the DLC image doesn't read RAYSERVE_NUM_GPUS,
the mnist-direct-app example's deployment.py uses it to parameterize
ray_actor_options.num_gpus. Restore the env var in the example, with a
comment clarifying it's a user-side convention rather than a DLC contract.

Also extend docs/ray/deployment/ec2.md Direct App Import section to call
out this pattern: env vars consumed by the user's deployment.py are valid;
they're just not defined by the DLC.
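
A minimal sketch of the user-side pattern described above, assuming a deployment.py that reads the variable itself. The helper name and the default of 0 GPUs are assumptions, not the contents of the mnist-direct-app example.

```python
import os


def ray_actor_options_from_env(env=None):
    """Build ray_actor_options from RAYSERVE_NUM_GPUS.

    The variable is a convention of the user's own deployment.py; the DLC
    image does not read it. The default of "0" is an assumption.
    """
    env = os.environ if env is None else env
    raw = env.get("RAYSERVE_NUM_GPUS", "0")
    try:
        num_gpus = float(raw)  # Ray accepts fractional GPU requests
    except ValueError:
        raise ValueError(f"RAYSERVE_NUM_GPUS must be a number, got {raw!r}") from None
    if num_gpus < 0:
        raise ValueError("RAYSERVE_NUM_GPUS must not be negative")
    return {"num_gpus": num_gpus}


# In a real deployment.py the result would be passed to Ray Serve, e.g.
#   @serve.deployment(ray_actor_options=ray_actor_options_from_env())
# Ray is left out here so the sketch runs without it installed.
print(ray_actor_options_from_env({"RAYSERVE_NUM_GPUS": "1"}))
```

Passing the environment in as a parameter keeps the parsing testable without touching the real process environment.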
Comment thread docs/ray/deployment/sagemaker.md Outdated
Comment thread docs/vllm-omni/index.md Outdated
Per #6098 review feedback: RAYSERVE_BACKEND_URL is an internal default
in the SageMaker adapter (always 127.0.0.1:8000 on the DLC), added in #5704
and explicitly marked internal in code. Customers have no supported reason
to override it, so it shouldn't appear in the user-facing env-var table.
Comment thread docs/vllm-omni/configuration.md
Comment thread docs/vllm/deployment/sagemaker.md
Comment thread docs/vllm/configuration.md
Comment thread docs/vllm/index.md Outdated
Member

@sirutBuasai sirutBuasai left a comment

One small thing: the release notes used to list the main package versions, such as the Python, CUDA, and PyTorch versions. The new changelogs don't display which version or commit those come from, only the source commit of the main package (e.g., vllm).

Another small nit: we should use the EC2/ECS/EKS/SageMaker variables from global.yml as much as possible, so that everything in the docs stays standard and future changes are easy to make.

junpuf added 4 commits May 20, 2026 00:49
Base images use nvidia/cuda:*-{base,runtime,devel}-amzn2023, not the
-cudnn flavors. cuDNN is not installed in v1 or v2.
- Use global.yml variables ({{ ec2_short }}, {{ eks_short }}, {{ sm_short }},
  {{ sagemaker }}) for AWS service names in guide pages and docs/index.md,
  user_guide/index.md instead of hardcoded "EC2", "EKS", "SageMaker",
  "Amazon SageMaker AI" so future renames update everywhere
- Add "Bundled versions" line per release in docs/vllm/changelog/index.md
  (CUDA, Python, FlashInfer, DeepEP) so the changelog conveys per-release
  framework state, matching the existing PyTorch/Ray changelog format
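
A rough sketch of deriving a "Bundled versions" line from an env-style file such as docker/vllm/versions.env. The key names (`CUDA_VERSION`, `PYTHON_VERSION`, `FLASHINFER_VERSION`, `DEEPEP_VERSION`) and all values below are placeholders chosen for illustration; the real file's keys and contents are not shown in this PR.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


# (display label, env key) pairs; the env key names are hypothetical.
LABELS = (
    ("CUDA", "CUDA_VERSION"),
    ("Python", "PYTHON_VERSION"),
    ("FlashInfer", "FLASHINFER_VERSION"),
    ("DeepEP", "DEEPEP_VERSION"),
)


def bundled_versions_line(env_text):
    """Format the per-release summary line used in a changelog entry."""
    values = parse_env(env_text)
    parts = [f"{label} {values[key]}" for label, key in LABELS if key in values]
    return "**Bundled versions:** " + " · ".join(parts)


# Placeholder contents, not the versions of any actual release.
sample_env = """
# illustrative only
CUDA_VERSION=12.9
PYTHON_VERSION="3.12"
FLASHINFER_VERSION=0.0.0
DEEPEP_VERSION=0.0.0
"""
print(bundled_versions_line(sample_env))
```

Generating the line from the file at each release commit would keep the changelog consistent with what was built, though this PR adds the lines by hand.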
Contributor Author

junpuf commented May 20, 2026

@sirutBuasai thanks for the review. Both points addressed in 6fcc8a0e:

1. Per-release version info on changelogs. PyTorch and Ray changelogs already include per-release framework versions; only the vLLM changelog was light. Added a **Bundled versions:** CUDA · Python · FlashInfer · DeepEP line to each vLLM entry (v1.0, v1.1, v1.2, v1.3), plus the wheel-version tag (e.g., 0.20.0.dev361+amzn2023.3f5bd482) inline with the source-commit link. Sourced from docker/vllm/versions.env at each release commit.

2. Use global.yml variables for service names. Converted hardcoded EC2, EKS, SageMaker, Amazon SageMaker AI literals to {{ ec2_short }} / {{ eks_short }} / {{ sm_short }} / {{ sagemaker }} across:

  • docs/{vllm,vllm-omni,ray,pytorch}/index.md
  • docs/index.md (landing page cards + walkthrough line)
  • docs/user_guide/index.md

Image tag URLs (*-sagemaker-cuda), file paths, and code blocks left literal since they're identifiers, not service names.
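
For readers unfamiliar with the placeholder syntax, a minimal sketch of what the substitution amounts to. The variable-to-value mapping is inferred from the literals listed above and is an assumption about global.yml; the site's real templating plugin is not reimplemented here.

```python
import re

# Assumed mapping, inferred from the literals this change replaced.
VARIABLES = {
    "ec2_short": "EC2",
    "eks_short": "EKS",
    "sm_short": "SageMaker",
    "sagemaker": "Amazon SageMaker AI",
}

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(text, variables=VARIABLES):
    """Replace {{ name }} placeholders; unknown names are left untouched."""
    return PLACEHOLDER.sub(lambda m: variables.get(m.group(1), m.group(0)), text)


line = "Deploy on {{ ec2_short }}, {{ eks_short }}, or {{ sm_short }}."
print(render(line))

# A rename in one place propagates to every page that uses the variable.
# "Example Name" is a made-up value for demonstration.
renamed = dict(VARIABLES, sm_short="Example Name")
print(render(line, renamed))
```

Leaving unknown placeholders intact makes a misspelled variable visible in the rendered page instead of silently producing an empty string.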

One follow-up thought: it's worth weighing whether this level of templating pays off across the doc tree. Writing EC2 is meaningfully easier to read and write than {{ ec2_short }}, and well-known acronyms like EC2 / EKS / SageMaker are unlikely to be mistyped by humans or coding agents. Variables make a clear difference for complex or evolving strings (full product names, version-pinned identifiers, paths), where a future rename should propagate automatically. For short service abbreviations the cost-benefit is closer: the consistency win is real, but the maintenance and readability cost is non-trivial.

@junpuf junpuf merged commit a913958 into main May 20, 2026
8 checks passed
@junpuf junpuf deleted the docs/restructure-site-and-content branch May 20, 2026 22:12