Pipeline Documentation

Use the structured pipeline docs when editing pipeline orchestration, calibration steps, local H5 publishing, or reusable library functions that participate in those flows.

Sources Of Truth

  • @pipeline_node attaches object-level metadata. It is a no-op at runtime; the extractor reads its metadata statically from source.
  • docs/pipeline_map.yaml defines the five canonical stages, the detailed 1a_/1b_ substage pathway, cross-stage artifacts, and edges.
  • scripts/extract_pipeline_docs.py merges both sources and writes:
    • docs/generated/pipeline_map.json
    • docs/generated/pipeline_api.json
    • docs/engineering/pipeline-map.md
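As a rough illustration of the "no-op at runtime, extracted statically" idea, the sketch below defines a stand-in decorator and reads its keyword metadata with the standard `ast` module, without importing the decorated code. The decorator body, the `extract_nodes` helper, and the sample function are hypothetical; only the `id` and `status` field names come from this document, and the real scripts/extract_pipeline_docs.py may work differently.

```python
import ast
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def pipeline_node(**metadata: Any) -> Callable[[F], F]:
    """Hypothetical stand-in: accept metadata and return the target unchanged."""

    def decorate(obj: F) -> F:
        # No wrapping and no registry: runtime behavior is untouched.
        return obj

    return decorate


def extract_nodes(source: str) -> list[dict[str, Any]]:
    """Collect literal keyword metadata from @pipeline_node(...) calls.

    Assumption: metadata values are plain literals, so ast.literal_eval
    can read them without executing any project code.
    """
    nodes: list[dict[str, Any]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue
        for dec in node.decorator_list:
            if (
                isinstance(dec, ast.Call)
                and isinstance(dec.func, ast.Name)
                and dec.func.id == "pipeline_node"
            ):
                meta = {kw.arg: ast.literal_eval(kw.value) for kw in dec.keywords if kw.arg}
                nodes.append({"object": node.name, **meta})
    return nodes


# Hypothetical annotated source, kept as a string so nothing is imported.
SAMPLE = '''
@pipeline_node(id="1a_raw_data_download", status="transitional")
def download_raw_data() -> None:
    """Fetch raw inputs."""
'''

print(extract_nodes(SAMPLE))
```

Because extraction of this kind never imports the module, annotated code can be documented even when its heavier dependencies are not installed.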

The generated JSON and Markdown files are published artifacts, not hand-authored source. PRs should update decorators, docstrings, and docs/pipeline_map.yaml, then regenerate the checked-in artifacts in the same change so reviewers see the pipeline docs that will ship. On pushes to main, automation may refresh those artifacts again alongside the version/changelog commit, but PR authors and AI agents must not count on that later refresh to make the reviewed docs correct.

Any time a PR touches a pipeline documentation segment, a @pipeline_node decorator, Pydoc-facing text that feeds the extractor, or docs/pipeline_map.yaml, regenerate and commit the checked-in docs produced by scripts/extract_pipeline_docs.py. Treat the generated docs as part of the same change, even if the source edit is small.

Annotation Rules

Annotate semantic waypoints, not every private helper. A waypoint is worth a decorator when it is a pipeline entrypoint, a bundled transitional process, a library function whose behavior affects artifacts, a validation seam, or a stable utility that downstream docs should expose.

Keep decorator metadata compact. Put durable API details in the function or class docstring and type signature so the pydoc-style API artifact can consume them. Use decorator fields for graph identity, artifacts, pathways, status, stability, and focused validation commands.

For modules intended as standard pydoc/autodoc targets, declare __all__ with the supported public classes, functions, and type aliases. Keep private helpers undocumented unless they are deliberately promoted into that public surface.

Use stable snake_case id values. Pipeline map substages should use the canonical-stage prefix plus a letter, for example 1a_raw_data_download or 4a_local_area_h5_regional, and should declare canonical_stage_id, legacy_stage_id, and any PR-855-style manifest_step_ids.

Keep execution-ledger substage boundaries explicit for regional/national fitting, regional/national H5 builds, base-data staging, diagnostics upload, validation, HuggingFace promotion, GCS promotion, and version-manifest finalization.

If a function moves during refactors, keep the ID unless the semantic waypoint changes. If a waypoint is being migrated, set status="transitional" and use migration_target or notes instead of renaming IDs prematurely.
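The substage naming convention can be approximated with a regular expression. This is a sketch, not the project's actual guard: it assumes the five canonical stages are numbered 1 through 5 and that each substage uses a single lowercase letter, and the names `SUBSTAGE_ID` and `is_substage_id` are hypothetical.

```python
import re

# Assumed shape: stage digit 1-5, one substage letter, then snake_case words.
SUBSTAGE_ID = re.compile(r"^[1-5][a-z]_[a-z0-9]+(?:_[a-z0-9]+)*$")


def is_substage_id(value: str) -> bool:
    """Return True when value looks like a canonical-prefix substage ID."""
    return SUBSTAGE_ID.fullmatch(value) is not None


for candidate in (
    "1a_raw_data_download",
    "4a_local_area_h5_regional",
    "RawDataDownload",
    "1a-raw-data",
):
    print(candidate, is_substage_id(candidate))
```

The two examples from this document pass; the CamelCase and hyphenated forms do not.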

Update Workflow

After adding or changing annotations or docs/pipeline_map.yaml, regenerate the tracked pipeline docs:

uv run --no-sync --with pyyaml python scripts/extract_pipeline_docs.py

If you only need to inspect the generated outputs locally without touching tracked files, write them to a temporary directory:

out_dir="$(mktemp -d)"
uv run --no-sync --with pyyaml python scripts/extract_pipeline_docs.py \
  --json "$out_dir/pipeline_map.json" \
  --api-json "$out_dir/pipeline_api.json" \
  --markdown "$out_dir/pipeline-map.md"

Then run the focused extractor tests:

uv run pytest tests/unit/test_pipeline_docs_extractor.py

Run quality guards before committing pipeline documentation changes:

uv run --no-sync --with pyyaml python scripts/run_quality_guards.py

The pipeline-docs guard checks stage IDs, canonical-stage wiring, edge and node metadata, duplicate decorator IDs, and validation command paths. The pydoc-completeness guard checks stable public library nodes for docstrings, return annotations, and __all__ membership when the source module declares an export list.
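To make the pydoc-completeness checks concrete, here is a simplified sketch using the standard `ast` module. It is not the code in scripts/run_quality_guards.py: it inspects every public top-level function rather than only stable decorated nodes, and the `pydoc_gaps` name and the sample module are hypothetical.

```python
import ast


def pydoc_gaps(source: str) -> list[str]:
    """List docstring, return-annotation, and __all__ gaps for public functions."""
    tree = ast.parse(source)

    # Read __all__ only if the module declares one as a literal list or tuple.
    exported = None
    for stmt in tree.body:
        if isinstance(stmt, ast.Assign) and any(
            isinstance(target, ast.Name) and target.id == "__all__"
            for target in stmt.targets
        ):
            exported = set(ast.literal_eval(stmt.value))

    problems: list[str] = []
    for stmt in tree.body:
        if not isinstance(stmt, ast.FunctionDef) or stmt.name.startswith("_"):
            continue
        if ast.get_docstring(stmt) is None:
            problems.append(f"{stmt.name}: missing docstring")
        if stmt.returns is None:
            problems.append(f"{stmt.name}: missing return annotation")
        # Membership is only checked when an export list exists.
        if exported is not None and stmt.name not in exported:
            problems.append(f"{stmt.name}: not listed in __all__")
    return problems


MODULE = '''
__all__ = ["good"]

def good() -> int:
    """Return one."""
    return 1

def bad():
    return 2
'''

for problem in pydoc_gaps(MODULE):
    print(problem)
```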

If the local platform cannot install the full project environment, use uv run --no-sync --with pyyaml ... for these docs-only commands.