
# Pipeline Architecture

Visibility into the automation that builds and publishes the third-party module catalogue helps contributors reason about changes and spot failure points early. This document summarizes the current canonical pipeline and the parts of the broader architecture that are still future-facing.

## Current State (March 2026)

The supported production pipeline is orchestrated via `node scripts/orchestrator/index.ts run full-refresh-parallel` (also exposed as `node --run all`). The orchestrator now drives four registered stages across three operational phases: metadata collection, parallel module processing, and publication.

### Stage Overview

| Order | Stage ID | Key Outputs |
|------:|----------|-------------|
| 1 | `collect-metadata` | in-memory metadata payload, `gitHubData.json` |
| 2 | `parallel-processing` | in-memory stage-5 payload, `modules/`, `modules_temp/`, `website/images/` |
| 3 | `aggregate-catalogue` | `modules.json`, `modules.min.json`, `stats.json` |
| 4 | `generate-result-markdown` | `result.md` |
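Because stage handoffs are fully in-memory, the execution model can be sketched as an ordered fold over the registered stages. The sketch below is illustrative only: the `Stage` interface and `runPipeline` helper are assumed names, not the orchestrator's actual API; only the stage IDs come from the table above.

```typescript
// Illustrative model of the four registered stages: each stage receives the
// previous stage's in-memory payload, so no intermediate files are needed
// for handoff. `Stage` and `runPipeline` are assumed names.

type StageId =
  | "collect-metadata"
  | "parallel-processing"
  | "aggregate-catalogue"
  | "generate-result-markdown";

interface Stage {
  id: StageId;
  run: (input: unknown) => Promise<unknown>;
}

// Stages execute strictly in registration order; the return value of one
// stage becomes the input of the next.
async function runPipeline(stages: Stage[]): Promise<unknown> {
  let payload: unknown = undefined;
  for (const stage of stages) {
    payload = await stage.run(payload);
  }
  return payload;
}
```

Under this model, `collect-metadata` seeds the payload, `parallel-processing` enriches it into the stage-5 payload, and the two publication stages consume it without any persisted intermediate files.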

### Current Workflow Diagram

```mermaid
flowchart TB
  orchestrator[[Orchestrator<br>4-stage execution]]

  subgraph "Phase 1: Metadata Collection"
    seed[("Module seed list")] --> collect{{Collect metadata}}
    collect --> cache[("gitHubData.json cache")]
    collect --> metadata["metadata payload (in-memory)"]
  end

  subgraph "Phase 2: Parallel Module Processing"
    metadata --> parallel{{Parallel processing}}
    parallel --> clones[("modules/<br>modules_temp/")]
    parallel --> images[("website/images/")]
    parallel --> stage5["stage-5 payload (in-memory)"]
  end

  subgraph "Phase 3: Catalogue Aggregation"
    stage5 --> aggregate{{Aggregate catalogue}}
    aggregate --> outputs[("modules.json<br>modules.min.json<br>stats.json")]
    stage5 --> result{{Generate result markdown}}
    outputs --> result
    result --> resultMd[("result.md")]
  end

  orchestrator -.controls.-> collect
  orchestrator -.controls.-> parallel
  orchestrator -.controls.-> aggregate
  orchestrator -.controls.-> result
```

### Key Features

- **Orchestrator CLI**: declarative stage graph with `--only`/`--skip` support, retries, and structured logging
- **Worker Pool Stage**: `parallel-processing` encapsulates clone, enrich, image, and analysis work behind a single supported stage
- **Aggregation Stage**: `aggregate-catalogue` builds the published JSON artifacts from the in-memory stage-5 payload
- **Schema Validation**: JSON schemas enforce contracts at the published boundaries (`modules.json`, `modules.min.json`, `stats.json`)
- **Shared Utilities**: HTTP, Git, filesystem, and rate-limiting helpers in `scripts/shared/`
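The `--only`/`--skip` semantics implied above can be sketched as a small filter over the declared stage order. The function below is a hypothetical illustration, not the orchestrator's actual flag-handling code.

```typescript
// Hypothetical resolution of --only/--skip into the stages to execute.
// `only` restricts execution to the listed stage IDs, `skip` removes
// stages, and the declared order is always preserved.
function selectStages(
  declaredOrder: string[],
  only: string[] = [],
  skip: string[] = [],
): string[] {
  return declaredOrder.filter(
    (id) => (only.length === 0 || only.includes(id)) && !skip.includes(id),
  );
}
```

For example, skipping `parallel-processing` would leave the remaining three stages running in their declared order, while an `only` list of one stage ID would run just that stage.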

## Incremental Pipeline Behavior

The pipeline implements intelligent caching and skip logic to avoid redundant work:

| Scope | Optimization | Current Behavior | Why It Helps |
|-------|--------------|------------------|--------------|
| Metadata | API cache TTL | Reuses recent host API responses during `collect-metadata` | Reduces external API traffic |
| Module processing | Clone reuse | Recycles `modules_temp/` when repositories can be refreshed in place | Avoids unnecessary full re-clones |
| Module processing | Worker batching | Processes modules in bounded child-process batches | Keeps memory bounded and throughput predictable |
| Analysis cache | Cache read/write | Worker-compatible `moduleCache.json` drives skip/read/write/prune in `parallel-processing` | Restores second-run skip behavior while preserving worker throughput |
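The worker-batching row above can be illustrated with a bounded-batch runner. The helper below is a sketch under assumed names, not the pipeline's actual worker-pool implementation (which delegates to child processes).

```typescript
// Illustrative bounded batching: items within a batch run concurrently,
// batches run sequentially, so peak concurrency (and therefore memory) is
// capped by the batch size rather than by the catalogue size.
async function processInBatches<T, R>(
  items: T[],
  batchSize: number,
  worker: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    // Wait for the whole batch before starting the next one.
    results.push(...(await Promise.all(batch.map(worker))));
  }
  return results;
}
```

The trade-off is throughput versus predictability: a larger batch size increases parallelism but raises the memory ceiling, which is why the pipeline keeps batches bounded.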

### Follow-Up Tracking

Open follow-up work is tracked centrally in Open Items.

No persisted intermediate stage boundary remains. Stage handoffs are fully in-memory.


## Distribution Touchpoints

This section describes how module data enters the system and reaches downstream consumers. Unlike the canonical pipeline above, part of this flow is still conceptual.

### Current Intake Flow

```mermaid
flowchart LR
  wiki[(module wiki list<br><i>- unreliable -</i>)]
  pipeline{{automation pipeline}}
  api[(API<br>modules.json)]
  remote[MMM-Remote-Control]
  modinstall[MMM-ModInstall]
  config[MMM-Config]
  mmpm[mmpm]
  moduleWebsite[website<br>modules.magicmirror.builders]

  wiki --> pipeline --> api
  api --> mmpm
  api --> remote
  api --> modinstall
  api --> config
  api --> moduleWebsite
```

### Potential Future Intake Flow

```mermaid
flowchart LR
  ui[(Form-based front end<br>for adding, editing, and<br>deleting modules<br><i>- not yet conceptualized -</i>)]
  pipeline{{automation pipeline}}
  api[(API<br>modules.json)]
  remote[MMM-Remote-Control]
  modinstall[MMM-ModInstall]
  config[MMM-Config]
  mmpm[mmpm]
  moduleWebsite[website<br>modules.magicmirror.builders]

  ui --> pipeline --> api
  api --> remote
  api --> modinstall
  api --> config
  api --> mmpm
  api --> moduleWebsite
```

If this direction is pursued, the wiki would be replaced with a form-based front end, while downstream consumers would continue to use the unchanged API endpoint.


## Related Documentation