|
1 | 1 | # Pipeline Architecture |
2 | 2 |
|
3 | | -Visibility into the automation that builds and publishes the third-party module catalogue helps contributors reason about changes and spot failure points early. This document summarizes the current canonical pipeline, the legacy state we migrated from, and the parts of the broader architecture that are still future-facing. |
| 3 | +Visibility into the automation that builds and publishes the third-party module catalogue helps contributors reason about changes and spot failure points early. This document summarizes the current canonical pipeline and the parts of the broader architecture that are still future-facing. |
4 | 4 |
|
5 | 5 | ## Current State (March 2026) |
6 | 6 |
|
@@ -68,71 +68,9 @@ The pipeline implements intelligent caching and skip logic to avoid redundant wo |
68 | 68 |
|
69 | 69 | --- |
70 | 70 |
|
71 | | -## Legacy Workflow Snapshot (pre-September 2025) |
| 71 | +### Follow-Up Tracking |
72 | 72 |
|
73 | | -```mermaid |
74 | | -flowchart TB |
75 | | - subgraph Stage 1: Create Module List |
76 | | - wiki[("MagicMirror wiki table")] --> createLegacy{{Create module list<br>Node.js}} |
77 | | - createLegacy --> stage1Legacy["legacy stage-1 snapshot"] |
78 | | - end |
79 | | -
|
80 | | - subgraph Stage 2: Update Repository Data |
81 | | - stage1Legacy --> updateLegacy{{Update repository data<br>Node.js}} |
82 | | - updateLegacy --> cacheLegacy[("gitHubData.json cache")] |
83 | | - updateLegacy --> stage2Legacy["legacy stage-2 snapshot"] |
84 | | - end |
85 | | -
|
86 | | - subgraph Stage 3: Get Modules |
87 | | - stage2Legacy --> getLegacy{{Fetch repos<br>Python}} |
88 | | - getLegacy --> clonesLegacy[("modules/<br>modules_temp/")] |
89 | | - getLegacy --> stage3Legacy["legacy stage-3 snapshot"] |
90 | | - end |
91 | | -
|
92 | | - subgraph Stage 4: Expand Module List |
93 | | - stage3Legacy --> expandLegacy{{Enrich metadata<br>Node.js}} |
94 | | - expandLegacy --> imagesLegacy[("website/images/")] |
95 | | - expandLegacy --> stage4Legacy["legacy stage-4 snapshot"] |
96 | | - end |
97 | | -
|
98 | | - subgraph Stage 5: Check Modules JS |
99 | | - stage4Legacy --> checkjsLegacy{{Static checks<br>Node.js}} |
100 | | - checkjsLegacy --> stage5Legacy["legacy Stage-5 snapshot"] |
101 | | - end |
102 | | -
|
103 | | - subgraph Stage 6: Check Modules |
104 | | - stage5Legacy --> checkLegacy{{Deep analysis<br>Python}} |
105 | | - checkLegacy --> outputsLegacy[("modules.json<br>modules.min.json<br>stats.json<br>result.md")] |
106 | | - end |
107 | | -
|
108 | | - note1[["Mixed runtime: Python + Node.js"]] |
109 | | - note2[["No orchestrator: manual script execution"]] |
110 | | - note3[["6 sequential stages"]] |
111 | | -``` |
112 | | - |
113 | | -This legacy diagram captures the pre-orchestrator, mixed-runtime pipeline. Key issues that motivated the modernization: |
114 | | - |
115 | | -- Mixed Python + Node.js runtime made maintenance difficult |
116 | | -- No orchestrator: manual script execution |
117 | | -- No incremental updates: full run required every time |
118 | | -- OOM risk with 1300+ modules loaded into memory |
119 | | -- 6 sequential stages with 6 intermediate JSON files |
120 | | - |
121 | | -### Comparison: Legacy vs. Current Flow |
122 | | - |
123 | | -| Aspect | Legacy (6 stages) | Current flow (Mar 2026) | |
124 | | -| ------------------ | ------------------------- | -------------------------------------------------------------------------------------------------------- | |
125 | | -| Runtime | Python + Node.js | Node.js with TypeScript-based deep checks | |
126 | | -| Execution | Sequential manual scripts | Orchestrated 4-stage pipeline with in-process handoff | |
127 | | -| Incremental | ❌ No | Partial: metadata cache + clone reuse | |
128 | | -| Memory | Unbounded | Batch-/worker-bounded | |
129 | | -| Intermediate files | 6 | none; only published outputs are written (`modules.json`, `modules.min.json`, `stats.json`, `result.md`) | |
130 | | - |
131 | | -### Remaining Gaps |
132 | | - |
133 | | -1. Reintegrate worker-compatible `moduleCache.json` handling under P7.6. |
134 | | -2. Record before/after repeated-run performance metrics once cache writes are back in place. |
135 | | -3. Keep the published contract (`modules.json`, `modules.min.json`, `stats.json`, `result.md`) stable while worker caching evolves. |
| 73 | +Open follow-up work is tracked centrally in [Open Items](open-items.md). |
136 | 74 |
|
137 | 75 | No persisted intermediate stage boundary remains. Stage handoffs are fully in-memory. |
138 | 76 |
|
|