Commit 76640cf

chore(pipeline): make full-refresh-parallel the canonical default
- Switch `npm run all` and orchestrator default to `full-refresh-parallel`
- Pin legacy shortcut scripts (getModules, expandModuleList, checkModules, collectMetadata) explicitly to `full-refresh` to avoid --only resolution regression (stage IDs are validated against the selected pipeline)
- Finalize D1-D4 architecture decisions in worker-pool-design.md
- Add P7.5/P7.6 execution checklist (C1 inventory, C2 done)
- Update roadmap, orchestrator CLI reference, and related docs
1 parent 2c886ba commit 76640cf
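
The `--only` regression named in the commit message can be illustrated with a minimal sketch. It assumes a simplified graph shape and a hypothetical `resolveStages` helper; the real `pipeline/stage-graph.json` and orchestrator code may differ, and the stage membership shown for each pipeline is an assumption.

```javascript
// Hypothetical, simplified stage graph. The real pipeline/stage-graph.json
// may use a different shape, and each pipeline likely lists more stages.
const stageGraph = {
  pipelines: {
    "full-refresh": [
      "collect-metadata",
      "get-modules",
      "expand-module-list",
      "check-modules",
    ],
    "full-refresh-parallel": ["parallel-processing"],
  },
};

// Resolve which stages to run. An --only stage ID is accepted only if it
// belongs to the selected pipeline, mirroring the validation described in
// the commit message.
function resolveStages(graph, pipelineId, only) {
  const stages = graph.pipelines[pipelineId];
  if (!stages) {
    throw new Error(`Unknown pipeline "${pipelineId}"`);
  }
  if (only === undefined) {
    return [...stages];
  }
  if (!stages.includes(only)) {
    throw new Error(`Stage "${only}" is not part of pipeline "${pipelineId}"`);
  }
  return [only];
}

// Pinned form, as in the updated npm scripts: resolves to the legacy stage.
console.log(resolveStages(stageGraph, "full-refresh", "get-modules"));

// Unpinned form under the new default: the stage ID is rejected.
try {
  resolveStages(stageGraph, "full-refresh-parallel", "get-modules");
} catch (error) {
  console.log(error.message);
}
```

Under these assumptions, a shortcut such as `run --only=get-modules` would start failing once the default pipeline changes, which is why the legacy scripts name `full-refresh` explicitly.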

6 files changed

Lines changed: 124 additions & 31 deletions

docs/pipeline-refactor-roadmap.md

Lines changed: 12 additions & 13 deletions
@@ -122,23 +122,22 @@ Measure and visualize pipeline performance.

 ## Next Concrete Steps

-See [worker-pool-design.md](pipeline/worker-pool-design.md) for detailed implementation plan.
+See [worker-pool-design.md](pipeline/worker-pool-design.md) for architecture details and
+[p7-cleanup-incremental-checklist.md](pipeline/p7-cleanup-incremental-checklist.md) for execution tracking.

-**Current focus: P7.3**Implement worker pool orchestration
+**Current focus: P7.5**Cleanup old stage scripts

-- Create child process spawning for parallel workers
-- Implement batch distribution and work stealing algorithm
-- Add progress tracking and health monitoring
-- Test with 4 workers on full module set
+- Inventory legacy stage entry points and consumers (`full-refresh`, npm scripts, docs references)
+- Switch canonical full refresh execution to `full-refresh-parallel`
+- Remove/retire obsolete stage wrappers and stale intermediate artifact dependencies
+- Run regression checks (`lint`, fixtures, schema checks, golden checks) after cleanup

-**Completed: P7.2**
+**Next: P7.6** — Incremental mode integration in worker architecture

-Successfully implemented and tested the single-worker prototype:
-
-- Merged Stage 3+4+5 logic into `processModule()` function
-- Tested with 20 modules (100% success rate)
-- Average processing time: ~400ms per module (when cached)
-- Code location: `pipeline/workers/`
+- Integrate module-cache hit/miss/prune logic into parallel worker flow
+- Use deterministic cache merge/write strategy to avoid worker write contention
+- Add tests for cache behavior and validate skip rates on repeated runs
+- Capture before/after timing and cache-hit metrics for readiness to start P8

 ---

docs/pipeline/orchestrator-cli-reference.md

Lines changed: 2 additions & 2 deletions
@@ -6,7 +6,7 @@ Task **P1.2** delivered a lightweight Node.js command-line interface that reads

 ## Key capabilities

-- Provide a single entry point (e.g. `node --run pipeline -- pipeline run full-refresh`) that replaces the existing ad-hoc shell scripts.
+- Provide a single entry point (e.g. `node --run pipeline -- pipeline run full-refresh-parallel`) that replaces the existing ad-hoc shell scripts.
 - Interpret `pipeline/stage-graph.json` at runtime to determine stage ordering, inputs/outputs, and side-effects.
 - Emit structured logs and progress indicators so maintainers can trace stage execution locally and in CI.
 - Persist run metadata (planned, skipped, succeeded, failed) to `.pipeline-runs/` so partial runs stay auditable.
@@ -31,7 +31,7 @@ Task **P1.2** delivered a lightweight Node.js command-line interface that reads
 1. **Command Surface** — Implemented with `commander`, exposing the `pipeline` root command and subcommands:
 - `pipeline list` — enumerate available pipelines/stages from the graph.
 - `pipeline describe <stage|pipeline>` — print detailed metadata for inspection.
-- `pipeline run <pipelineId>` — execute stages sequentially (default: `full-refresh`).
+- `pipeline run <pipelineId>` — execute stages sequentially (default: `full-refresh-parallel`).
 - `pipeline logs [runId|--latest]` — inspect structured run metadata saved to `.pipeline-runs/`.
 - `pipeline doctor` — check external prerequisites (Node.js version, Git availability, required env vars).

docs/pipeline/p7-cleanup-incremental-checklist.md

Lines changed: 90 additions & 0 deletions
@@ -0,0 +1,90 @@
+# P7.5/P7.6 Execution Checklist
+
+This checklist turns roadmap items P7.5 and P7.6 into concrete implementation slices.
+
+Related docs:
+
+- [Pipeline roadmap](../pipeline-refactor-roadmap.md)
+- [Worker pool design](worker-pool-design.md)
+
+## P7.5 Cleanup Old Stage Scripts
+
+- [x] C1: Inventory and dependency map
+- Completed on 2026-03-19.
+- npm scripts still exposing legacy flow:
+- `all`, `getModules`, `expandModuleList`, `checkModules`, `ownList` in [package.json](../../package.json).
+- Legacy stage numbering/descriptions in `ntl.descriptions` in [package.json](../../package.json).
+- Status: canonical script usage completed in C2; retire old aliases in C3.
+- Stage graph still contains both legacy and parallel paths:
+- Legacy pipeline `full-refresh` plus stages `get-modules`, `expand-module-list`, `check-modules` in [pipeline/stage-graph.json](../../pipeline/stage-graph.json).
+- Parallel pipeline `full-refresh-parallel` and stage `parallel-processing` in [pipeline/stage-graph.json](../../pipeline/stage-graph.json).
+- Status: `full-refresh-parallel` set as canonical in C2; remove/legacy-gate old stage chain in C3.
+- Docs with legacy/default references identified:
+- Contributor stage table and examples in [docs/CONTRIBUTING.md](../CONTRIBUTING.md).
+- Sequential five-stage architecture narrative in [docs/architecture.md](../architecture.md).
+- CLI default examples around `full-refresh` in [docs/pipeline/orchestrator-cli-reference.md](orchestrator-cli-reference.md).
+- Full-refresh command example in [docs/pipeline/worker-pool-design.md](worker-pool-design.md).
+- Fixture regeneration guidance using `node --run all` in [fixtures/README.md](../../fixtures/README.md).
+- Stage-5 reference content in [docs/pipeline/check-modules-reference.md](check-modules-reference.md).
+- Follow-up: normalize docs/commands in C5 to reflect the new canonical default.
+- Tests and harness references:
+- Check-group config unit test points to `scripts/check-modules` in [scripts/check-modules/**tests**/check-group-config.test.js](../../scripts/check-modules/__tests__/check-group-config.test.js).
+- Compare harness workflow uses `checkModules:compare` in [.github/workflows/check-modules-compare.yaml](../../.github/workflows/check-modules-compare.yaml).
+- Follow-up: keep harness/tests for parity validation during P7.6; reassess after worker analysis parity is complete.
+- CI hooks snapshot:
+- No workflow currently invokes legacy stage chain (`full-refresh` or `--only=get-modules|expand-module-list|check-modules`).
+- Git hook only runs lint-staged in [.husky/\_/pre-commit](../../.husky/_/pre-commit).
+- [x] C2: Canonical pipeline switch
+- Completed on 2026-03-19.
+- `npm run all` now targets `full-refresh-parallel` in [package.json](../../package.json).
+- `pipeline run` without explicit pipeline id now defaults to `full-refresh-parallel` in [scripts/orchestrator/index.js](../../scripts/orchestrator/index.js).
+- Legacy stage-specific shortcuts (`collectMetadata`, `getModules`, `expandModuleList`, `checkModules`) are pinned to `full-refresh` as a temporary compatibility path in [package.json](../../package.json).
+- Legacy `full-refresh` pipeline remains available as compatibility path in [pipeline/stage-graph.json](../../pipeline/stage-graph.json).
+- [ ] C3: Legacy script retirement
+- Remove or clearly deprecate obsolete wrappers and code paths that are no longer part of canonical flow.
+- Ensure no dead stage artifact dependencies remain in orchestrator execution path.
+- [ ] C4: Artifact contract cleanup
+- Confirm expected outputs for parallel path and remove stale assumptions about intermediate stage files.
+- Validate schema references for the artifacts that remain part of supported flows.
+- [ ] C5: Docs and command surface cleanup
+- Update README/docs/npm script descriptions to match canonical flow.
+- Ensure contributor instructions do not point to retired stage sequence.
+- [ ] C6: Validation pass
+- Run: `npm run lint`
+- Run: `npm run test:fixtures`
+- Run: `npm run golden:check`
+- Run: `npm run schemas:check`
+- Run one full `full-refresh-parallel` execution and archive summary logs.
+
+## P7.6 Incremental Mode Integration
+
+- [ ] I1: Cache key contract
+- Define worker-compatible cache key (module identity + repo freshness signal + analysis config).
+- Include a cache schema/version field for safe future migrations.
+- [ ] I2: Read path integration
+- Load cache at orchestrator start and provide cache context to workers.
+- Preserve current behavior for cache miss and partial cache entries.
+- [ ] I3: Write path integration
+- Aggregate worker cache updates in orchestrator and write once, deterministically.
+- Prevent write races from child processes.
+- [ ] I4: Skip semantics and reporting
+- Standardize `status=skipped` and `skippedReason=cached` handling.
+- Ensure progress output and final summary report skipped/cached totals clearly.
+- [ ] I5: Invalidation and pruning
+- Prune cache entries for removed modules.
+- Invalidate entries when key inputs change (checks config/schema version).
+- [ ] I6: Test coverage
+- Add unit tests for cache hit/miss/invalidation paths.
+- Add integration test showing improved second-run skip rate.
+- [ ] I7: Runtime controls
+- Support toggles for cache enable/disable in parallel mode (`--no-cache` and/or env var).
+- Document default behavior for full vs incremental runs.
+- [ ] I8: Evidence and acceptance
+- Record before/after performance and cache-hit metrics.
+- Confirm output parity for representative module subsets.
+
+## Definition of Done
+
+- [ ] P7.5 done: Legacy stage flow is removed or explicitly legacy-only, and all docs/scripts reference the canonical flow.
+- [ ] P7.6 done: Incremental cache behavior is integrated into worker architecture with tests and measurable skip-rate benefit.
+- [ ] Ready for P8: No open P7 blockers remain in roadmap or worker design decision list.
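
Checklist item I1 describes the cache key as module identity plus a repo freshness signal plus analysis config, with a schema version field. One way this could look is sketched below. The field names (`moduleId`, `lastCommit`, `checksConfig`), the version constant, and the module name `MMM-Example` are illustrative assumptions, not the project's actual contract.

```javascript
// Hypothetical schema version; bumping it would invalidate all old entries.
const CACHE_SCHEMA_VERSION = 1;

// Build a deterministic key from the three inputs named in I1. All field
// names here are assumptions; checksConfig is assumed to be a flat object.
function buildCacheKey({ moduleId, lastCommit, checksConfig }) {
  // Sort config keys so identical settings serialize identically,
  // regardless of the order they were declared in.
  const sortedConfig = Object.keys(checksConfig)
    .sort()
    .map((name) => [name, checksConfig[name]]);
  return JSON.stringify({
    schemaVersion: CACHE_SCHEMA_VERSION,
    moduleId,
    lastCommit,
    checksConfig: sortedConfig,
  });
}

// A cache entry is reusable only when its stored key matches exactly.
function isCacheHit(cachedEntry, currentKey) {
  return cachedEntry !== undefined && cachedEntry.key === currentKey;
}

const exampleKey = buildCacheKey({
  moduleId: "MMM-Example",
  lastCommit: "abc123",
  checksConfig: { eslint: true, ncu: false },
});
console.log(isCacheHit({ key: exampleKey }, exampleKey));
```

Because the version and the config are part of the key, the invalidation rule in I5 (checks config or schema version changes) would fall out of a plain key comparison in this sketch.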

docs/pipeline/worker-pool-design.md

Lines changed: 13 additions & 9 deletions
@@ -517,16 +517,20 @@ PIPELINE_CACHE_ENABLED=true

 ---

-## Open Questions
+## Architecture Decisions (Mar 2026)

-1. **Cache location**: Single shared cache file or per-worker cache files?
-- Recommendation: Single shared cache, locked during writes
+1. **Cache location**: Keep a single shared cache file at `website/data/moduleCache.json`.

-2. **Git operations**: Shared `modules/` directory or per-worker clone dirs?
-- Recommendation: Shared `modules/` with file locking per module
+- Decision details: Workers return cache updates to orchestrator, and orchestrator performs deterministic cache writeback.

-3. **Image processing**: Run in worker or separate phase?
-- Recommendation: In worker (already part of Stage 4)
+2. **Git operations**: Keep shared `modules/` and `modules_temp/` directories.

-4. **ESLint/ncu**: Include in worker or optional post-processing?
-- Recommendation: Optional in worker, controlled by config
+- Decision details: Use per-module isolation/locking to avoid conflicts while preserving clone reuse.
+
+3. **Image processing**: Keep image extraction/resizing inside worker flow.
+
+- Decision details: Image handling remains part of the enrich phase to keep per-module processing self-contained.
+
+4. **ESLint/ncu**: Keep optional in worker, controlled by check-group configuration.
+
+- Decision details: Pipeline mode defaults are explicit and may differ between full-refresh and incremental-oriented runs.
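
Decision 1 has workers return cache updates and leaves the single write to the orchestrator. A minimal sketch of such a deterministic merge follows, assuming each worker result has the hypothetical shape `{ workerId, cacheUpdates }` and that entries are keyed by module name. Neither assumption is confirmed by the design doc.

```javascript
// Merge cache updates returned by workers into the existing entries.
// The result shape ({ workerId, cacheUpdates }) is an assumption.
function mergeCacheUpdates(existingEntries, workerResults) {
  // Apply results in worker-ID order so the outcome does not depend on
  // which worker happened to finish first.
  const ordered = [...workerResults].sort((a, b) => a.workerId - b.workerId);
  const merged = { ...existingEntries };
  for (const result of ordered) {
    Object.assign(merged, result.cacheUpdates);
  }
  // Rebuild with sorted keys so the serialized file is stable across runs.
  return Object.fromEntries(
    Object.keys(merged)
      .sort()
      .map((moduleId) => [moduleId, merged[moduleId]])
  );
}

// Serialize once in the orchestrator; workers never touch the file.
function serializeCache(entries) {
  return `${JSON.stringify(entries, null, 2)}\n`;
}

const workerResults = [
  { workerId: 2, cacheUpdates: { "MMM-C": { lastCommit: "c3" } } },
  { workerId: 1, cacheUpdates: { "MMM-A": { lastCommit: "a1" } } },
];
const mergedEntries = mergeCacheUpdates(
  { "MMM-B": { lastCommit: "b2" } },
  workerResults
);
console.log(serializeCache(mergedEntries));
```

Since only one process writes, this sketch needs no file locking for the cache itself; the per-module locking in decision 2 concerns the shared clone directories, not this file.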

package.json

Lines changed: 6 additions & 6 deletions
@@ -28,11 +28,11 @@
   },
   "scripts": {
     "start": "ntl --size 50",
-    "all": "node scripts/orchestrator/index.js run full-refresh",
-    "collectMetadata": "node scripts/orchestrator/index.js run --only=collect-metadata",
-    "getModules": "node scripts/orchestrator/index.js run --only=get-modules",
-    "expandModuleList": "node scripts/orchestrator/index.js run --only=expand-module-list",
-    "checkModules": "node scripts/orchestrator/index.js run --only=check-modules",
+    "all": "node scripts/orchestrator/index.js run full-refresh-parallel",
+    "collectMetadata": "node scripts/orchestrator/index.js run full-refresh --only=collect-metadata",
+    "getModules": "node scripts/orchestrator/index.js run full-refresh --only=get-modules",
+    "expandModuleList": "node scripts/orchestrator/index.js run full-refresh --only=expand-module-list",
+    "checkModules": "node scripts/orchestrator/index.js run full-refresh --only=check-modules",
     "pipeline": "node scripts/orchestrator/index.js",
     "golden:check": "node scripts/golden-artifacts/index.js check",
     "golden:update": "node scripts/golden-artifacts/index.js update",
@@ -96,7 +96,7 @@
   },
   "ntl": {
     "descriptions": {
-      "all": "Run all scripts (1 till 6) on all modules. Requires a lot of time and storage space!",
+      "all": "Run the canonical full refresh pipeline (`full-refresh-parallel`) on all modules. Requires a lot of time and storage space!",
       "collectMetadata": "Script 1+2: Convert the official module list from the wiki into a json file and update the JSON file that collects the GitHub information of the modules.",
       "getModules": "Script 3: Get all modules with `git clone`. Requires a lot of time and storage space!",
       "expandModuleList": "Script 4: Extend the module list with information from the package.json. And get an image if one is available and the license is okay.",

scripts/orchestrator/index.js

Lines changed: 1 addition & 1 deletion
@@ -431,7 +431,7 @@ export async function main(argv = process.argv) {
     .option("--json-logs", "Output logs in JSON format")
     .action(async (pipelineId, options) => {
       const graphPath = resolve(options.graph);
-      const selectedPipeline = pipelineId ?? "full-refresh";
+      const selectedPipeline = pipelineId ?? "full-refresh-parallel";
       const filters = {
         only: options.only,
         skip: options.skip
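
The changed line relies on the nullish coalescing operator, which falls back only when the left side is `undefined` or `null`. A small standalone sketch of that behavior, where `selectPipeline` and `DEFAULT_PIPELINE` are hypothetical names introduced for illustration and do not appear in the orchestrator:

```javascript
const DEFAULT_PIPELINE = "full-refresh-parallel";

// Same fallback expression as the changed line in the diff.
function selectPipeline(pipelineId) {
  return pipelineId ?? DEFAULT_PIPELINE;
}

console.log(selectPipeline(undefined)); // no argument given: default applies
console.log(selectPipeline("full-refresh")); // explicit ID is kept
console.log(JSON.stringify(selectPipeline(""))); // empty string is kept, not replaced
```

An empty string is therefore passed through to pipeline lookup rather than replaced by the default, which differs from what `||` would do.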
