Deterministic Pipeline Outputs

This document describes the guarantees around reproducible, deterministic outputs from the module processing pipeline.

Motivation

Deterministic outputs provide several benefits:

  1. Cleaner Git Diffs: When only meaningful changes are made, diffs show exactly what changed without noise from reordered keys or renamed files
  2. Easier Code Reviews: Reviewers can focus on actual changes rather than structural reorganization
  3. Reproducible Builds: Running the pipeline multiple times on the same input produces identical output
  4. Debugging: Comparing pipeline runs becomes straightforward with consistent formatting

Guarantees

JSON Output Files

All JSON files generated by the pipeline (modules.json, modules.min.json, stats.json, metadata files) have the following guarantees:

  • Sorted Object Keys: All object keys are sorted alphabetically at every nesting level
  • Consistent Indentation: 2-space indentation for pretty-printed files
  • Trailing Newline: Every JSON file ends with exactly one newline character

Implementation: The stringifyDeterministic() function in scripts/shared/deterministic-output.ts recursively sorts all object keys before serialization.

Example:

{
  "description": "A weather module",
  "id": "MMM-Weather",
  "maintainer": "example",
  "url": "https://github.com/example/MMM-Weather"
}
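Two of the guarantees above (sorted keys and a single trailing newline) can be spot-checked on any generated file. The following is a minimal sketch, not part of the pipeline; the function names `hasSortedKeys` and `checkJsonText` are hypothetical, and "sorted" is assumed to mean JavaScript's default string ordering.

```typescript
// Hypothetical helper: true if every object in a parsed JSON value
// lists its keys in sorted order (default string comparison).
function hasSortedKeys(value: unknown): boolean {
  if (value === null || typeof value !== "object") {
    return true; // primitives have no keys to check
  }
  if (Array.isArray(value)) {
    return value.every(hasSortedKeys);
  }
  const keys = Object.keys(value);
  const sorted = keys.slice().sort();
  if (keys.some((key, index) => key !== sorted[index])) {
    return false;
  }
  return Object.values(value).every(hasSortedKeys);
}

// Hypothetical helper: returns a list of violated guarantees for the raw
// text of a JSON file (an empty list means both checks passed).
function checkJsonText(text: string): string[] {
  const problems: string[] = [];
  if (!text.endsWith("\n") || text.endsWith("\n\n")) {
    problems.push("file must end with exactly one newline");
  }
  if (!hasSortedKeys(JSON.parse(text))) {
    problems.push("object keys are not sorted");
  }
  return problems;
}
```

The check relies on `JSON.parse` preserving key order, which holds for ordinary string keys. JavaScript engines always enumerate integer-like keys such as "10" first and in numeric order, so the result is not reliable for objects with such keys.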

Screenshot Filenames

Module screenshots are stored with deterministic filenames to ensure:

  • Same Module → Same Filename: The same module always gets the same screenshot filename
  • Different Modules → Different Filenames: No collisions between different modules
  • No Source Dependency: Renaming the source image doesn't change the output filename
  • Human-Readable: Filename clearly identifies the module for easy debugging

Implementation: The createDeterministicImageName() function uses the module identifier (moduleName---maintainer) directly as the filename base.

Format: <moduleName>---<maintainer>.<extension>

Example:

  • Module: MMM-Weather by example
  • Screenshot: MMM-Weather---example.jpg (always the same for this module)

Why Simple Deterministic Names?

The previous approach embedded the original source path in the filename, which caused issues:

❌ Old: MMM-Weather---example---path/to/screenshot.jpg
✅ New: MMM-Weather---example.jpg

Problems with the old approach:

  • Renaming source image triggered unnecessary file changes
  • Path separators in filenames caused issues
  • Long, unpredictable filenames

Benefits of the simple deterministic approach:

  • Consistent, predictable filenames
  • No dependency on source filename
  • Human-readable for easy debugging
  • Simple implementation, no hashing needed

Usage

Writing JSON with Sorted Keys

import { writeJson } from "./shared/fs-utils.ts";

// Automatically uses sorted keys
await writeJson("output.json", { b: 2, a: 1, c: 3 });
// The file is written with keys in sorted order: "a", "b", "c"

Manual JSON Stringification

import { stringifyDeterministic } from "./shared/deterministic-output.ts";

const data = { z: 26, a: 1, m: 13 };
const json = stringifyDeterministic(data, 2);
// Result: "{\n  \"a\": 1,\n  \"m\": 13,\n  \"z\": 26\n}"

Generating Screenshot Names

import { createDeterministicImageName } from "./shared/deterministic-output.ts";

const filename = createDeterministicImageName("MMM-Weather", "example", "jpg");
// Result: "MMM-Weather---example.jpg" (deterministic, always the same)

Testing Determinism

To verify deterministic output:

# First run
npm run pipeline

# Save the output of the first run
cp website/data/modules.json /tmp/modules-run1.json

# Second run
npm run pipeline

# Compare outputs - they should be identical
diff website/data/modules.json /tmp/modules-run1.json

An empty diff means both runs produced a byte-identical modules.json.
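The shell check above compares a single file. To cover every generated file at once, a script could hash a whole output directory after each run and compare the results. The sketch below is one way to do that with Node.js built-ins; `hashTree` and `changedPaths` are hypothetical names and not part of the pipeline.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Hypothetical helper: maps each file's relative path under `dir`
// to the SHA-256 hex digest of its content.
function hashTree(dir: string, prefix = ""): Map<string, string> {
  const hashes = new Map<string, string>();
  for (const name of readdirSync(dir).sort()) {
    const fullPath = join(dir, name);
    const relPath = prefix === "" ? name : `${prefix}/${name}`;
    if (statSync(fullPath).isDirectory()) {
      // Recurse into subdirectories and merge their entries.
      hashTree(fullPath, relPath).forEach((hash, path) => hashes.set(path, hash));
    } else {
      const digest = createHash("sha256").update(readFileSync(fullPath)).digest("hex");
      hashes.set(relPath, digest);
    }
  }
  return hashes;
}

// Hypothetical helper: paths whose content differs between two runs,
// or that exist in only one of them.
function changedPaths(first: Map<string, string>, second: Map<string, string>): string[] {
  const all = Array.from(first.keys()).concat(Array.from(second.keys()));
  return Array.from(new Set(all))
    .filter((path) => first.get(path) !== second.get(path))
    .sort();
}
```

A possible workflow: run the pipeline, copy `website/data` to a snapshot directory (for example `/tmp/data-run1`, an assumed location), run the pipeline again, then call `changedPaths(hashTree("/tmp/data-run1"), hashTree("website/data"))`. An empty array means every file is byte-identical across the two runs.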

Implementation Details

Key Sorting Algorithm

The sortObjectKeys() function recursively processes values:

  1. Primitives (null, string, number, boolean): returned as-is
  2. Arrays: mapped recursively, preserving order
  3. Objects: keys sorted alphabetically, values processed recursively

This ensures deterministic output at all nesting levels.
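The three rules above can be sketched as follows. This is an illustrative reimplementation, not the code in scripts/shared/deterministic-output.ts; the names `JsonValue`, `sortObjectKeysSketch`, and `stringifySketch` are hypothetical, and "alphabetically" is assumed to mean plain string comparison by UTF-16 code unit.

```typescript
// Hypothetical type covering the values JSON can represent.
type JsonValue =
  | null
  | string
  | number
  | boolean
  | JsonValue[]
  | { [key: string]: JsonValue };

function sortObjectKeysSketch(value: JsonValue): JsonValue {
  // 1. Primitives are returned as-is.
  if (value === null || typeof value !== "object") {
    return value;
  }
  // 2. Arrays keep their element order; each element is processed recursively.
  if (Array.isArray(value)) {
    return value.map((item) => sortObjectKeysSketch(item));
  }
  // 3. Objects: entries are sorted by key, values are processed recursively.
  const entries = Object.entries(value).sort(([a], [b]) =>
    a < b ? -1 : a > b ? 1 : 0
  );
  const sorted: { [key: string]: JsonValue } = {};
  for (const [key, child] of entries) {
    sorted[key] = sortObjectKeysSketch(child);
  }
  return sorted;
}

// Serializes with sorted keys; the indent argument is passed to JSON.stringify.
function stringifySketch(value: JsonValue, indent = 2): string {
  return JSON.stringify(sortObjectKeysSketch(value), null, indent);
}
```

The approach works because `JSON.stringify` emits string keys in insertion order, so rebuilding each object with its keys inserted in sorted order fixes the output order. Integer-like keys such as "10" are an exception: JavaScript engines always enumerate them first and in numeric order, which is still deterministic but not strictly alphabetical.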

Filename Generation

Screenshot filenames follow a simple, deterministic pattern:

  • Format: ${moduleName}---${maintainer}.${extension}
  • Example: MMM-Weather---example.jpg
  • Benefits: Human-readable, debuggable, no collisions

No hashing required - the module identifier itself is already unique and deterministic.
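As a rough illustration of the pattern, the function below builds a name from the three parts. It is a sketch under assumptions, not the actual createDeterministicImageName(): the name `createImageNameSketch` is hypothetical, and the validation step (rejecting empty parts and path separators) is an assumption added here because path separators are listed as a problem of the old scheme.

```typescript
// Hypothetical sketch of the naming pattern described above.
function createImageNameSketch(
  moduleName: string,
  maintainer: string,
  extension: string
): string {
  // Assumption: empty parts and path separators are rejected up front,
  // so the result is always a single flat filename.
  for (const part of [moduleName, maintainer, extension]) {
    if (part.length === 0 || /[\\/]/.test(part)) {
      throw new Error(`Invalid filename part: "${part}"`);
    }
  }
  // Plain concatenation: the same inputs always give the same name.
  return `${moduleName}---${maintainer}.${extension}`;
}

const sketchName = createImageNameSketch("MMM-Weather", "example", "jpg");
// sketchName === "MMM-Weather---example.jpg"
```

The no-collision property depends on each (moduleName, maintainer) pair being unique and on neither part containing the `---` separator itself; otherwise two different pairs could produce the same string.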

Performance Impact

  • Key Sorting: Negligible (<1% overhead on typical module counts)
  • Filename Generation: Instant string concatenation
  • Overall: No measurable impact on pipeline runtime

Compatibility Note

Some older snapshots in the repository may still contain pre-standardized screenshot filenames (for example, names derived from source paths). The current canonical output uses the deterministic <moduleName>---<maintainer>.<extension> format.

Downstream consumers should always read screenshot paths from modules.json rather than hard-coding file names.

Status

Deterministic output safeguards (sorted keys and deterministic image naming) are part of the current pipeline behavior.

Follow-Up Tracking

Potential deterministic-output enhancements are tracked centrally in Open Items under "Backlog (Optional)".