Skip to content

Latest commit

 

History

History
68 lines (44 loc) · 2.94 KB

File metadata and controls

68 lines (44 loc) · 2.94 KB

Versioning Architecture: Copy-and-Link vs Inheritance

Define-JSON employs a copy-and-link versioning model rather than traditional inheritance hierarchies. This architectural choice has profound implications for clinical data management, regulatory compliance, and system maintainability.

The Copy-and-Link Model

Instead of inheriting from parent versions, each MetaDataVersion is a complete, self-contained snapshot that links back to its sources via wasDerivedFrom relationships:

                    ┌──wasDerivedFrom── CDISC Standard v1.0
                    │
Study ABC v1.3 ─────┼──wasDerivedFrom── Study Template v2.1
                    │
                    └──wasDerivedFrom── Local Extensions

Each version contains complete copies of all relevant definitions, creating an immutable snapshot with explicit provenance chains.

Why This Matters for Clinical Data

Regulatory Compliance

  • Immutable Audit Trail: Once created, versions never change - essential for regulatory submissions
  • Complete Context: Each submission is self-contained with no external dependencies
  • Clear Lineage: wasDerivedFrom provides unambiguous provenance for regulatory review

Data Integrity

  • No Cascade Failures: Changes to standards don't break existing studies
  • Temporal Consistency: Historical analyses remain reproducible indefinitely
  • Explicit Customisation: Study-specific modifications are clearly documented

Operational Benefits

  • Simplified Queries: All metadata is local - no complex inheritance resolution
  • Parallel Development: Teams can work independently without version conflicts
  • Predictable Performance: No deep inheritance chains to traverse

Trade-offs Considered

Storage vs Simplicity

  • Cost: More storage required for duplicated definitions
  • Benefit: Eliminates complex inheritance logic and cascade dependencies
  • Reality: Storage is cheap; developer time and regulatory risk are expensive

Propagation vs Isolation

  • Challenge: Updates must be explicitly propagated to derived versions
  • Benefit: Prevents unintended changes from affecting production systems
  • Mitigation: Tooling can assist with selective update propagation

Divergence vs Control

  • Risk: Versions may diverge over time without careful governance
  • Benefit: Explicit control over what changes are adopted and when
  • Practice: Regular review cycles ensure alignment where needed

Implementation Pattern

This mirrors successful patterns in other domains:

  • Git: Complete snapshots with parent pointers, not deltas
  • Docker: Immutable layers with explicit inheritance
  • Blockchain: Immutable blocks with cryptographic lineage

For clinical data, where "reproducible forever" is not just good practice but regulatory requirement, this architecture provides the foundation for trustworthy, auditable data systems.