Define-JSON employs a copy-and-link versioning model rather than traditional inheritance hierarchies. This architectural choice has profound implications for clinical data management, regulatory compliance, and system maintainability.
Instead of inheriting from parent versions, each MetaDataVersion is a complete, self-contained snapshot that links back to its sources via wasDerivedFrom relationships:
                    ┌──wasDerivedFrom── CDISC Standard v1.0
                    │
Study ABC v1.3 ─────┼──wasDerivedFrom── Study Template v2.1
                    │
                    └──wasDerivedFrom── Local Extensions
Each version contains complete copies of all relevant definitions, creating an immutable snapshot with explicit provenance chains.
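The copy-and-link idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MetaDataVersion` class below is a simplified, hypothetical stand-in rather than the Define-JSON schema, the `derive` helper and the OIDs are invented for the example, and the rule that later sources win on name clashes is an assumption.

```python
import copy
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaDataVersion:
    """Hypothetical, simplified stand-in for a Define-JSON MetaDataVersion."""
    oid: str
    definitions: dict              # complete local copy of every definition
    was_derived_from: tuple = ()   # OIDs of the source versions (links only)


def derive(oid, sources, overrides=None):
    """Build a new snapshot by copying from each source, then applying overrides.

    Assumption for this sketch: later sources win when names clash.
    """
    merged = {}
    for src in sources:
        # Deep copy so the new version shares no objects with its sources.
        merged.update(copy.deepcopy(src.definitions))
    merged.update(copy.deepcopy(overrides or {}))
    return MetaDataVersion(oid, merged, tuple(s.oid for s in sources))


standard = MetaDataVersion("CDISC.Standard.v1.0", {"AGE": {"dataType": "integer"}})
template = MetaDataVersion("Study.Template.v2.1", {"SEX": {"dataType": "text"}})

study = derive(
    "Study.ABC.v1.3",
    [standard, template],
    overrides={"AGE": {"dataType": "integer", "label": "Age at Screening"}},
)
```

Note that `frozen=True` only blocks reassignment of the fields; a real implementation would need its own mechanism to keep the nested definitions from being mutated after creation.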
Regulatory Compliance
- Immutable Audit Trail: Once created, versions never change - essential for regulatory submissions
- Complete Context: Each submission is self-contained with no external dependencies
- Clear Lineage: wasDerivedFrom provides unambiguous provenance for regulatory review
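As a rough sketch of how a reviewer's tool might follow wasDerivedFrom links back to their origins, the function below walks a provenance graph breadth-first. The `DERIVED_FROM` registry, its OIDs, and the assumption that the study template itself derives from the CDISC standard are all hypothetical and exist only for this example.

```python
from collections import deque

# Hypothetical registry: version OID -> OIDs it wasDerivedFrom.
DERIVED_FROM = {
    "Study.ABC.v1.3": [
        "CDISC.Standard.v1.0",
        "Study.Template.v2.1",
        "Local.Extensions",
    ],
    "Study.Template.v2.1": ["CDISC.Standard.v1.0"],  # assumed for illustration
    "CDISC.Standard.v1.0": [],
    "Local.Extensions": [],
}


def lineage(oid, links):
    """Return every ancestor of `oid` in breadth-first order, each listed once."""
    seen = {oid}
    order = []
    queue = deque(links.get(oid, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already reported via another path
        seen.add(current)
        order.append(current)
        queue.extend(links.get(current, []))
    return order


ancestors = lineage("Study.ABC.v1.3", DERIVED_FROM)
```

Because the links carry only provenance and never content, this traversal is needed for audit questions alone, not for reading a definition.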
Data Integrity
- No Cascade Failures: Changes to standards don't break existing studies
- Temporal Consistency: Historical analyses remain reproducible indefinitely
- Explicit Customisation: Study-specific modifications are clearly documented
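The "no cascade failures" point can be illustrated by contrasting the two styles side by side. This is a toy comparison, not Define-JSON code: the dictionaries, the `resolve_inherited` helper, and the in-place revision of the standard are hypothetical and chosen only to expose the difference.

```python
import copy

# Hypothetical definitions; names and values are illustrative only.
standard = {"AGE": {"dataType": "integer"}}

# Inheritance style: the study keeps a live reference to its parent
# and resolves definitions at read time.
inherited_study = {"parent": standard, "own": {}}


def resolve_inherited(study, name):
    """Look in the study first, then fall back to the live parent."""
    if name in study["own"]:
        return study["own"][name]
    return study["parent"][name]


# Copy-and-link style: the study stores its own deep copy plus a link.
snapshot_study = {
    "wasDerivedFrom": ["CDISC.Standard.v1.0"],
    "definitions": copy.deepcopy(standard),
}

# The standard is later revised.
standard["AGE"]["dataType"] = "float"

inherited_view = resolve_inherited(inherited_study, "AGE")
snapshot_view = snapshot_study["definitions"]["AGE"]
```

After the revision, `inherited_view` reflects the new data type while `snapshot_view` still holds the definition the study was created with, which is the property that keeps historical analyses reproducible.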
Operational Benefits
- Simplified Queries: All metadata is local - no complex inheritance resolution
- Parallel Development: Teams can work independently without version conflicts
- Predictable Performance: No deep inheritance chains to traverse
Storage vs Simplicity
- Cost: More storage required for duplicated definitions
- Benefit: Eliminates complex inheritance logic and cascade dependencies
- Reality: Storage is cheap; developer time and regulatory risk are expensive
Propagation vs Isolation
- Challenge: Updates must be explicitly propagated to derived versions
- Benefit: Prevents unintended changes from affecting production systems
- Mitigation: Tooling can assist with selective update propagation
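One possible shape for such tooling is a diff step followed by an explicit, opt-in merge. The sketch below is an assumption about how this could work, not a description of any existing Define-JSON tool; `diff_definitions`, `propagate`, and the sample definitions are hypothetical.

```python
import copy


def diff_definitions(source, derived):
    """Report names whose definition in `source` is missing from or differs in `derived`."""
    return {
        name: {"source": definition, "derived": derived.get(name)}
        for name, definition in source.items()
        if derived.get(name) != definition
    }


def propagate(source, derived, accept):
    """Return NEW definitions adopting only the accepted names from `source`.

    The existing `derived` dict is left untouched, so the old version
    remains an immutable snapshot.
    """
    updated = copy.deepcopy(derived)
    for name in accept:
        updated[name] = copy.deepcopy(source[name])
    return updated


# Hypothetical updated standard and an existing study snapshot.
standard_next = {
    "AGE": {"dataType": "integer", "unit": "years"},
    "SEX": {"dataType": "text"},
    "RACE": {"dataType": "text"},
}
study_current = {
    "AGE": {"dataType": "integer"},
    "SEX": {"dataType": "text"},
}

pending = diff_definitions(standard_next, study_current)

# Governance decision for this example: adopt RACE now, defer the AGE change.
study_next = propagate(standard_next, study_current, accept=["RACE"])
```

Here `pending` lists the candidate changes for review, and `study_next` would become the definitions of a new version that records the updated standard in its wasDerivedFrom links.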
Divergence vs Control
- Risk: Versions may diverge over time without careful governance
- Benefit: Explicit control over what changes are adopted and when
- Practice: Regular review cycles ensure alignment where needed
This mirrors successful patterns in other domains:
- Git: Complete snapshots with parent pointers, not deltas
- Docker: Immutable image layers, each referencing an explicit parent layer
- Blockchain: Immutable blocks with cryptographic lineage
For clinical data, where "reproducible forever" is not just good practice but a regulatory requirement, this architecture provides the foundation for trustworthy, auditable data systems.