
Date Shift Audit Report #21

@AndyRae

Description


Is this the right issue type?

  • Yes, I'm planning work for this project team.

Summary

For studies, we need formalised auditability around our date shifting process.

We should generate a structured Date Shifting Audit Report automatically whenever the date shifting pipeline is executed.

The goal is to demonstrate:

  • Determinism and reproducibility
  • Consistent application
  • Data integrity preservation
  • Traceability to code version and run context

Content

1. Run Metadata

Include:

  • Study ID
  • Dataset version
  • Extraction date/time (UTC)
  • Git commit hash
  • Environment (dev/test/prod)
  • Operator or service account
  • Source snapshot reference if applicable

Purpose: anchor the run in time and code version.
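As a rough illustration of how this section could be captured, the sketch below collects the fields listed above into one record. The names `RunMetadata`, `current_git_commit` and `build_run_metadata` are hypothetical and not part of the existing pipeline; it also assumes the run happens inside a git checkout and falls back to "unknown" otherwise.

```python
import subprocess
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class RunMetadata:
    """Hypothetical container for the run metadata fields listed in the issue."""
    study_id: str
    dataset_version: str
    extracted_at_utc: str
    git_commit: str
    environment: str
    operator: str
    source_snapshot: Optional[str] = None


def current_git_commit() -> str:
    """Return the HEAD commit hash, or 'unknown' if git cannot be queried."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"


def build_run_metadata(study_id, dataset_version, environment, operator,
                       source_snapshot=None, git_commit=None, now=None):
    """Assemble run metadata; `git_commit` and `now` can be injected for tests."""
    if environment not in {"dev", "test", "prod"}:
        raise ValueError(f"unexpected environment: {environment!r}")
    now = now or datetime.now(timezone.utc)
    return RunMetadata(
        study_id=study_id,
        dataset_version=dataset_version,
        extracted_at_utc=now.astimezone(timezone.utc).isoformat(),
        git_commit=git_commit or current_git_commit(),
        environment=environment,
        operator=operator,
        source_snapshot=source_snapshot,
    )
```

`asdict(build_run_metadata(...))` gives a plain dictionary that can go straight into the JSON artifact.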


2. Date Shift Configuration

Document the exact configuration used:

  • Shift strategy (per-patient random offset or per-study fixed offset)
  • Offset range (e.g. ±180 days)
  • Random seed used
  • Unit of shift (days)
  • Direction symmetry (whether offsets can be negative as well as positive)

Purpose: ensure reproducibility and clarity of implementation.
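One way to make a per-patient random offset reproducible is to derive it from the seed and the patient identifier with a keyed hash, so the same inputs always give the same offset. The sketch below does this with HMAC-SHA256. This is an assumption for illustration, not necessarily how the current pipeline draws offsets, and `ShiftConfig` and `patient_offset_days` are hypothetical names.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftConfig:
    """Hypothetical configuration record mirroring the bullets above."""
    strategy: str = "per_patient"   # or "per_study"
    max_offset_days: int = 180      # offset range, e.g. ±180 days
    symmetric: bool = True          # allow negative as well as positive offsets
    unit: str = "days"


def patient_offset_days(seed: bytes, patient_id: str, config: ShiftConfig) -> int:
    """Derive a deterministic offset for one patient from the seed.

    The first 8 bytes of the HMAC digest are reduced modulo the size of the
    allowed range; the modulo bias is negligible for ranges this small.
    """
    digest = hmac.new(seed, patient_id.encode("utf-8"), hashlib.sha256).digest()
    value = int.from_bytes(digest[:8], "big")
    if config.symmetric:
        span = 2 * config.max_offset_days + 1      # -max .. +max, includes 0
        return value % span - config.max_offset_days
    return value % config.max_offset_days + 1      # 1 .. max
```

In the symmetric case an offset of 0 is possible, which leaves that patient's dates unshifted; whether that is acceptable is a study-level decision. Only the configuration, not the per-patient offsets, would be written to the report.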


3. Scope of Application

Explicitly list:

  • Date fields shifted
  • Date fields excluded
  • Any preserved anchor/index dates
  • Handling of partial dates (e.g. year-only)

Purpose: show intentional application of shifting rules.
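To make the scope explicit in code, the shifted and excluded fields could be declared up front and every value given a status as it is processed. The sketch below is a minimal illustration under two assumptions: the field names are invented, and partial dates (year-only or year-month) are left unshifted and counted as skipped. The actual policy for partial dates is for the study team to decide.

```python
import re
from datetime import date, timedelta

# Hypothetical field lists; the real ones would come from study configuration.
SHIFTED_FIELDS = ("admission_date", "discharge_date")
EXCLUDED_FIELDS = ("birth_year",)

_FULL_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def shift_value(value, offset_days):
    """Shift a full ISO date string; return (new_value, status)."""
    if value is None or value == "":
        return value, "null"
    if not _FULL_DATE.fullmatch(value):
        # Year-only, year-month or otherwise incomplete values are not shifted.
        return value, "skipped_not_full_date"
    shifted = date.fromisoformat(value) + timedelta(days=offset_days)
    return shifted.isoformat(), "shifted"


def shift_record(record, offset_days):
    """Shift only the declared fields; everything else is copied unchanged."""
    shifted = dict(record)
    statuses = {}
    for field in SHIFTED_FIELDS:
        if field in record:
            shifted[field], statuses[field] = shift_value(record[field], offset_days)
    return shifted, statuses
```

The per-field statuses can be aggregated into the null and skipped counts that the statistical summary asks for.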


4. Statistical Summary

Generate summary statistics:

  • Number of patients shifted
  • Number of records shifted
  • Min offset
  • Max offset
  • Mean offset
  • Distribution of offsets (bucketed counts)
  • Null or skipped date counts

Purpose: demonstrate controlled and expected distribution.
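A minimal sketch of how these statistics could be computed from the offsets used in a run. The function name `summarise_offsets`, its parameters and the 30-day bucket width are assumptions; the per-patient mapping is consumed here but only aggregates are returned, in line with the non-goals below.

```python
from collections import Counter
from statistics import mean


def summarise_offsets(offsets_by_patient, records_shifted, skipped_dates,
                      bucket_days=30):
    """Aggregate offsets (in days) into report-safe summary statistics."""
    offsets = list(offsets_by_patient.values())
    if not offsets:
        raise ValueError("no offsets to summarise")

    # Floor division places each offset in a half-open bucket [start, start + width).
    starts = Counter((o // bucket_days) * bucket_days for o in offsets)
    distribution = {
        f"[{start},{start + bucket_days})": count
        for start, count in sorted(starts.items())
    }
    return {
        "patients_shifted": len(offsets),
        "records_shifted": records_shifted,
        "min_offset_days": min(offsets),
        "max_offset_days": max(offsets),
        "mean_offset_days": float(mean(offsets)),
        "offset_distribution": distribution,
        "null_or_skipped_dates": skipped_dates,
    }
```

For very small studies even bucketed counts can narrow down individual offsets, so the bucket width may need to be coarser there.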


5. Integrity Checks

Automated validation checks with pass/fail status:

  • Chronological order preserved within patient
  • No negative durations introduced
  • No dates shifted outside allowed study window
  • No system boundary violations
  • Referential integrity unchanged

Purpose: confirm no unintended data distortion.
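Two of these checks can be sketched for a single patient's event dates as below. The function names are hypothetical. The first check compares the gaps between consecutive events before and after shifting; if every gap is unchanged, chronological order is preserved and no negative durations can have been introduced. The second check covers the allowed study window. System boundary and referential integrity checks depend on the data model and are not shown.

```python
from datetime import date
from typing import Dict, List, Sequence


def durations_preserved(original: Sequence[date], shifted: Sequence[date]) -> bool:
    """True if the gaps between consecutive events are identical after shifting."""
    if len(original) != len(shifted):
        return False
    before = [b - a for a, b in zip(original, original[1:])]
    after = [b - a for a, b in zip(shifted, shifted[1:])]
    return before == after


def within_window(shifted: Sequence[date], start: date, end: date) -> bool:
    """True if every shifted date falls inside the allowed study window."""
    return all(start <= d <= end for d in shifted)


def run_checks(original: Sequence[date], shifted: Sequence[date],
               window_start: date, window_end: date) -> List[Dict[str, str]]:
    """Return pass/fail results in a form suitable for the JSON report."""
    results = {
        "durations_preserved_within_patient": durations_preserved(original, shifted),
        "dates_within_allowed_window": within_window(shifted, window_start, window_end),
    }
    return [
        {"check": name, "status": "pass" if ok else "fail"}
        for name, ok in results.items()
    ]
```

Run per patient, the results would then be rolled up so that a single failing patient fails the check for the whole run.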


6. Reproducibility Statement

Include statement:

This dataset can be regenerated by executing commit <commit hash> with seed <seed> against source snapshot <snapshot reference>.

Purpose: explicit reproducibility assurance.


7. Output Fingerprint

Include:

  • Row count
  • Column count
  • SHA256 hash of final dataset

Purpose: detect post-generation alteration.
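A sketch of how the fingerprint could be computed, assuming the final dataset is written as a single UTF-8 CSV file with a header row (the output format is not stated in this issue, and `fingerprint_csv` is a hypothetical name):

```python
import csv
import hashlib


def fingerprint_csv(path):
    """Return row count, column count and SHA256 of a CSV file's raw bytes."""
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so large datasets do not need to fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            sha.update(chunk)

    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)  # data rows, excluding the header

    return {
        "row_count": row_count,
        "column_count": len(header),
        "sha256": sha.hexdigest(),
    }
```

Because the hash is taken over raw bytes, it is only reproducible across reruns if the writer is deterministic: stable row order, column order, line endings and value formatting.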


Deliverables

  • JSON audit artifact stored with study outputs
  • Human-readable summary (markdown or PDF)
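Both deliverables could be produced from the same in-memory report so they cannot drift apart. The sketch below assumes the report is a dictionary of sections, each mapping keys to simple values; the file names and the function `write_audit_report` are hypothetical.

```python
import json
from pathlib import Path


def write_audit_report(report, out_dir):
    """Write the JSON artifact and a markdown summary side by side."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    json_path = out / "date_shift_audit.json"
    md_path = out / "date_shift_audit.md"

    # sort_keys keeps the JSON byte-stable across runs with identical content.
    json_path.write_text(json.dumps(report, indent=2, sort_keys=True),
                         encoding="utf-8")

    lines = ["# Date Shifting Audit Report", ""]
    for section, fields in report.items():
        lines.append(f"## {section}")
        lines.append("")
        for key, value in fields.items():
            lines.append(f"- {key}: {value}")
        lines.append("")
    md_path.write_text("\n".join(lines), encoding="utf-8")
    return json_path, md_path
```

A PDF version, if required, could be rendered from the markdown file by a separate step.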

Non-Goals

  • Do not include per-patient offset mapping in report
  • Do not expose identifiable data
  • Do not store sensitive seeds outside secure context

Acceptance Criteria

  • Audit report (JSON + human-readable summary) is generated automatically on every date shift run
  • Report includes all required metadata and validation checks
  • Output dataset hash reproducible across reruns with same seed
  • Tests pass.
  • Documentation added.

Confirm creation

  • Ready to create this feature?

Metadata


Assignees

No one assigned

    Labels

    No labels

    Priority

    P2

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
