Proposal: optional experiment audit manifest for medical ML runs #1040

@mindbomber

Proposal

Add an optional, dependency-light experiment audit manifest example for GaNDLF training/inference runs.

GaNDLF is a general medical imaging framework, so reproducibility and safe sharing both matter. Users often need to explain which config, dataset reference, model artifact, and output path produced a result, without publishing patient data, PHI/PII, private paths, tokens, or full runtime arguments that may be sensitive.

This proposal is intentionally docs/example-first. It would not change GaNDLF training behavior, scoring, or dependencies.

Suggested Shape

{
  "schema_version": "gandlf.experiment_audit.v1",
  "run_id": "local-run:redacted-or-hashed",
  "task_type": "segmentation | classification | regression | synthesis | other",
  "config": {
    "config_path": "relative/or/redacted/config.yaml",
    "config_sha256": "..."
  },
  "data": {
    "dataset_ref": "dataset:redacted-or-public-id",
    "split_ref": "calibration | heldout | external_reporting | local_private",
    "contains_phi_or_pii": "unknown | yes | no",
    "redaction_status": "not_public | safe_for_public_log"
  },
  "artifacts": {
    "model_ref": "relative/or/redacted/model/path",
    "prediction_ref": "relative/or/redacted/output/path",
    "metrics_ref": "relative/or/redacted/metrics/path"
  },
  "runtime": {
    "container_digest": "sha256:...",
    "gandlf_version": "...",
    "python_version": "..."
  },
  "claim_status": "diagnostic | internal | heldout | external_reporting",
  "privacy_controls": {
    "raw_patient_data_logged": false,
    "phi_or_pii_in_public_log": false,
    "private_paths_redacted": true
  }
}
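
To make the shape concrete, here is a minimal sketch of how a user script could fill in part of such a manifest using only the Python standard library. The helper names `sha256_of_file` and `build_manifest` are hypothetical and not part of GaNDLF. Storing only the config file name (not its full path) is one possible redaction choice, assumed here for illustration. The `artifacts`, `container_digest`, and `privacy_controls` fields are left for the caller to add, since a helper like this cannot verify them.

```python
import hashlib
import json
import platform
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(config_path, run_id, task_type, dataset_ref, split_ref,
                   claim_status, gandlf_version="unknown"):
    """Assemble a partial audit manifest (hypothetical helper, not a GaNDLF API)."""
    config_path = Path(config_path)
    return {
        "schema_version": "gandlf.experiment_audit.v1",
        "run_id": run_id,
        "task_type": task_type,
        "config": {
            # Assumption: keep only the file name so private directories stay out.
            "config_path": config_path.name,
            "config_sha256": sha256_of_file(config_path),
        },
        "data": {
            "dataset_ref": dataset_ref,
            "split_ref": split_ref,
            # Conservative defaults until a human has reviewed the run.
            "contains_phi_or_pii": "unknown",
            "redaction_status": "not_public",
        },
        "runtime": {
            # The caller supplies the GaNDLF version; it is not detected here.
            "gandlf_version": gandlf_version,
            "python_version": platform.python_version(),
        },
        "claim_status": claim_status,
    }
```

A caller would then serialize the result with `json.dumps(manifest, indent=2)` and write it next to the run outputs. Defaulting `redaction_status` to `not_public` means a manifest is only marked shareable after an explicit decision.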

Why This Helps

  • Makes GaNDLF runs easier to reproduce and review.
  • Gives researchers a safe way to publish run provenance without exposing protected health data or private local paths.
  • Separates internal/diagnostic results from held-out or externally reportable claims.
  • Provides a simple foundation for downstream auditability, provenance, and model-card/report artifacts.

Initial Scope

A minimal first PR could add:

  1. docs/examples/experiment_audit.example.json or a similar example file.
  2. A short docs note explaining that the manifest is optional and non-normative.
  3. Guidance that public manifests should not include raw patient data, PHI/PII, tokens, private account IDs, or full sensitive arguments.

If maintainers prefer this to live under a different docs path or naming scheme, I can adjust before opening a PR.
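
As a rough illustration of the guidance in item 3, a pre-publication check could look something like the sketch below. The function name `find_public_log_issues` and both regular expressions are assumptions made for this example, not existing GaNDLF code. The check is heuristic: it flags absolute or home-relative paths and credential-like key names, but it cannot detect PHI/PII in free text, so it would complement manual review rather than replace it.

```python
import re

# Heuristic patterns (assumptions for this sketch, not an exhaustive list).
ABSOLUTE_PATH = re.compile(r"^(/|~|[A-Za-z]:[\\/])")
SECRET_KEY_HINT = re.compile(r"(token|secret|password|api_key)", re.IGNORECASE)


def find_public_log_issues(manifest, prefix=""):
    """Return human-readable warnings for a manifest dict.

    Walks nested dicts only; lists are not inspected in this sketch.
    """
    issues = []
    for key, value in manifest.items():
        location = f"{prefix}{key}"
        if SECRET_KEY_HINT.search(key):
            issues.append(f"{location}: key name suggests a credential")
        if isinstance(value, dict):
            issues.extend(find_public_log_issues(value, prefix=location + "."))
        elif isinstance(value, str) and ABSOLUTE_PATH.match(value):
            issues.append(f"{location}: looks like an absolute or home-relative path")
    return issues
```

An empty list means only that these particular heuristics found nothing; it is not evidence that the manifest is safe to publish.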
