Audit Remediation Checklist

Based on: AUDIT_REPORT_BBOP_SKILLS.md
Date Created: 2026-02-19
Overall Compliance: 78% (7/9 PASS) → Target: 100% (9/9 PASS)


Quick Summary

Current Status:

  • 7 criteria PASS - Provenance, model tracking, reasoning/code separation, validation, error-correction, RAG
  • ⚠️ 1 criterion PARTIAL - Temporary artifact cleanup (basic exists, needs documentation/automation)
  • 1 criterion FAIL - MCP server integration (not critical, consider adoption)
  • 1 criterion FAIL - Input data hashing (critical gap, needs fixing)

Priority Actions:

  1. 🔴 HIGH: Add input data hashing (1 day)
  2. 🔴 HIGH: Document cleanup policies (2 hours)
  3. 🟡 MEDIUM: Add cleanup automation (4 hours)
  4. 🟡 MEDIUM: Evaluate MCP adoption (1-2 weeks research)

High Priority Actions (Week 1-2)

✅ Action 1: Implement Input Data Hashing

Priority: 🔴 HIGH (Critical for Criterion 4)
Estimated Effort: 1 day
Owner: Data Engineering
Impact: Critical - Enables cryptographic reproducibility verification

Description: Add SHA256 checksums to all experimental data processing pipelines to enable cryptographic verification of input data integrity.

Files to Modify:

  • scripts/analyze_plate_replicates.py - Add hashing function
  • scripts/run_dual_analysis.py - Record hashes in workflow
  • .claude/provenance/manifest-template.yaml - Add input_data_hashes section
  • data/checksums.txt - Create manifest (new file)

Implementation Steps:

Step 1.1: Add hashing utility function

# Add to scripts/analyze_plate_replicates.py (before main())
import hashlib

def compute_sha256(file_path: str) -> str:
    """
    Compute SHA256 checksum of file.

    Args:
        file_path: Path to file

    Returns:
        Hexadecimal SHA256 checksum

    Examples:
        >>> compute_sha256("test.tsv")  # doctest: +SKIP
        'a3f8b2c1d4e5f6a7b8c9d0e1f2a3b4c5d6e7f8a9b0c1d2e3f4a5b6c7d8e9f0a1'
    """
    sha256 = hashlib.sha256()
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(4096), b""):
            sha256.update(chunk)
    return sha256.hexdigest()

Step 1.2: Hash input files in analysis pipeline

# Modify main() in scripts/analyze_plate_replicates.py
def main(data_dir, mode="absolute", output_base="outputs/"):
    # ... (existing code)

    # NEW: Compute checksums for all input files
    # (requires `import glob, json, os` at module level; sorted for a stable order)
    input_files = sorted(glob.glob(f"{data_dir}/*.tsv"))
    checksums = {}
    for file_path in input_files:
        rel_path = os.path.relpath(file_path)
        checksum = compute_sha256(file_path)
        checksums[rel_path] = checksum
        print(f"Computed SHA256 for {rel_path}: {checksum}")

    # NEW: Save checksums to output directory
    checksum_file = output_dir / "input_data_checksums.json"
    with open(checksum_file, "w") as f:
        json.dump(checksums, f, indent=2)
    print(f"Saved checksums to: {checksum_file}")

    # ... (rest of analysis)

Step 1.3: Create data checksums manifest

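#!/bin/bash
# scripts/generate_checksums.sh
# Run once from the repository root to create data/checksums.txt.
# Paths are recorded relative to the repository root so that
# `just verify-data-integrity` can open them directly.
set -euo pipefail

# Overwrite (not append) so re-running does not duplicate entries
find data/experimental -name "*.tsv" -type f -print0 \
    | sort -z \
    | xargs -0 -r sha256sum > data/checksums.txt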

Step 1.4: Update provenance template

# .claude/provenance/manifest-template.yaml
# Add after user_context:
input_data:
  - path: "relative/path/to/input1.tsv"
    hash: "sha256:PLACEHOLDER"
    size_bytes: 0
    last_modified: "YYYY-MM-DDTHH:MM:SSZ"
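
The template fields above could be filled in programmatically. The following is a minimal sketch, not the project's actual implementation: the helper name `describe_input` and its `root` parameter are hypothetical, and it assumes the `sha256:` prefix and UTC timestamp format shown in the template.

```python
import hashlib
import os
from datetime import datetime, timezone
from pathlib import Path


def describe_input(file_path: str, root: str = ".") -> dict:
    """Build one `input_data` entry (hypothetical helper) for the manifest."""
    path = Path(file_path)
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # Read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "path": os.path.relpath(path, root),
        "hash": f"sha256:{digest.hexdigest()}",
        "size_bytes": stat.st_size,
        "last_modified": modified.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Each returned dictionary maps one-to-one onto a list item under `input_data:` and can be serialized with whichever YAML writer the provenance system already uses.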

Step 1.5: Add verification command to justfile

# Add to justfile
# Verify data integrity using checksums
[group('project management')]
verify-data-integrity:
  #!/usr/bin/env python3
  import hashlib, sys
  failures = 0
  with open("data/checksums.txt") as f:
      for line in f:
          if not line.strip():
              continue  # skip blank lines
          # sha256sum format: "<hex digest>  <path>" (two spaces)
          checksum, path = line.rstrip("\n").split("  ", 1)
          with open(path, "rb") as data:
              actual = hashlib.sha256(data.read()).hexdigest()
          if actual != checksum:
              print(f"FAIL: {path} (expected {checksum}, got {actual})")
              failures += 1
  if failures:
      sys.exit(1)
  print("✅ All checksums verified")

Verification:

  • Run just analyze-experimental data/experimental/plate_designs_v10_results/
  • Check that outputs/.../input_data_checksums.json exists
  • Verify checksums match original files
  • Run just verify-data-integrity and confirm PASS
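
The "verify checksums match original files" step can be scripted rather than checked by eye. A minimal sketch, assuming the JSON layout written in Step 1.2 (a flat mapping of relative path to hex digest); the function name `verify_checksums` and the `root` parameter are illustrative and not part of the existing scripts.

```python
import hashlib
import json
from pathlib import Path


def verify_checksums(checksum_file: str, root: str = ".") -> list[str]:
    """Return recorded paths that are missing or whose SHA256 has changed."""
    recorded = json.loads(Path(checksum_file).read_text())
    mismatches = []
    for rel_path, expected in recorded.items():
        target = Path(root) / rel_path
        if not target.is_file():
            mismatches.append(rel_path)  # file was moved or deleted
            continue
        actual = hashlib.sha256(target.read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(rel_path)  # contents differ from the recorded hash
    return mismatches
```

An empty list means every recorded input still matches its stored digest; any returned path should be investigated before results are compared across runs.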

Acceptance Criteria:

  • ✅ All experimental analysis runs save input_data_checksums.json
  • ✅ Checksums include all input TSV files
  • data/checksums.txt manifest exists
  • just verify-data-integrity command works

✅ Action 2: Document Artifact Cleanup Policies

Priority: 🔴 HIGH (Required for Criterion 9)
Estimated Effort: 2 hours
Owner: DevOps/Infrastructure
Impact: Moderate - Prevents unbounded storage growth

Description: Create comprehensive documentation of artifact retention, archival, and deletion policies.

Files to Create:

  • docs/ARTIFACT_CLEANUP_POLICY.md - Cleanup policy documentation (new file)

Implementation Steps:

Step 2.1: Create policy document

# Create new file
touch docs/ARTIFACT_CLEANUP_POLICY.md

Step 2.2: Write policy content

# Artifact Cleanup Policy

**Last Updated:** 2026-02-19
**Version:** 1.0

## Overview

This document defines retention, archival, and deletion policies for all artifacts generated by MicroGrowAgents.

## Retention Tiers

### Tier 1: Archival (Indefinite Retention)

**Keep permanently. Never auto-delete.**

- **Final recommendations:** `outputs/recommendations/recommendations_v*.yaml`
- **Optimization reports:** `outputs/*/optimization/OPTIMIZATION_REPORT_*.md`
- **Session manifests:** `.claude/provenance/sessions/*/manifest.yaml`
- **Session summaries:** `.claude/provenance/sessions/*/summary.md`
- **Schema definitions:** `src/microgrowagents/schema/*.yaml`
- **Data checksums:** `data/checksums.txt`

**Storage:** Git-tracked (version controlled)

---

### Tier 2: Temporary (90-Day Retention)

**Keep for analysis, archive or delete after 90 days.**

- **Experimental analysis outputs:**
  - `outputs/*/processed_data_*.tsv`
  - `outputs/*/replicate_statistics_*.tsv`
  - `outputs/*/control_statistics.tsv`
- **Visualizations:**
  - `outputs/*/*.pdf`
  - `outputs/*/*.png`
- **Response surfaces:**
  - `outputs/*/response_surfaces/surface_predictions_*.csv`
  - `outputs/*/response_surfaces/*.pdf`
- **Clustering outputs:**
  - `outputs/*/cluster_assignments_*.csv`
  - `outputs/*/clustered_heatmap_*.pdf`

**Storage:** Local disk (NOT git-tracked)
**After 90 days:** Archive to `archives/YYYY-MM/` as `.tar.gz` OR delete if redundant

---

### Tier 3: Ephemeral (Remove After Successful Run)

**Temporary working files. Delete immediately on success.**

- **Model pickles:** `outputs/*/*.pkl` (GP models)
- **Intermediate files:** `outputs/*/*.tmp`
- **Debug artifacts:** `outputs/*/debug_*.log`
- **Failed run outputs:** `outputs/*/.incomplete` directories

**Storage:** Local disk (gitignored)
**Cleanup:** Automatic on successful run completion

---

## Cleanup Commands

### Manual Cleanup

```bash
# Preview removal of outputs older than 90 days (dry run is the default)
just clean-old-outputs 90

# Actually remove outputs older than 90 days (pass an empty DRY_RUN argument)
just clean-old-outputs 90 ""

# Archive outputs older than 90 days to a compressed tarball
just archive-outputs 90

# Remove ephemeral artifacts from all output directories
just clean-ephemeral

# Remove incomplete outputs from failed runs
just clean-failed-runs
```

Automated Cleanup

Weekly cron job (recommended):

# Add to crontab: Archive outputs older than 90 days every Sunday at midnight
0 0 * * 0 cd /path/to/MicroGrowAgents && just archive-outputs 90

Post-run cleanup (recommended):

# Add to analysis scripts: Remove ephemeral artifacts on success
just analyze-experimental data/... && just clean-ephemeral

Storage Estimates

| Tier | Current Size | Growth Rate | Retention |
|------|--------------|-------------|-----------|
| Archival | ~5 MB | ~0.5 MB/month | Indefinite |
| Temporary | ~60 MB | ~15 MB/week | 90 days |
| Ephemeral | ~1 MB | ~1 MB/run | Immediate |

Projected storage (with cleanup):

  • Month 1: 5 + 60 + 0 = 65 MB
  • Month 3: 5 + 180 + 0 = 185 MB
  • Month 6: 5 + 180 + 0 = 185 MB (steady state with 90-day rotation)

Without cleanup (unbounded growth):

  • Month 6: 5 + 360 + 24 = 389 MB
  • Year 1: 5 + 720 + 48 = 773 MB
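
The figures above can be reproduced with a few lines of arithmetic. This is a sketch under assumptions read off the listed numbers rather than stated in the table: temporary outputs grow by about 60 MB per month (roughly 15 MB/week), ephemeral artifacts by about 4 MB per month (about four runs at 1 MB each), and archival storage is held at its current 5 MB. The function name is illustrative.

```python
def projected_storage_mb(month: int, with_cleanup: bool) -> int:
    """Projected storage in MB after `month` months (assumed rates, see lead-in)."""
    archival = 5  # held constant, as in the figures above
    if with_cleanup:
        # 90-day rotation caps temporary outputs at three months of growth
        temporary = 60 * min(month, 3)
        ephemeral = 0  # removed after each successful run
    else:
        temporary = 60 * month
        ephemeral = 4 * month
    return archival + temporary + ephemeral


for month in (1, 3, 6, 12):
    print(month, projected_storage_mb(month, True), projected_storage_mb(month, False))
```

With cleanup the total plateaus once the 90-day window is full; without it both the temporary and ephemeral terms keep growing linearly.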

Exceptions

Keep beyond retention policy if:

  1. Part of published study (DOI assigned)
  2. Referenced in grant report
  3. Legal/compliance hold
  4. Active debugging investigation

Document exception: Add entry to archives/RETENTION_EXCEPTIONS.md


Recovery

Accidental deletion?

  1. Check git history for archival-tier files (git log -- outputs/recommendations/)
  2. Check archives/YYYY-MM/ for temporary-tier files
  3. Check provenance logs for reproduction commands (.claude/provenance/sessions/*/manifest.yaml)

Re-generate deleted outputs:

# From provenance session manifest
SESSION_ID="2026-01-10-21-16"
cat .claude/provenance/sessions/$SESSION_ID/manifest.yaml

# Extract original command
# Example: just analyze-experimental data/experimental/plate_designs_v10_results/

# Re-run analysis
just analyze-experimental data/experimental/plate_designs_v10_results/

Related Documentation


Step 2.3: Link from CLAUDE.md

# Add to CLAUDE.md under "Project-Specific Files & Locations"

### Artifact Management

**Cleanup Policy:** `docs/ARTIFACT_CLEANUP_POLICY.md`
- Retention tiers (archival, temporary, ephemeral)
- Cleanup commands (`just clean-old-outputs`, `just archive-outputs`)
- Storage estimates and growth projections

Verification:

  • docs/ARTIFACT_CLEANUP_POLICY.md created
  • Policy defines 3 retention tiers
  • Cleanup commands documented
  • Linked from CLAUDE.md

Acceptance Criteria:

  • ✅ Policy document exists and is comprehensive
  • ✅ All artifact types categorized into tiers
  • ✅ Cleanup commands documented (even if not yet implemented)
  • ✅ Storage estimates provided
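
To check that every artifact type falls into a tier, the glob patterns from the policy can be matched mechanically. A minimal sketch: the `TIER_PATTERNS` table copies the patterns listed in the policy, while the function name, the matching order (archival, then ephemeral, then temporary), and the `"unclassified"` fallback are assumptions made for illustration.

```python
from fnmatch import fnmatchcase

# Patterns copied from the policy tiers; checked in this order
TIER_PATTERNS = {
    "archival": [
        "outputs/recommendations/recommendations_v*.yaml",
        "outputs/*/optimization/OPTIMIZATION_REPORT_*.md",
        ".claude/provenance/sessions/*/manifest.yaml",
        ".claude/provenance/sessions/*/summary.md",
        "src/microgrowagents/schema/*.yaml",
        "data/checksums.txt",
    ],
    "ephemeral": [
        "outputs/*/*.pkl",
        "outputs/*/*.tmp",
        "outputs/*/debug_*.log",
        "outputs/*/.incomplete",
        "outputs/*/.incomplete/*",
    ],
    "temporary": [
        "outputs/*/*.tsv",
        "outputs/*/*.csv",
        "outputs/*/*.pdf",
        "outputs/*/*.png",
    ],
}


def classify_artifact(path: str) -> str:
    """Return the first tier whose pattern matches, else 'unclassified'."""
    for tier, patterns in TIER_PATTERNS.items():
        # Note: fnmatch's '*' also matches '/', so nested files are covered
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return tier
    return "unclassified"
```

Running this over a listing of `outputs/` and flagging anything that comes back `"unclassified"` is one way to find artifact types the policy does not yet cover.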

Medium Priority Actions (Month 1-2)

✅ Action 3: Add Cleanup Automation Scripts

Priority: 🟡 MEDIUM
Estimated Effort: 4 hours
Owner: DevOps/Infrastructure
Impact: Moderate - Automates manual cleanup tasks

Description: Implement cleanup scripts referenced in ARTIFACT_CLEANUP_POLICY.md.

Files to Create:

  • scripts/cleanup_old_outputs.py - Remove old analysis outputs (new file)
  • scripts/archive_outputs.py - Archive old outputs to tar.gz (new file)
  • scripts/clean_ephemeral.py - Remove ephemeral artifacts (new file)

Files to Modify:

  • justfile - Add cleanup commands

Implementation Steps:

Step 3.1: Create cleanup_old_outputs.py

# scripts/cleanup_old_outputs.py
"""Remove analysis outputs older than specified days."""
import os
import shutil
import argparse
from pathlib import Path
from datetime import datetime, timedelta

def cleanup_old_outputs(output_dir: str, older_than_days: int = 90, dry_run: bool = True):
    """
    Remove analysis outputs older than specified days.

    Args:
        output_dir: Base output directory
        older_than_days: Age threshold in days
        dry_run: If True, only print what would be removed

    Examples:
        >>> cleanup_old_outputs("outputs/", older_than_days=90, dry_run=True)  # doctest: +SKIP
    """
    cutoff = datetime.now() - timedelta(days=older_than_days)
    removed_count = 0
    total_size = 0

    for analysis_dir in Path(output_dir).glob("*_experimental_analysis_*"):
        # Skip if not a directory
        if not analysis_dir.is_dir():
            continue

        # Check modification time
        mtime = datetime.fromtimestamp(analysis_dir.stat().st_mtime)

        if mtime < cutoff:
            # Calculate size
            size = sum(f.stat().st_size for f in analysis_dir.rglob('*') if f.is_file())
            total_size += size

            # Count every match so the dry-run summary reports the real total
            removed_count += 1

            if dry_run:
                print(f"[DRY RUN] Would remove: {analysis_dir} ({size / 1024 / 1024:.2f} MB)")
            else:
                print(f"Removing: {analysis_dir} ({size / 1024 / 1024:.2f} MB)")
                shutil.rmtree(analysis_dir)

    if dry_run:
        print(f"\n[DRY RUN] Would remove {removed_count} directories ({total_size / 1024 / 1024:.2f} MB total)")
    else:
        print(f"\nRemoved {removed_count} directories ({total_size / 1024 / 1024:.2f} MB total)")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Remove old analysis outputs")
    # nargs="?" makes the positional optional so the default actually applies
    parser.add_argument("output_dir", nargs="?", default="outputs/", help="Output directory")
    parser.add_argument("--older-than", type=int, default=90, help="Age threshold in days")
    parser.add_argument("--dry-run", action="store_true", help="Dry run (don't actually remove)")
    args = parser.parse_args()

    cleanup_old_outputs(args.output_dir, args.older_than, args.dry_run)

Step 3.2: Create archive_outputs.py

# scripts/archive_outputs.py
"""Archive old outputs to compressed tarball."""
import shutil  # needed for shutil.rmtree() when removing archived directories
import tarfile
import argparse
from pathlib import Path
from datetime import datetime, timedelta

def archive_outputs(output_dir: str, older_than_days: int = 90, archive_dir: str = "archives"):
    """
    Archive outputs older than specified days to .tar.gz.

    Args:
        output_dir: Base output directory
        older_than_days: Age threshold in days
        archive_dir: Directory to store archives

    Examples:
        >>> archive_outputs("outputs/", older_than_days=90)  # doctest: +SKIP
    """
    cutoff = datetime.now() - timedelta(days=older_than_days)
    archive_base = Path(archive_dir)
    archive_base.mkdir(exist_ok=True)

    # Create monthly archive directory
    month_dir = archive_base / datetime.now().strftime("%Y-%m")
    month_dir.mkdir(exist_ok=True)

    archived_count = 0

    for analysis_dir in Path(output_dir).glob("*_experimental_analysis_*"):
        if not analysis_dir.is_dir():
            continue

        mtime = datetime.fromtimestamp(analysis_dir.stat().st_mtime)

        if mtime < cutoff:
            # Create archive filename
            archive_name = month_dir / f"{analysis_dir.name}.tar.gz"

            print(f"Archiving: {analysis_dir} -> {archive_name}")

            # Create tarball
            with tarfile.open(archive_name, "w:gz") as tar:
                tar.add(analysis_dir, arcname=analysis_dir.name)

            # Remove original
            shutil.rmtree(analysis_dir)
            archived_count += 1

    print(f"\nArchived {archived_count} directories to {month_dir}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Archive old outputs")
    # nargs="?" makes the positional optional so the default actually applies
    parser.add_argument("output_dir", nargs="?", default="outputs/", help="Output directory")
    parser.add_argument("--older-than", type=int, default=90, help="Age threshold in days")
    parser.add_argument("--archive-dir", default="archives", help="Archive directory")
    args = parser.parse_args()

    archive_outputs(args.output_dir, args.older_than, args.archive_dir)

Step 3.3: Add justfile commands

# Add to justfile under [group('project management')]

# Clean outputs older than specified days (dry run by default)
[group('project management')]
clean-old-outputs DAYS="90" DRY_RUN="--dry-run":
  uv run python scripts/cleanup_old_outputs.py outputs/ --older-than {{DAYS}} {{DRY_RUN}}

# Archive outputs older than specified days
[group('project management')]
archive-outputs DAYS="90":
  uv run python scripts/archive_outputs.py outputs/ --older-than {{DAYS}}

# Remove ephemeral artifacts (*.pkl, *.tmp, debug_*.log)
[group('project management')]
clean-ephemeral:
  find outputs/ -name "*.pkl" -delete
  find outputs/ -name "*.tmp" -delete
  find outputs/ -name "debug_*.log" -delete
  echo "✅ Ephemeral artifacts removed"
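
The file list for this action names scripts/clean_ephemeral.py, while the clean-ephemeral recipe above calls find directly. If a Python script is preferred, a minimal sketch could look like the following; the function signature and the dry-run default are assumptions chosen to mirror cleanup_old_outputs.py, not an existing implementation.

```python
# scripts/clean_ephemeral.py (illustrative sketch)
"""Remove ephemeral artifacts (*.pkl, *.tmp, debug_*.log) from output directories."""
from pathlib import Path

EPHEMERAL_PATTERNS = ("*.pkl", "*.tmp", "debug_*.log")


def clean_ephemeral(output_dir: str = "outputs", dry_run: bool = True) -> list[Path]:
    """Return matching files; delete them unless dry_run is True."""
    matches = []
    for pattern in EPHEMERAL_PATTERNS:
        # sorted() materializes the listing before any file is deleted
        for path in sorted(Path(output_dir).rglob(pattern)):
            if path.is_file():
                matches.append(path)
                if not dry_run:
                    path.unlink()
    return matches


if __name__ == "__main__":
    for removed in clean_ephemeral(dry_run=True):
        print(f"[DRY RUN] Would remove: {removed}")
```

Defaulting to a dry run keeps the script consistent with the acceptance criterion that dry-run mode prevents accidental deletion.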

Verification:

  • Run just clean-old-outputs (dry run is the default) and verify output
  • Run just clean-ephemeral and verify *.pkl files removed
  • Test just archive-outputs on old test data
  • Verify archives created in archives/YYYY-MM/

Acceptance Criteria:

  • ✅ All 3 cleanup scripts implemented
  • ✅ Justfile commands work as documented
  • ✅ Dry-run mode prevents accidental deletion
  • ✅ Archive creation verified

✅ Action 4: Evaluate MCP Server Adoption

Priority: 🟡 MEDIUM
Estimated Effort: 1-2 weeks (research) + 1-2 weeks (pilot) + 1-2 months (full migration if approved)
Owner: Architecture Team
Impact: Low (nice-to-have, not critical)

Description: Research Model Context Protocol (MCP) ecosystem and evaluate whether adoption would benefit MicroGrowAgents.

Tasks:

Phase 1: Research (Week 1)

  • Read MCP specification (https://modelcontextprotocol.io/)
  • Identify available MCP servers for tools we use:
    • PubMed/Entrez API
    • Neo4j/knowledge graphs
    • DuckDB/SQL databases
    • HTTP APIs (GAPMind, TCDB, UniProt)
  • Review MCP adoption in similar projects
  • Document findings in docs/MCP_EVALUATION.md

Phase 2: Pilot Implementation (Week 2-3)

  • Select 1 pilot agent (recommend: LiteratureAgent)
  • Implement MCP server for PubMed/Entrez (or find existing server)
  • Refactor LiteratureAgent to use MCP instead of direct Biopython calls
  • Test pilot against existing test suite
  • Compare code complexity (before/after)

Phase 3: Decision (Week 4)

  • Evaluate pilot results:
    • ✅ Does MCP reduce boilerplate code?
    • ✅ Does MCP improve tool portability?
    • ✅ Does MCP ecosystem have necessary servers?
    • ❌ Does MCP add unnecessary complexity?
    • ❌ Does MCP require significant refactoring effort?
  • Make GO/NO-GO decision
    • GO: Plan full migration (1-2 months)
    • NO-GO: Document rationale and defer

Verification:

  • MCP evaluation report completed
  • Pilot agent implemented (if GO decision)
  • Decision documented with rationale

Acceptance Criteria:

  • ✅ Research findings documented
  • ✅ Pilot implementation tested (if feasible)
  • ✅ Clear GO/NO-GO decision made
  • ✅ Migration plan created (if GO)

✅ Action 5: Add Per-Response Model Tracking

Priority: 🟡 MEDIUM
Estimated Effort: 4 hours
Owner: Provenance System Maintainer
Impact: Low (enhancement to already-passing Criterion 2)

Description: Add model version and parameters to action logs (JSONL) for per-response granularity.

Files to Modify:

  • .claude/provenance/action-types.yaml - Add model_version, temperature, max_tokens fields

Implementation Steps:

Step 5.1: Update action log schema

# .claude/provenance/action-types.yaml
# Add to decision action type:
decision:
  description: "Decision points and rationale (LLM-driven)"
  required_fields:
    - timestamp
    - action_id
    - type
    - tool
    - status
    - model_version        # NEW: e.g., "claude-sonnet-4-5"
    - temperature          # NEW: e.g., 0.7
    - max_tokens          # NEW: e.g., 4096
  optional_fields:
    - details
    - response_tokens
    - prompt_tokens

Step 5.2: Example action log entry

{"timestamp": "2026-02-19T14:30:15.123Z", "action_id": 42, "type": "decision", "tool": "LLM", "model_version": "claude-sonnet-4-5", "temperature": 0.7, "max_tokens": 4096, "status": "success", "details": {"prompt": "Design medium for AM1", "response_tokens": 2341, "prompt_tokens": 1523}, "files_affected": [], "read_only": true}
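
A log line like the one above can be checked against the required fields from Step 5.1 before it is accepted. A minimal sketch: the tuple mirrors the `required_fields` list in the schema, and the function name `missing_decision_fields` is hypothetical.

```python
import json

# Mirrors required_fields for the `decision` action type in Step 5.1
REQUIRED_DECISION_FIELDS = (
    "timestamp",
    "action_id",
    "type",
    "tool",
    "status",
    "model_version",
    "temperature",
    "max_tokens",
)


def missing_decision_fields(jsonl_line: str) -> list[str]:
    """Return required fields absent from a decision entry (empty if none)."""
    entry = json.loads(jsonl_line)
    if entry.get("type") != "decision":
        return []  # other action types are not checked here
    return [field for field in REQUIRED_DECISION_FIELDS if field not in entry]
```

Applied line by line to actions.jsonl, any non-empty result points at a decision entry that was logged without full model tracking.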

Verification:

  • Action logs include model_version for LLM decisions
  • Temperature and max_tokens recorded when available
  • Query works: jq 'select(.type == "decision") | .model_version' actions.jsonl

Acceptance Criteria:

  • ✅ Schema updated with model tracking fields
  • ✅ Example action log demonstrates new fields
  • ✅ Documentation updated

Low Priority Actions (Ongoing)

✅ Action 6: Optimize Git Storage for Large Artifacts

Priority: 🟢 LOW
Estimated Effort: 2 hours
Owner: DevOps/Infrastructure
Impact: Low (storage optimization)

Description: Reduce git repository size by excluding outputs/ and using Git LFS for large files.

Files to Modify:

  • .gitignore - Add outputs/ exclusion
  • .gitattributes - Configure Git LFS (new file)

Implementation Steps:

Step 6.1: Update .gitignore

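# Add to .gitignore
# Experimental outputs (generated, not source). Ignore the contents, not the
# directory itself: Git cannot re-include files under an excluded parent directory.
outputs/*
# Keep documentation and final recommendations (comments need their own line)
!outputs/README.md
!outputs/recommendations/
outputs/recommendations/*
!outputs/recommendations/*.yaml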

Step 6.2: Configure Git LFS

# .gitattributes
*.pdf filter=lfs diff=lfs merge=lfs -text
*.pkl filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
data/raw/kg_microbe_core/*.tsv filter=lfs diff=lfs merge=lfs -text

Step 6.3: Migrate existing large files to LFS

# One-time migration
# Note: this rewrites commit history; coordinate with collaborators before pushing
git lfs migrate import --include="*.pdf,*.pkl,*.png,data/raw/kg_microbe_core/*.tsv"

Verification:

  • .git/ directory size reduced (check with du -sh .git/)
  • Large files tracked by LFS (git lfs ls-files)
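  • Outputs no longer tracked in git (.gitignore does not untrack committed files, so remove them from the index with git rm -r --cached first; then git status is clean)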

Acceptance Criteria:

  • .git/ directory size reduced by >50MB
  • ✅ Large files managed via LFS
  • ✅ outputs/ excluded from git tracking

✅ Action 7: Create Compliance Monitoring Dashboard

Priority: 🟢 LOW (Nice-to-have)
Estimated Effort: 1 day
Owner: QA/Automation Team
Impact: Low (automation for ongoing monitoring)

Description: Automate audit compliance checking with a script that verifies all 9 criteria.

Files to Create:

  • scripts/audit_compliance.py - Automated compliance checker (new file)
  • .github/workflows/audit.yml - Weekly CI/CD audit (new file)

Implementation Steps:

Step 7.1: Create audit_compliance.py

# scripts/audit_compliance.py
"""Automated compliance audit for all 9 criteria."""
import os
import glob
import json
from datetime import datetime  # used for the report timestamp
from pathlib import Path

def check_criterion_1_session_logging():
    """Check if provenance sessions exist and are valid."""
    sessions = glob.glob(".claude/provenance/sessions/*/manifest.yaml")
    if len(sessions) >= 1:
        return {"status": "PASS", "evidence": f"{len(sessions)} sessions found"}
    return {"status": "FAIL", "evidence": "No sessions found"}

def check_criterion_2_model_tracking():
    """Check if model versions are recorded in sessions."""
    sessions = glob.glob(".claude/provenance/sessions/*/manifest.yaml")
    if not sessions:
        return {"status": "FAIL", "evidence": "No sessions to check"}

    # Check if first session has model field
    with open(sessions[0]) as f:
        content = f.read()
        if "model:" in content:
            return {"status": "PASS", "evidence": f"Model tracking found in {sessions[0]}"}
    return {"status": "FAIL", "evidence": "No model tracking found"}

def check_criterion_4_data_hashing():
    """Check if input data hashes are recorded."""
    checksums_exist = os.path.exists("data/checksums.txt")
    if checksums_exist:
        return {"status": "PASS", "evidence": "data/checksums.txt found"}
    return {"status": "FAIL", "evidence": "No checksums.txt file"}

def check_criterion_6_schema_validation():
    """Check if LinkML schemas exist."""
    schemas = glob.glob("src/microgrowagents/schema/*.yaml")
    if len(schemas) >= 5:
        return {"status": "PASS", "evidence": f"{len(schemas)} LinkML schemas found"}
    return {"status": "FAIL", "evidence": f"Only {len(schemas)} schemas found (expected 5+)"}

def check_criterion_9_cleanup_policy():
    """Check if cleanup policy is documented."""
    policy_exists = os.path.exists("docs/ARTIFACT_CLEANUP_POLICY.md")
    if policy_exists:
        return {"status": "PASS", "evidence": "docs/ARTIFACT_CLEANUP_POLICY.md found"}
    return {"status": "FAIL", "evidence": "No cleanup policy documented"}

def generate_audit_report():
    """Generate compliance audit report."""
    results = {
        "criterion_1_session_logging": check_criterion_1_session_logging(),
        "criterion_2_model_tracking": check_criterion_2_model_tracking(),
        "criterion_4_data_hashing": check_criterion_4_data_hashing(),
        "criterion_6_schema_validation": check_criterion_6_schema_validation(),
        "criterion_9_cleanup_policy": check_criterion_9_cleanup_policy(),
        # ... (add all 9 criteria)
    }

    # Generate markdown report
    report = f"# Automated Compliance Audit\n\n"
    report += f"**Date:** {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}\n\n"

    pass_count = sum(1 for r in results.values() if r["status"] == "PASS")
    total_count = len(results)
    compliance_pct = (pass_count / total_count) * 100

    report += f"**Compliance:** {pass_count}/{total_count} ({compliance_pct:.0f}%)\n\n"

    for criterion, result in results.items():
        status_emoji = "✅" if result["status"] == "PASS" else "❌"
        report += f"{status_emoji} **{criterion}:** {result['status']} - {result['evidence']}\n"

    return report

if __name__ == "__main__":
    report = generate_audit_report()
    print(report)

    # Save to file
    with open("docs/COMPLIANCE_REPORT.md", "w") as f:
        f.write(report)
    print("\n✅ Report saved to docs/COMPLIANCE_REPORT.md")

Step 7.2: Add justfile command

# Add to justfile
# Run automated compliance audit
[group('project management')]
audit-compliance:
  uv run python scripts/audit_compliance.py
  cat docs/COMPLIANCE_REPORT.md

Step 7.3: Create CI/CD workflow (optional)

# .github/workflows/audit.yml
name: Weekly Compliance Audit
on:
  schedule:
    - cron: '0 0 * * 0'  # Every Sunday at 00:00 UTC
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Run audit
        # Call the script directly: just is not assumed to be installed on the runner
        run: |
          pip install uv
          uv sync
          uv run python scripts/audit_compliance.py
      - name: Upload report
        uses: actions/upload-artifact@v4
        with:
          name: compliance-report
          path: docs/COMPLIANCE_REPORT.md

Verification:

  • Run just audit-compliance and verify output
  • Check docs/COMPLIANCE_REPORT.md generated
  • Verify compliance percentage matches manual audit

Acceptance Criteria:

  • ✅ Audit script checks all 9 criteria
  • ✅ Report generated with pass/fail status
  • ✅ CI/CD workflow configured (optional)

Progress Tracking

| Action | Priority | Effort | Status | Assignee | Target Date |
|--------|----------|--------|--------|----------|-------------|
| 1. Input data hashing | 🔴 HIGH | 1 day | ⬜ Not Started | Data Engineering | Week 1 |
| 2. Cleanup policy docs | 🔴 HIGH | 2 hours | ⬜ Not Started | DevOps | Week 1 |
| 3. Cleanup automation | 🟡 MEDIUM | 4 hours | ⬜ Not Started | DevOps | Week 2 |
| 4. MCP evaluation | 🟡 MEDIUM | 1-2 weeks | ⬜ Not Started | Architecture | Month 1 |
| 5. Per-response tracking | 🟡 MEDIUM | 4 hours | ⬜ Not Started | Provenance | Month 1 |
| 6. Git storage optimization | 🟢 LOW | 2 hours | ⬜ Not Started | DevOps | Month 2 |
| 7. Compliance dashboard | 🟢 LOW | 1 day | ⬜ Not Started | QA | Month 2 |

Target Compliance: 100% (9/9 PASS) by end of Month 2


Version History

| Version | Date | Changes |
|---------|------|---------|
| 1.0 | 2026-02-19 | Initial checklist based on audit report |

End of Checklist