Tags: deepseek-forensic-tools, register documentation

DeepSeek Forensic Tools - Implementation Summary

Overview

Following ChatGPT's recommendation, we implemented the two highest-value additions before merging:

  1. Session Replay Engine - Forensic debugging tool
  2. Frame Timeline Visualization - Interactive graph analysis

These tools transform the DeepSeek schema from a specification into a forensic debugging environment for AI sessions.


1. Session Replay Engine

Purpose

Turn-by-turn deterministic replay with metric verification and invariant checking.

File

replay_deepseek_session.py (377 lines)

Features

Deterministic Replay

  • Replays turns in exact order
  • Recomputes all metrics
  • Verifies frame states
  • Checks invariant preservation
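"Replays turns in exact order" is easiest to check when each replay produces a fingerprint. The sketch below does this in three steps: it sorts turns by index, serializes each turn canonically, and feeds the bytes into one SHA-256 digest. Two replays of the same content must then yield the same hex string. It assumes a hypothetical session layout: a `turns` list whose entries carry an integer `turn` and a `frames` mapping. `replay_digest` is an illustrative name, not a function from `replay_deepseek_session.py`.

```python
import hashlib
import json


def replay_digest(session: dict) -> str:
    """Replay turns in turn order and return one SHA-256 digest over all of them.

    Assumes a hypothetical layout: session["turns"] is a list of dicts,
    each with an integer "turn" and a "frames" mapping of frame metrics.
    """
    digest = hashlib.sha256()
    # Sort explicitly so the result never depends on list order in the file.
    for turn in sorted(session["turns"], key=lambda t: t["turn"]):
        # Canonical JSON: sorted keys and fixed separators give identical
        # bytes for identical content, regardless of dict insertion order.
        canonical = json.dumps(turn, sort_keys=True, separators=(",", ":"))
        digest.update(canonical.encode("utf-8"))
    return digest.hexdigest()


session = {
    "turns": [
        {"turn": 2, "frames": {"frame-1": {"frame_stability": 0.80}}},
        {"turn": 1, "frames": {"frame-1": {"frame_stability": 0.95}}},
    ]
}
# Same content, different list order in the file.
shuffled = {"turns": list(reversed(session["turns"]))}
print(replay_digest(session) == replay_digest(shuffled))
```

Canonical serialization matters here. Without `sort_keys=True`, two logically identical sessions written by different tools could hash differently.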

Comprehensive Validation

  • All 10 invariants checked (INV-DS-001 through INV-DS-010)
  • Frame stability range validation
  • Sycophancy index range validation
  • Meta-alignment verification
  • Resolution outcome validation
  • Pattern registry verification
  • Enforcement config validation
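The range checks can be expressed as a small validator. It walks every frame in every turn and reports missing or out-of-range metrics, in the style of the debugging output shown in the Debugging Workflow. This is a sketch under stated assumptions. Both metrics are assumed to be bounded to [0, 1], and the same hypothetical `turns`/`frames` layout is used. The bounds and the `validate_ranges` name are illustrative, not values read from `DEEPSEEK_COPILOT_SCHEMA.yaml`.

```python
from typing import Dict, List, Tuple

# Assumed bounds; the real schema may define different ranges.
METRIC_BOUNDS: Dict[str, Tuple[float, float]] = {
    "frame_stability": (0.0, 1.0),
    "sycophancy_index": (0.0, 1.0),
}


def validate_ranges(session: dict) -> List[str]:
    """Return one human-readable error per missing or out-of-range metric."""
    errors: List[str] = []
    for turn in session["turns"]:
        for frame_id, metrics in turn["frames"].items():
            for name, (low, high) in METRIC_BOUNDS.items():
                if name not in metrics:
                    errors.append(f"Turn {turn['turn']}: Missing {name} for {frame_id}")
                elif not low <= metrics[name] <= high:
                    errors.append(f"Turn {turn['turn']}: {name} out of range for {frame_id}")
    return errors


session = {
    "turns": [
        {"turn": 3, "frames": {"frame-1": {"frame_stability": 1.4, "sycophancy_index": 0.2}}},
        {"turn": 5, "frames": {"frame-2": {"frame_stability": 0.7}}},
    ]
}
for error in validate_ranges(session):
    print("❌", error)
```

The validator collects every error instead of stopping at the first one. That matches the replay report, which lists each violation before giving the total.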

Metric Delta Tracking

  • Frame drift delta calculation
  • Sycophancy delta tracking
  • Stability delta monitoring
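The summary does not define how a delta is computed. One plausible reading is the last-turn value minus the first-turn value for each frame. Under that reading, the replay would report the signed change with the largest magnitude per metric, which fits the single "Frame drift delta" figure in the example output. Both that definition and the `drift_score` field name are assumptions made for this sketch.

```python
from typing import Dict

# "drift_score" is a hypothetical field name for the frame drift metric.
METRICS = ("drift_score", "sycophancy_index", "frame_stability")


def metric_deltas(session: dict) -> Dict[str, float]:
    """Largest per-frame change (last turn minus first turn) for each metric.

    Assumed definition; frames missing from either endpoint are skipped.
    """
    turns = sorted(session["turns"], key=lambda t: t["turn"])
    first, last = turns[0]["frames"], turns[-1]["frames"]
    deltas: Dict[str, float] = {}
    for metric in METRICS:
        changes = [
            last[fid][metric] - first[fid][metric]
            for fid in first
            if fid in last and metric in first[fid] and metric in last[fid]
        ]
        # Keep the signed change with the largest magnitude (0.0 if none).
        deltas[metric] = max(changes, key=abs, default=0.0)
    return deltas


session = {
    "turns": [
        {"turn": 1, "frames": {"frame-1": {"drift_score": 0.02, "sycophancy_index": 0.1, "frame_stability": 0.9}}},
        {"turn": 4, "frames": {"frame-1": {"drift_score": 0.80, "sycophancy_index": 0.1, "frame_stability": 0.9}}},
    ]
}
for name, value in metric_deltas(session).items():
    print(f"{name:18s} {value:.6f}")
```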

Flexible CLI

# Full replay
python3 replay_deepseek_session.py session.json

# Verbose mode
python3 replay_deepseek_session.py session.json --verbose

# Replay specific turns
python3 replay_deepseek_session.py session.json --turn 2

# JSON output
python3 replay_deepseek_session.py session.json --json
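The four invocations above map naturally onto a standard-library `argparse` interface. The following is a sketch of how such a parser could be declared, not the script's actual source. The text only says that `--turn` enables partial replay, so reading `--turn N` as "stop after turn N" is an assumption.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="replay_deepseek_session.py",
        description="Deterministically replay a DeepSeek session log.",
    )
    parser.add_argument("session", help="path to the session JSON file")
    parser.add_argument("--verbose", action="store_true", help="print per-turn details")
    # Semantics assumed: the source only describes this as partial replay.
    parser.add_argument("--turn", type=int, metavar="N",
                        help="partial replay; assumed here to stop after turn N")
    parser.add_argument("--json", action="store_true", help="emit the report as JSON")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


args = parse(["session.json", "--turn", "2", "--json"])
print(args.session, args.turn, args.verbose, args.json)
```

With `type=int`, a non-numeric `--turn` value is rejected before any replay work begins.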

Example Output

============================================================
✅ REPLAY STATUS: VERIFIED
============================================================

Session ID: 550e8400-e29b-41d4-a716-446655440000
Turns replayed: 4

Metric Deltas:
  Frame drift delta:  0.780000
  Sycophancy delta:   0.000000
  Stability delta:    0.000000

Invariants violated: 0

✓ All metrics verified
✓ All invariants preserved
✓ Replay successful

Use Cases

  1. Forensic Analysis

    • Investigate session corruption
    • Verify metric computation
    • Debug frame conflicts
  2. Quality Assurance

    • Validate session logs
    • Ensure determinism
    • Verify invariant preservation
  3. Development

    • Test metric algorithms
    • Debug enforcement logic
    • Validate schema changes
  4. Audit Trail

    • Prove session integrity
    • Demonstrate determinism
    • Document compliance

Testing

18 tests in tests/test_replay_engine.py, including:

  • ✅ Replay example session
  • ✅ Replay minimal session
  • ✅ Partial turn replay
  • ✅ Invalid metric range detection
  • ✅ Missing frame metrics (INV-DS-005)
  • ✅ Invalid priority levels (INV-DS-006)
  • ✅ Pattern registry verification
  • ✅ Enforcement config validation
  • ✅ Metric delta calculations
  • ✅ Verbose mode
  • ✅ JSON serialization (INV-DS-008)
  • ✅ Inactive frame detection
  • ✅ Multiple pattern handling

All 18 tests passing ✅
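As an illustration of what one of these tests might look like, here is a JSON round-trip check. It assumes that INV-DS-008 requires a session to survive serialization unchanged; the summary only labels that test "JSON serialization". The test name and sample session are hypothetical.

```python
import json


def test_json_round_trip_is_lossless():
    """Hypothetical INV-DS-008 check: a session survives a JSON round trip unchanged."""
    session = {
        "session_id": "550e8400-e29b-41d4-a716-446655440000",
        "turns": [{"turn": 1, "frames": {"frame-1": {"frame_stability": 0.95}}}],
    }
    encoded = json.dumps(session, sort_keys=True)
    # Decoding must restore an equal structure.
    assert json.loads(encoded) == session
    # Re-encoding the decoded value must give byte-identical output.
    assert json.dumps(json.loads(encoded), sort_keys=True) == encoded


if __name__ == "__main__":
    test_json_round_trip_is_lossless()
    print("ok")
```

Plain `assert` statements keep the test runnable under pytest or as a standalone script.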


2. Frame Timeline Visualization

Purpose

Interactive visual analysis of frame metrics over time to reveal patterns, conflicts, and drift.

File

deepseek_frame_timeline.html (643 lines)

Features

4 Interactive Charts

  1. Frame Stability Over Time

    • Per-frame stability tracking
    • Shows frame degradation
    • Identifies unstable frames
  2. Sycophancy Index Over Time

    • Per-frame sycophancy tracking
    • Flags possible manipulation attempts
    • Shows compliance trends
  3. Meta-Alignment Ratio

    • Overall session alignment
    • Pattern detection capability
    • System awareness level
  4. Frame Drift Score

    • Semantic drift from anchors
    • Per-frame drift tracking
    • Drift accumulation visualization
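Each of these charts plots one value per turn for each frame. The session data therefore has to be pivoted from turn-major (one entry per turn) to frame-major (one series per frame) before plotting. The timeline itself is HTML and JavaScript, so this Python sketch only illustrates that reshaping. It assumes the same hypothetical `turns`/`frames` layout. `None` (serialized as JSON `null`) marks turns where a frame is absent, so a deactivated frame can be drawn as a gap instead of a drop to zero.

```python
import json
from typing import Dict, List, Optional


def frame_series(session: dict, metric: str) -> Dict[str, List[Optional[float]]]:
    """Pivot turn-major session data into one list per frame for `metric`.

    None marks turns where the frame (or the metric) is absent.
    """
    turns = sorted(session["turns"], key=lambda t: t["turn"])
    frame_ids = sorted({fid for t in turns for fid in t["frames"]})
    return {
        fid: [t["frames"].get(fid, {}).get(metric) for t in turns]
        for fid in frame_ids
    }


session = {
    "turns": [
        {"turn": 1, "frames": {"frame-1": {"frame_stability": 0.95}, "frame-2": {"frame_stability": 0.90}}},
        {"turn": 2, "frames": {"frame-1": {"frame_stability": 0.70}}},
        {"turn": 3, "frames": {"frame-1": {"frame_stability": 0.85}, "frame-2": {"frame_stability": 0.60}}},
    ]
}
labels = [f"Turn {t['turn']}" for t in sorted(session["turns"], key=lambda t: t["turn"])]
print(json.dumps({"labels": labels, "series": frame_series(session, "frame_stability")}))
```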

Event Markers

  • Pattern detection events
  • Enforcement actions
  • Conflict resolutions
  • Turn-by-turn annotations

Session Information Panel

  • Session ID, model name
  • Turn count, frame count
  • Meta-awareness score
  • Configuration details

Interactive Controls

  • Load any session JSON file
  • "Load Example Session" button
  • Responsive design
  • Dark theme optimized for analysis

How to Use

  1. Open in Browser

    # Start a local server from the directory containing the HTML file (optional)
    python3 -m http.server 8000
    
    # Open in browser
    http://localhost:8000/deepseek_frame_timeline.html
  2. Load Session

    • Click "Choose File" and select session JSON
    • OR click "Load Example Session"
  3. Analyze

    • Scroll through charts
    • Hover over data points for details
    • Examine event markers
    • Identify patterns

What It Reveals

Oscillation Detection

  • Frame stability fluctuations
  • Repeated pattern cycles
  • Conflict loops

Collapse Identification

  • Sharp stability drops
  • Frame deactivation events
  • System resets

Manipulation Attempts

  • Sycophancy spikes
  • Drift acceleration
  • Enforcement triggers

Conflict Patterns

  • Multiple enforcement actions
  • Resolution changes
  • Priority conflicts
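Two of these visual cues can also be approximated numerically, which helps flag sessions before opening the timeline. The sketch works on a single frame's stability series. A collapse is a turn-to-turn drop larger than a threshold. Oscillation is measured as the number of direction reversals between consecutive changes. Both thresholds (a 0.3 drop and 3 reversals) are arbitrary assumptions, not values from the schema.

```python
from typing import List


def collapse_turns(values: List[float], drop: float = 0.3) -> List[int]:
    """1-based turn numbers where the value fell by more than `drop` since the previous turn."""
    return [i + 1 for i in range(1, len(values)) if values[i - 1] - values[i] > drop]


def direction_reversals(values: List[float]) -> int:
    """Count sign changes between consecutive non-zero differences (up, down, up counts as 2)."""
    diffs = [b - a for a, b in zip(values, values[1:]) if b != a]
    return sum(1 for d1, d2 in zip(diffs, diffs[1:]) if (d1 > 0) != (d2 > 0))


def is_oscillating(values: List[float], min_reversals: int = 3) -> bool:
    """Treat a series with at least `min_reversals` reversals as oscillating."""
    return direction_reversals(values) >= min_reversals


stability = [0.9, 0.7, 0.85, 0.5, 0.8, 0.2]
print(collapse_turns(stability))
print(direction_reversals(stability), is_oscillating(stability))
```

Unchanged turns are ignored when counting reversals, so a plateau between two rises does not count as a change of direction.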

Visual Design

  • Dark theme for extended analysis sessions
  • Color-coded frames for easy tracking
  • Chart.js for smooth interactions
  • Responsive layout for any screen size
  • Clear legends for metric interpretation

Integration with Existing Tools

Complete Tool Chain

  1. Schema Definition

    • DEEPSEEK_COPILOT_SCHEMA.yaml
    • deepseek_schema.py
  2. Validation

    • validate_deepseek_session.py
  3. Forensic Replay ⭐ NEW

    • replay_deepseek_session.py
  4. Visual Analysis ⭐ NEW

    • deepseek_frame_timeline.html
  5. Testing

    • tests/test_deepseek_schema.py (74 tests)
    • tests/test_replay_engine.py (18 tests)
  6. Documentation

    • DEEPSEEK_COPILOT_SCHEMA_README.md
    • DEEPSEEK_QUICK_REFERENCE.md
    • DEEPSEEK_IMPLEMENTATION_SUMMARY.md

Total Test Coverage

92 tests across 2 test suites:

  • 74 schema tests
  • 18 replay tests

All passing ✅


ChatGPT's Assessment

"This turns the system into a forensic debugging environment for AI sessions."

The two tools deliver:

  1. Deterministic Verification - Replay confirms that recorded metrics match their recomputed values
  2. Visual Pattern Detection - Timeline reveals hidden behaviors
  3. Audit Capability - Full turn-by-turn replay for compliance
  4. Developer Productivity - Debug sessions efficiently

Why These Tools Matter

Before (Schema Only)

  • Specification of how sessions should work
  • Validation of session structure
  • Deterministic algorithms defined

After (Schema + Forensic Tools)

  • Prove sessions work correctly (replay)
  • See what happened over time (timeline)
  • Debug problems visually (graphs)
  • Verify invariants preserved (automated checks)
  • Audit complete sessions (deterministic replay)

Example Workflow

Development Workflow

# 1. Create/modify a session
# ... session.json created ...

# 2. Validate structure
python3 validate_deepseek_session.py session.json

# 3. Replay forensically
python3 replay_deepseek_session.py session.json --verbose

# 4. Visualize timeline
# Open deepseek_frame_timeline.html
# Load session.json

# 5. Analyze results
# - Check frame stability trends
# - Identify sycophancy spikes
# - Verify pattern detection
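Steps 2 and 3 can be chained so that a failed validation stops the run before replay starts. The sketch below uses the standard-library `subprocess` module. It assumes both scripts exit with a non-zero status on failure, which the text above does not state. `run_steps` and `STEPS` are hypothetical names.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[List[str]]) -> int:
    """Run each command in order; stop at the first non-zero exit and return its code."""
    for command in steps:
        print("$", " ".join(command))
        result = subprocess.run(command)
        if result.returncode != 0:
            return result.returncode
    return 0


SESSION = "session.json"
# Assumes both tools signal failure through their exit status.
STEPS = [
    [sys.executable, "validate_deepseek_session.py", SESSION],
    [sys.executable, "replay_deepseek_session.py", SESSION, "--verbose"],
]
```

Call `sys.exit(run_steps(STEPS))` from a small driver script placed in the directory that holds both tools. The timeline step stays manual because it runs in the browser.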

Debugging Workflow

# Session appears corrupted
python3 replay_deepseek_session.py suspicious_session.json --verbose

# Output shows:
# ❌ Turn 3: frame_stability out of range
# ❌ Turn 5: Missing sycophancy_index for frame-2
# Invariants violated: 2

# Fix issues in session

# Re-verify
python3 replay_deepseek_session.py suspicious_session.json
# ✅ REPLAY STATUS: VERIFIED

Audit Workflow

# Generate compliance report
python3 replay_deepseek_session.py audit_session.json --json > audit_report.json

# Verify determinism
python3 replay_deepseek_session.py audit_session.json
# Frame drift delta:  0.000000
# Sycophancy delta:   0.000000
# Invariants violated: 0

# Visual confirmation
# Open timeline, verify no anomalies
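In CI, the `--json` report can serve as an audit gate that fails the build unless replay verified cleanly. The report's field names are not documented here. `status` and `invariants_violated` below are assumed keys that mirror the human-readable output, so adjust them to match the real report.

```python
import json
import sys
from typing import List


def audit_failures(report: dict) -> List[str]:
    """Return reasons the report should fail an audit (an empty list means pass).

    Assumed keys: "status" and "invariants_violated".
    """
    failures: List[str] = []
    if report.get("status") != "VERIFIED":
        failures.append(f"status is {report.get('status')!r}, expected 'VERIFIED'")
    if report.get("invariants_violated", 0) > 0:
        failures.append(f"{report['invariants_violated']} invariant(s) violated")
    return failures


def main(path: str) -> int:
    """Load a saved report and return a process exit code (0 = audit passed)."""
    with open(path, encoding="utf-8") as handle:
        failures = audit_failures(json.load(handle))
    for reason in failures:
        print("AUDIT FAIL:", reason, file=sys.stderr)
    return 1 if failures else 0


print(audit_failures({"status": "VERIFIED", "invariants_violated": 0}))
print(audit_failures({"status": "FAILED", "invariants_violated": 2}))
```

A driver would run the `--json` command shown above, then call `sys.exit(main("audit_report.json"))`.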

Future Possibilities

ChatGPT also suggested (for future PRs):

Developer QoL Tools

  • deepseek_doctor.py - Health check CLI
  • deepseek_schema_diff.py - Schema version diffing

Creative Features

  • AI self-reflection blocks
  • Frame personality tags
  • ASCII frame maps
  • Guardian meta-frame

Alignment Features

  • Invariant-driven AI governance
  • Recursive enforcement layers
  • Self-verifying systems

These can be added incrementally.


Ready for Merge

The schema is now production-ready with:

Core Schema

  • 10 invariants fully specified
  • Deterministic conflict resolution
  • Byte-for-byte reproducibility
  • Complete documentation

Validation Tools

  • Structure validation
  • Schema conformance checking

Forensic Tools ⭐ NEW

  • Session replay engine
  • Frame timeline visualization

Testing

  • 92 tests (74 + 18)
  • All passing
  • Full coverage

Documentation

  • Architecture guide
  • Quick reference
  • Implementation summary
  • Usage examples

Conclusion

ChatGPT's assessment was correct: these two tools transform the DeepSeek schema from a specification into a forensic debugging environment.

Quote from ChatGPT:

"If it were my repo I would add only two things before merge: 1️⃣ Session Replay Engine 2️⃣ Frame Timeline Visualization"

Status: ✅ COMPLETE

The schema is now ready for squash & merge.


Version: 1.1.0
Date: 2026-03-14
Standard: Yeshua
Tests: 92 passing
Tools: 4 (schema, validate, replay, timeline)