|
| 1 | +# Session Notes: Sleep Stage Classifier Debugging (2026-01-14) |
| 2 | + |
| 3 | +## RESULT: SUCCESS - REM Sensitivity: 85.7% |
| 4 | + |
| 5 | +The fix worked! Classifier now achieves 85.7% REM sensitivity on device (vs 81.3% in Python reference). |
| 6 | + |
| 7 | +## Problem |
| 8 | + |
| 9 | +The hybrid sleep stage classifier achieves 81.3% REM sensitivity in Python but shows 0% on the Android device. |
| 10 | + |
| 11 | +## Root Cause Found |
| 12 | + |
| 13 | +**The TypeScript validation was not processing sleep sessions correctly:** |
| 14 | + |
| 15 | +1. Sleep stages from Health Connect came in **reverse chronological order** (newest first) |
| 16 | +2. The validation used `sleepStages[0].startTime` as the session start, but with newest-first ordering that value is the start of the final stage of the most recent session, i.e. close to its END |
| 17 | +3. All stages were processed as ONE continuous session instead of separate nights |
| 18 | +4. This caused `minutesSinceSleepStart` to be wildly incorrect (spanning days instead of hours) |
| 19 | + |
| 20 | +## Fix Applied |
| 21 | + |
| 22 | +Added `identifySleepSessions()` function to TypeScript that: |
| 23 | + |
| 24 | +1. Sorts stages by startTime (ascending) |
| 25 | +2. Groups stages into sessions separated by >4 hour gaps |
| 26 | +3. Processes each session independently with fresh state |
| 27 | +4. Matches the Python implementation behavior |
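The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in `services/remOptimizedClassifier.ts`: the `SleepStage` shape, the `SESSION_GAP_MS` name, the `minutesSinceSleepStart` helper signature, and the choice to measure the gap from the previous stage's `endTime` are all assumptions.

```typescript
// Hypothetical stage shape; real Health Connect records carry more fields.
interface SleepStage {
  startTime: string; // ISO 8601 timestamp
  endTime: string;   // ISO 8601 timestamp
  stage: string;
}

// Assumed name for the ">4 hour gap" rule from the notes.
const SESSION_GAP_MS = 4 * 60 * 60 * 1000;

function identifySleepSessions(stages: SleepStage[]): SleepStage[][] {
  // Copy before sorting so the caller's (newest-first) array is not mutated.
  const sorted = [...stages].sort(
    (a, b) => Date.parse(a.startTime) - Date.parse(b.startTime)
  );

  const sessions: SleepStage[][] = [];
  for (const stage of sorted) {
    if (sessions.length === 0) {
      sessions.push([stage]);
      continue;
    }
    const current = sessions[sessions.length - 1];
    const previous = current[current.length - 1];
    // Assumption: the gap runs from the previous stage's end to this stage's start.
    const gapMs = Date.parse(stage.startTime) - Date.parse(previous.endTime);
    if (gapMs > SESSION_GAP_MS) {
      sessions.push([stage]); // a new night begins
    } else {
      current.push(stage);
    }
  }
  return sessions;
}

// Elapsed minutes measured from the first stage of the given session only.
function minutesSinceSleepStart(session: SleepStage[], timestamp: string): number {
  return (Date.parse(timestamp) - Date.parse(session[0].startTime)) / 60000;
}
```

Because each session is its own array, per-session state can be reset simply by constructing it inside the loop that iterates over the returned sessions.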
| 28 | + |
| 29 | +## Key Algorithm Parameters (must match between Python and TypeScript) |
| 30 | + |
| 31 | +``` |
| 32 | +CV_THRESHOLD = 0.20 // Lower CV = more stable = more likely REM |
| 33 | +REM_CONSECUTIVE_REQUIRED = 2 // Need 2 consecutive signals to predict REM |
| 34 | +MAX_RECENT_HR_SAMPLES = 20 |
| 35 | +MAX_RMSSD_HISTORY = 10 |
| 36 | +ULTRADIAN_CYCLE_MINUTES = 90 |
| 37 | +FIRST_REM_LATENCY = 70 min // No REM predicted before 70 minutes |
| 38 | +``` |
| 39 | + |
| 40 | +## Data Issue Fixed |
| 41 | + |
| 42 | +Fixed malformed timestamp in `notes/raw_sleep_data.json`: |
| 43 | + |
| 44 | +- Before: `'2026-26-01-06T20:07:00Z'` (stray `26-` segment after the year, so the month parses as 26) |
| 45 | +- After: `'2026-01-06T20:07:00Z'` |
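A strict format check can surface entries like this before they reach the classifier. The sketch below assumes the timestamps have already been pulled out of the JSON into a string array; the layout of `raw_sleep_data.json` is not shown in these notes, and `findMalformedTimestamps` is a hypothetical helper name.

```typescript
// Strict UTC ISO 8601 pattern: YYYY-MM-DDTHH:MM:SS(.fff)Z with range-checked fields.
// This checks the format only; it does not validate days per month.
const ISO_UTC =
  /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])T([01]\d|2[0-3]):[0-5]\d:[0-5]\d(\.\d+)?Z$/;

// Returns the indices of timestamps that fail the pattern or fail to parse.
function findMalformedTimestamps(timestamps: string[]): number[] {
  const bad: number[] = [];
  timestamps.forEach((ts, index) => {
    if (!ISO_UTC.test(ts) || Number.isNaN(Date.parse(ts))) {
      bad.push(index);
    }
  });
  return bad;
}
```

The regular expression is used instead of `Date.parse` alone because engines differ in how leniently they treat strings that are not in the standard format.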
| 46 | + |
| 47 | +## Files Modified |
| 48 | + |
| 49 | +- `services/remOptimizedClassifier.ts` - Added `identifySleepSessions()` function, modified `runValidation()` to process sessions separately |
| 50 | +- `notes/raw_sleep_data.json` - Fixed malformed timestamp at index 4086 |
| 51 | +- `scripts/debug_python_vs_ts.py` - Created debug script showing intermediate values |
| 52 | + |
| 53 | +## Python Classifier Results (Reference) |
| 54 | + |
| 55 | +``` |
| 56 | +Best parameters: CV_thresh=0.2, time_weight=0.5 |
| 57 | + |
| 58 | +Confusion Matrix: |
| 59 | + Predicted: Awake NREM REM |
| 60 | + Actual awake: 0 56 60 |
| 61 | + Actual nrem : 12 334 398 |
| 62 | + Actual rem : 0 37 161 |
| 63 | + |
| 64 | +Metrics: |
| 65 | + Accuracy: 46.8% |
| 66 | + REM Sensitivity: 81.3% |
| 67 | + REM Specificity: 46.7% |
| 68 | + REM Precision: 26.0% |
| 69 | + REM F1: 39.4% |
| 70 | +``` |
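For cross-checking the Python and TypeScript outputs, the REM metrics can be recomputed directly from a 3x3 confusion matrix. This is a sketch under stated assumptions: rows are actual classes and columns are predicted classes, both ordered Awake, NREM, REM as in the matrix above, and `remMetrics` and `REM_INDEX` are illustrative names, not identifiers from the project.

```typescript
interface RemMetrics {
  accuracy: number;
  sensitivity: number;
  specificity: number;
  precision: number;
  f1: number;
}

// Assumed class order: 0 = Awake, 1 = NREM, 2 = REM.
const REM_INDEX = 2;

function remMetrics(matrix: number[][]): RemMetrics {
  let total = 0;
  let correct = 0;
  let actualRem = 0;
  let predictedRem = 0;
  matrix.forEach((row, actual) => {
    row.forEach((count, predicted) => {
      total += count;
      if (actual === predicted) correct += count;
      if (actual === REM_INDEX) actualRem += count;
      if (predicted === REM_INDEX) predictedRem += count;
    });
  });

  const tp = matrix[REM_INDEX][REM_INDEX]; // REM predicted as REM
  const fn = actualRem - tp;               // REM predicted as something else
  const fp = predictedRem - tp;            // non-REM predicted as REM
  const tn = total - tp - fn - fp;         // non-REM predicted as non-REM

  const sensitivity = tp / (tp + fn);
  const precision = tp / (tp + fp);
  return {
    accuracy: correct / total,
    sensitivity,
    specificity: tn / (tn + fp),
    precision,
    f1: (2 * precision * sensitivity) / (precision + sensitivity),
  };
}
```

Given the Python matrix above (`[[0, 56, 60], [12, 334, 398], [0, 37, 161]]`), this reproduces the listed percentages to one decimal place.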
| 71 | + |
| 72 | +## Next Steps |
| 73 | + |
| 74 | +1. Run classifier training on device with the fixed code |
| 75 | +2. Verify REM sensitivity matches Python (~81%) |
| 76 | +3. If still 0%, add debug logging to TypeScript to compare intermediate values |
| 77 | + |
| 78 | +## Key Insight |
| 79 | + |
| 80 | +REM sleep has **MORE STABLE** heart rate variability than NREM: |
| 81 | + |
| 82 | +- REM CV: 0.19 (lower = more stable) |
| 83 | +- NREM CV: 0.27 (higher = more variable) |
| 84 | + |
| 85 | +This is counterintuitive: one might expect REM to have more variable HR due to dream activity, yet in this data REM has the lower CV, which these notes attribute to stable, regulated breathing during REM. |
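For reference, a coefficient of variation is the standard deviation divided by the mean. The sketch below is one plausible way to compute it over a window of samples and compare it against `CV_THRESHOLD`. Which series the classifier feeds in (raw HR or RMSSD history) and whether it uses the population or sample standard deviation are not stated in these notes, so both are assumptions here, as is the helper name `looksStable`.

```typescript
// Value taken from the parameter list in these notes.
const CV_THRESHOLD = 0.2;

// Coefficient of variation using the population standard deviation (an assumption).
// Returns NaN when there are too few samples or the mean is zero.
function coefficientOfVariation(values: number[]): number {
  if (values.length < 2) return NaN;
  const mean = values.reduce((sum, v) => sum + v, 0) / values.length;
  if (mean === 0) return NaN;
  const variance =
    values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / values.length;
  return Math.sqrt(variance) / mean;
}

// Hypothetical helper: a window counts as REM-like when its CV is below the threshold.
function looksStable(values: number[]): boolean {
  const cv = coefficientOfVariation(values);
  return !Number.isNaN(cv) && cv < CV_THRESHOLD;
}
```

Under this rule the reported REM CV of 0.19 falls just below the 0.20 threshold and the NREM CV of 0.27 falls above it, which leaves little margin on the REM side.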
| 86 | + |
| 87 | +## Final Results on Device |
| 88 | + |
| 89 | +``` |
| 90 | +VALIDATION RESULTS (Leave-one-out cross-validation) |
| 91 | +Overall Accuracy: 41.5% |
| 92 | +Total samples: 3223 |
| 93 | + |
| 94 | +REM DETECTION METRICS (Key for dream induction): |
| 95 | + Sensitivity: 85.7% (true REM detected) |
| 96 | + Specificity: 41.2% (non-REM correctly rejected) |
| 97 | + |
| 98 | +Per-stage accuracy: |
| 99 | + Awake: 0.2% |
| 100 | + NREM: 41.0% |
| 101 | + REM: 85.7% |
| 102 | + |
| 103 | +Confusion Matrix: |
| 104 | + Predicted: Awake NREM REM |
| 105 | + Actual Awake: 1 198 310 |
| 106 | + Actual NREM: 15 906 1289 |
| 107 | + Actual REM: 4 68 432 |
| 108 | +``` |
| 109 | + |
| 110 | +Validation logs confirmed proper session identification: |
| 111 | + |
| 112 | +``` |
| 113 | +[Validation] Found 9 sleep sessions from 469 stages |
| 114 | +[Validation] Session 0: 66 stages, starts 2026-01-07T06:03:00.000Z |
| 115 | +[Validation] Session 2: 45 stages, starts 2026-01-08T06:54:30.000Z |
| 116 | +... |
| 117 | +[Validation] Session 8: 55 stages, starts 2026-01-14T05:27:00.000Z |
| 118 | +[Validation] Total samples processed: 3223 |
| 119 | +[Validation] Confusion matrix: rem->rem=432, rem->nrem=68, nrem->rem=1289 |
| 120 | +``` |
| 121 | + |
| 122 | +## Git Commits |
| 123 | + |
| 124 | +- Previous: e313954 (Sort HR samples by time in validation) |
| 125 | +- This session: Fix session identification in validation (sort stages, process sessions separately) |
| 126 | + Continue debugging the sleep stage classifier. The main fix (session identification in TypeScript validation) has been applied but not yet tested on device. |
| 127 | + |
| 128 | +The device needs Health Connect permissions granted first. After that: |
| 129 | + |
| 130 | +1. Go to Settings > Train 3-Class (REM-Optimized) |
| 131 | +2. Wait for training to complete |
| 132 | +3. Check if REM sensitivity is now ~81% (matching Python) |
| 133 | + |
| 134 | +If still 0%, add `console.log` statements to `runValidation()` printing: |
| 135 | + |
| 136 | +- Number of sessions found |
| 137 | +- Session start times |
| 138 | +- Sample count per session |
| 139 | +- CV values when `remScore` > 0.25 |
| 140 | + |
| 141 | +Phone connected via ADB at `~/Library/Android/sdk/platform-tools/adb` |
| 142 | + |
| 143 | +``` |
| 144 | + |
| 145 | +``` |