| 1 | +# Sleep Classifier Deep Dive - January 15, 2026 |
| 2 | + |
| 3 | +## Executive Summary |
| 4 | + |
| 5 | +Analyzed 1,546 nights of Fitbit Takeout data with ~184k HR+stage samples to identify improvements for the sleep classifier. Key findings and recommendations below. |
| 6 | + |
| 7 | +## Current Performance |
| 8 | + |
| 9 | +- REM Sensitivity: 81.7% |
| 10 | +- Awake Sensitivity: 15.5% |
| 11 | +- Overall Accuracy: 41.9% |
| 12 | + |
| 13 | +## Data Analysis Findings |
| 14 | + |
| 15 | +### 1. Class Imbalance (CRITICAL) |
| 16 | + |
| 17 | +Fitbit labels show massive imbalance: |
| 18 | + |
| 19 | +- NREM: 68.6% |
| 20 | +- REM: 26.2% |
| 21 | +- **Wake: 5.3%** (only!) |
| 22 | + |
| 23 | +Wake samples concentrated at: |
| 24 | + |
| 25 | +- 30-60 min (sleep onset): 233 samples |
| 26 | +- 270-300 min (natural awakening): 293 samples |
| 27 | + |
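| 28 | +**Implication**: With wake at only 5.3% of samples, low awake precision is expected even from a highly specific detector, because false positives drawn from the other 94.7% of samples easily outnumber true wake detections. The current 15.5% sensitivity is reasonable provided specificity stays high. |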
| 29 | + |
| 30 | +### 2. Feature Discrimination Analysis |
| 31 | + |
| 32 | +| Feature | Wake | Light | Deep | REM | Best For | |
| 33 | +| ------------ | -------- | ----- | ---- | ---- | ------------------------- | |
| 34 | +| mean_hr | 57.7 | 61.1 | 52.6 | 56.2 | REM vs NREM (+4.5 diff) | |
| 35 | +| mean_diff | 0.80 | 1.00 | 0.52 | 0.60 | REM vs NREM (+0.38) | |
| 36 | +| **max_diff** | **4.13** | 3.44 | 1.84 | 2.07 | **Wake vs Sleep (+1.11)** | |
| 37 | +| hr_range | 7.8 | 11.1 | 4.8 | 5.5 | REM vs NREM (+5.37) | |
| 38 | +| std_hr | 2.32 | 3.28 | 1.31 | 1.55 | REM vs NREM (+1.67) | |
| 39 | +| rmssd | 1.32 | 1.40 | 0.81 | 0.89 | REM vs NREM (+0.49) | |
| 40 | + |
| 41 | +**Key Finding**: `max_diff` is the best discriminator for wake (4.13 vs 3.02 for sleep). |
| 42 | + |
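For reference, the window features in the table could be computed along these lines. This is a minimal Python sketch, not the code in `/scripts/parse_fitbit_takeout.py`: the function name `window_features`, the use of absolute successive differences for `mean_diff` and `max_diff`, the population standard deviation for `std_hr`, and `rmssd` computed on HR samples (rather than RR intervals) are all assumptions.

```python
import math
from statistics import mean, pstdev


def window_features(hr):
    """Summarise one window of heart-rate samples (bpm).

    Assumed definitions (not confirmed by the notes):
    - mean_diff / max_diff: mean and max of absolute successive differences
    - std_hr: population standard deviation
    - rmssd: root mean square of successive HR differences
    """
    if len(hr) < 2:
        raise ValueError("need at least two HR samples per window")
    diffs = [abs(b - a) for a, b in zip(hr, hr[1:])]
    return {
        "mean_hr": mean(hr),
        "mean_diff": mean(diffs),
        "max_diff": max(diffs),
        "hr_range": max(hr) - min(hr),
        "std_hr": pstdev(hr),
        "rmssd": math.sqrt(mean(d * d for d in diffs)),
    }


# Example: a single 4 bpm jump dominates max_diff even though mean_diff stays small.
features = window_features([60, 62, 61, 65])
print(features["max_diff"], features["hr_range"])
```

Because `max_diff` responds to a single abrupt jump, it would plausibly pick up brief arousals that the averaged features smooth over, which fits its role as the wake discriminator above.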
| 43 | +### 3. Transition Probabilities (from 1,546 nights) |
| 44 | + |
| 45 | +``` |
| 46 | +wake -> wake: 50.0% |
| 47 | +wake -> light: 28.9% |
| 48 | +wake -> rem: 11.5% |
| 49 | +wake -> deep: 9.6% |
| 50 | + |
| 51 | +deep -> light: 45.1% |
| 52 | +deep -> wake: 52.9% |
| 53 | +deep -> rem: 2.0% (rare!) |
| 54 | + |
| 55 | +rem -> light: 45.8% |
| 56 | +rem -> wake: 52.8% |
| 57 | +rem -> deep: 1.4% (rare!) |
| 58 | +``` |
| 59 | + |
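A table like the one above can be estimated by counting consecutive label pairs within each night. The sketch below assumes each night is a list of stage labels in time order. The notes do not say whether transitions were counted per epoch or per stage segment, and `transition_probs` is a hypothetical helper name.

```python
from collections import Counter, defaultdict


def transition_probs(nights):
    """Estimate P(next stage | current stage) from per-night label sequences.

    Pairs are counted within a night only, so the last label of one night
    is never paired with the first label of the next.
    """
    counts = defaultdict(Counter)
    for stages in nights:
        for cur, nxt in zip(stages, stages[1:]):
            counts[cur][nxt] += 1
    return {
        cur: {nxt: n / sum(c.values()) for nxt, n in c.items()}
        for cur, c in counts.items()
    }


# Toy input: two short "nights" of stage labels.
nights = [
    ["wake", "light", "deep", "light", "rem"],
    ["wake", "wake"],
]
probs = transition_probs(nights)
print(probs["wake"])
```

With per-epoch labels this sketch also counts self-transitions such as `wake -> wake`; with per-segment labels those never occur, so the two conventions give different rows.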
| 60 | +**Usable constraints**: |
| 61 | + |
| 62 | +- Deep rarely transitions directly to REM (2%) |
| 63 | +- REM rarely transitions to deep (1.4%) |
| 64 | +- Wake tends to stay wake (50%) or go to light (29%) |
| 65 | + |
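One way to use these constraints is to drop blocked transitions before picking the most likely stage. The following is a sketch under assumptions: the classifier is taken to expose per-stage probabilities for each epoch, and `BLOCKED` and `constrain` are illustrative names rather than anything in the existing code.

```python
# Transitions treated as impossible, based on the 2.0% and 1.4% rates above.
BLOCKED = {("deep", "rem"), ("rem", "deep")}


def constrain(prev_stage, probs):
    """Return the most probable stage that is reachable from prev_stage.

    probs maps stage name -> probability for the current epoch.
    Falls back to prev_stage if every candidate is blocked or has zero mass.
    """
    allowed = {
        stage: p for stage, p in probs.items()
        if (prev_stage, stage) not in BLOCKED
    }
    if not allowed or sum(allowed.values()) == 0:
        return prev_stage
    return max(allowed, key=allowed.get)


epoch = {"rem": 0.5, "light": 0.3, "deep": 0.2}
print(constrain("deep", epoch))   # rem is blocked after deep
print(constrain("light", epoch))  # no constraint applies after light
```

A hard block is the simplest option. A softer variant would multiply each probability by the observed transition rate instead of zeroing it, which keeps the rare deep/REM transitions possible when the evidence is strong.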
| 66 | +### 4. Threshold Sweep Results |
| 67 | + |
| 68 | +For `max_diff` only: |
| 69 | +| Threshold | Wake Sens | Wake Prec | Sleep Spec | |
| 70 | +|-----------|-----------|-----------|------------| |
| 71 | +| 3.0 | 31.4% | 8.2% | 80.6% | |
| 72 | +| 4.0 | 23.0% | 10.6% | 89.2% | |
| 73 | +| 5.0 | 17.9% | 15.0% | 94.4% | |
| 74 | + |
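| 75 | +Trade-off: raising the threshold from 3.0 to 5.0 cuts wake sensitivity from 31.4% to 17.9% but nearly doubles precision (8.2% to 15.0%) and lifts sleep specificity from 80.6% to 94.4%. |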
| 76 | + |
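The precision column is roughly what Bayes' rule predicts from the 5.3% wake prevalence alone, which is why precision stays low even at 94% specificity. A short check, assuming the prevalence in the sweep data matches the 5.3% label share reported earlier (`wake_precision` is an illustrative helper):

```python
def wake_precision(sensitivity, specificity, prevalence):
    """Precision implied by sensitivity, specificity and class prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


WAKE_PREVALENCE = 0.053  # wake share of labels from the class-imbalance section

# (threshold, wake sensitivity, sleep specificity, reported precision)
sweep = [
    (3.0, 0.314, 0.806, 0.082),
    (4.0, 0.230, 0.892, 0.106),
    (5.0, 0.179, 0.944, 0.150),
]

for threshold, sens, spec, reported in sweep:
    implied = wake_precision(sens, spec, WAKE_PREVALENCE)
    print(f"threshold {threshold}: implied {implied:.1%}, reported {reported:.1%}")
```

The implied values land within a few tenths of a percentage point of the reported ones, so the low precision is a consequence of the imbalance rather than a defect in the `max_diff` feature itself.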
| 77 | +## Recommendations |
| 78 | + |
| 79 | +### Short-term (High Impact, Low Effort) |
| 80 | + |
| 81 | +1. **Add `max_diff` as awake feature** alongside `mean_diff` |
| 82 | + - Use threshold ~4.0 for high specificity |
| 83 | +   - Expected: +5-10 percentage points of awake sensitivity (the sweep shows 23.0% at threshold 4.0 vs. 15.5% currently) |
| 84 | + |
| 85 | +2. **Add transition constraints** |
| 86 | + - Block deep->rem and rem->deep transitions |
| 87 | + - Expected: Fewer false REM classifications |
| 88 | + |
| 89 | +3. **Time-bin specific thresholds** |
| 90 | + - Lower awake threshold at 30-60min (sleep onset) |
| 91 | + - Lower awake threshold at 270-330min (natural wake time) |
| 92 | + |
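Recommendations 1 and 3 can be combined into a single time-aware threshold. The sketch below is illustrative only: the base threshold of 4.0 comes from recommendation 1, but the lenient value of 3.0, the half-open window boundaries, and the `>=` comparison are assumptions, as are the names `awake_threshold` and `is_awake`.

```python
# Minutes since sleep onset during which wake is more likely (from the notes).
LENIENT_WINDOWS_MIN = [(30, 60), (270, 330)]


def awake_threshold(minutes_asleep, base=4.0, lenient=3.0):
    """Return the max_diff threshold to use at a given time into the night.

    Windows are treated as [start, end); the lenient value is a placeholder
    that would need tuning against the sweep data.
    """
    for start, end in LENIENT_WINDOWS_MIN:
        if start <= minutes_asleep < end:
            return lenient
    return base


def is_awake(max_diff, minutes_asleep):
    """Flag an epoch as awake when max_diff reaches the time-specific threshold."""
    return max_diff >= awake_threshold(minutes_asleep)


print(is_awake(3.5, minutes_asleep=45))   # inside the sleep-onset window
print(is_awake(3.5, minutes_asleep=120))  # mid-night, stricter threshold applies
```

Lowering the threshold inside the windows trades precision for sensitivity exactly where wake samples are concentrated, so the overall false-positive cost should be smaller than lowering it for the whole night.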
| 93 | +### Medium-term (Moderate Effort) |
| 94 | + |
| 95 | +4. **Use `hr_range` and `std_hr` for REM discrimination** |
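| 96 | +   - REM has a distinctly lower hr_range than pooled NREM (5.5 vs 10.8); deep alone is lower still (4.8), so this mainly separates REM from light |
| 97 | +   - REM has a lower std_hr than pooled NREM (1.55 vs 3.21), with the same caveat for deep (1.31) |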
| 98 | + |
| 99 | +5. **Explore SpO2 data for REM** |
| 100 | + - REM has irregular breathing → SpO2 variability |
| 101 | + - 1,882 files available in Takeout |
| 102 | + |
| 103 | +6. **Ensemble approach** |
| 104 | + - Combine current CV-based REM detector with new max_diff awake detector |
| 105 | + |
| 106 | +### Long-term (High Effort) |
| 107 | + |
| 108 | +7. **Train on full Fitbit Takeout** |
| 109 | +   - 104 sleep files, 6,128 HR files available |
| 110 | + - Potential for user-specific model fine-tuning |
| 111 | + |
| 112 | +8. **LSTM/Sequential model** |
| 113 | + - Capture temporal patterns |
| 114 | + - Literature suggests +10-15% improvement |
| 115 | + |
| 116 | +9. **Multi-modal features** |
| 117 | + - Temperature changes correlate with sleep cycles |
| 118 | + - Respiratory rate variability during REM |
| 119 | + |
| 120 | +## Available Data Summary |
| 121 | + |
| 122 | +| Source | Files | Resolution | Date Range | |
| 123 | +| ---------------- | ------ | ---------- | ---------- | |
| 124 | +| Sleep stages | 104 | Per-stage | 2017-2025 | |
| 125 | +| Heart rate | 6,128 | 5 seconds | 2017-2025 | |
| 126 | +| HRV (RMSSD) | 1,869 | Daily | 2020-2025 | |
| 127 | +| Respiratory rate | ~1,800 | Daily | 2020-2025 | |
| 128 | +| SpO2 | 1,882 | 1 minute | 2020-2025 | |
| 129 | +| Temperature | 3,671 | Monthly | 2020-2025 | |
| 130 | + |
| 131 | +## Next Steps |
| 132 | + |
| 133 | +1. ~~Implement `max_diff` feature in TypeScript classifier~~ ✅ DONE |
| 134 | +2. ~~Add transition probability constraints~~ (partial - time-specific thresholds added) |
| 135 | +3. ~~Test with time-specific thresholds~~ ✅ DONE |
| 136 | +4. Evaluate SpO2 integration feasibility |
| 137 | + |
| 138 | +## Session Update - January 15, 2026 (continued) |
| 139 | + |
| 140 | +### Implemented Changes |
| 141 | + |
| 142 | +1. **`remConfidence` field added to HybridClassification** |
| 143 | + - Added to `SourceClassification` and `HybridClassification` interfaces |
| 144 | + - Computed in `remOptimizedClassifier.ts` based on: stage probability, time >70min, REM window position, consecutive signals, REM propensity |
| 145 | + |
| 146 | +2. **REM playback gating on confidence** |
| 147 | + - Added `MIN_REM_CONFIDENCE_FOR_PLAYBACK = 0.5` constant in `sleep.ts` |
| 148 | + - Modified `handleStageTransition()` to check `remConfidence >= 0.5` before triggering REM callbacks |
| 149 | + - Logs skipped REM detections with confidence level |
| 150 | + |
| 151 | +3. **Enhanced classifier features** (from earlier session) |
| 152 | + - Added `max_diff` for awake detection |
| 153 | + - Added `hr_range` for REM discrimination |
| 154 | + - Added time-specific thresholds (lenient at 30-60min, 270-330min) |
| 155 | + |
| 156 | +### Build Output |
| 157 | + |
| 158 | +- APK: `~/Desktop/dream-stream-release-remconf.apk` (85.7 MB) |
| 159 | +- Build: Release variant |
| 160 | +- Ready for overnight testing |
| 161 | + |
| 162 | +### Expected Behavior |
| 163 | + |
| 164 | +- REM detections with confidence < 0.5 will be logged but not trigger dream playback |
| 165 | +- This should significantly reduce false-positive REM detections while the user is awake |
| 166 | +- Console log format: `[Sleep] REM detected but confidence X.XX < 0.5, skipping playback` |
| 167 | + |
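The gating rule itself is small. The production check lives in TypeScript (`handleStageTransition()` in `sleep.ts`); the Python paraphrase below only illustrates the logic and log format described above. The function name `should_trigger_rem_playback` and the injectable `log` parameter are assumptions made for the sketch.

```python
MIN_REM_CONFIDENCE_FOR_PLAYBACK = 0.5


def should_trigger_rem_playback(rem_confidence, log=print):
    """Return True when a REM detection is confident enough to start playback.

    Low-confidence detections are logged in the format given in the notes
    and otherwise ignored.
    """
    if rem_confidence >= MIN_REM_CONFIDENCE_FOR_PLAYBACK:
        return True
    log(
        f"[Sleep] REM detected but confidence {rem_confidence:.2f} "
        f"< {MIN_REM_CONFIDENCE_FOR_PLAYBACK}, skipping playback"
    )
    return False


should_trigger_rem_playback(0.42)  # logs the skip message
should_trigger_rem_playback(0.73)  # passes the gate silently
```

Passing the logger in makes the skip path easy to assert on in tests, which helps when reviewing overnight logs against expected behavior.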
| 168 | +## Files Created |
| 169 | + |
| 170 | +- `/scripts/parse_fitbit_takeout.py` - Data parsing and analysis |
| 171 | +- `/scripts/enhanced_classifier_v2.py` - New classifier prototype |
| 172 | +- `/notes/fitbit_merged_training_data.json` - 10k merged samples |