Commit 4e18d2c

Add REM confidence gating to prevent false positive dream playback
- Add remConfidence field to ClassificationResult3, SourceClassification, HybridClassification
- Compute remConfidence based on: REM probability, time >70min, REM window, consecutive signals
- Gate REM callbacks on MIN_REM_CONFIDENCE_FOR_PLAYBACK (0.5)
- Log skipped low-confidence REM detections for debugging
- Remove personal Fitbit data from git history and add to .gitignore
1 parent 7835f88 commit 4e18d2c

5 files changed

Lines changed: 212 additions & 4 deletions


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -279,3 +279,4 @@ public/audio/*/*.mp3
 credentials.json
 notes/screenshots/*.png
 hardcoded_findings.txt
+notes/fitbit_merged_training_data.json
Lines changed: 172 additions & 0 deletions
@@ -0,0 +1,172 @@
# Sleep Classifier Deep Dive - January 15, 2026

## Executive Summary

Analyzed 1,546 nights of Fitbit Takeout data (~184k heart-rate samples paired with sleep-stage labels) to identify improvements for the sleep classifier. Key findings and recommendations follow.

## Current Performance

- REM sensitivity: 81.7%
- Awake sensitivity: 15.5%
- Overall accuracy: 41.9%

## Data Analysis Findings

### 1. Class Imbalance (CRITICAL)

Fitbit labels are heavily imbalanced:

- NREM: 68.6%
- REM: 26.2%
- **Wake: 5.3%** (only!)

Wake samples are concentrated at:

- 30-60 min (sleep onset): 233 samples
- 270-300 min (natural awakening): 293 samples

**Implication**: Low awake precision is expected given this imbalance: with wake at only 5.3% of samples, even a small false-positive rate on sleep samples yields more false wake calls than true ones. A 15% sensitivity paired with high specificity is therefore a reasonable operating point.
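The base-rate effect can be made concrete with Bayes' rule. The sketch below uses a hypothetical helper, `wakePrecision` (not part of the classifier code), that turns a sensitivity/specificity pair and the 5.3% wake prevalence into expected precision, using the `max_diff` = 4.0 operating point from the threshold sweep in section 4:

```typescript
/**
 * Expected precision of a binary detector from its sensitivity, specificity,
 * and the prevalence (base rate) of the positive class.
 * Hypothetical helper for illustration; not part of the classifier code.
 */
function wakePrecision(sensitivity: number, specificity: number, baseRate: number): number {
  const truePositives = sensitivity * baseRate; // wake samples correctly flagged as wake
  const falsePositives = (1 - specificity) * (1 - baseRate); // sleep samples wrongly flagged as wake
  return truePositives / (truePositives + falsePositives);
}

// max_diff threshold 4.0: 23.0% sensitivity, 89.2% specificity, 5.3% wake prevalence
const p = wakePrecision(0.23, 0.892, 0.053);
console.log(`expected wake precision: ${(p * 100).toFixed(1)}%`); // expected wake precision: 10.6%
```

The estimate reproduces the measured 10.6% precision, which is expected: precision is fully determined by sensitivity, specificity, and prevalence, so at a 5.3% base rate it stays low unless specificity gets very close to 100%.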

### 2. Feature Discrimination Analysis

| Feature      | Wake     | Light | Deep | REM  | Best For (difference)     |
| ------------ | -------- | ----- | ---- | ---- | ------------------------- |
| mean_hr      | 57.7     | 61.1  | 52.6 | 56.2 | REM vs NREM (+4.5)        |
| mean_diff    | 0.80     | 1.00  | 0.52 | 0.60 | REM vs NREM (+0.38)       |
| **max_diff** | **4.13** | 3.44  | 1.84 | 2.07 | **Wake vs Sleep (+1.11)** |
| hr_range     | 7.8      | 11.1  | 4.8  | 5.5  | REM vs NREM (+5.37)       |
| std_hr       | 2.32     | 3.28  | 1.31 | 1.55 | REM vs NREM (+1.67)       |
| rmssd        | 1.32     | 1.40  | 0.81 | 0.89 | REM vs NREM (+0.49)       |

Values are per-stage means. In the last column, differences are the pooled NREM mean minus the REM mean, or the wake mean minus the pooled sleep mean.

**Key Finding**: `max_diff` is the best discriminator for wake (mean 4.13 for wake vs 3.02 pooled across sleep stages).
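The notes do not define the features. The sketch below assumes one plausible reading (it is not taken from `parse_fitbit_takeout.py` or the TypeScript classifier): each feature is computed over a window of consecutive HR samples, and `mean_diff`, `max_diff`, and `rmssd` are built from successive sample-to-sample HR differences rather than RR intervals. The names `HrWindowFeatures` and `computeHrFeatures` are hypothetical.

```typescript
// Hypothetical per-window HR features; the definitions are assumptions, not the project's code.
interface HrWindowFeatures {
  mean_hr: number;
  mean_diff: number; // mean |HR[i] - HR[i-1]|
  max_diff: number;  // largest |HR[i] - HR[i-1]|
  hr_range: number;  // max(HR) - min(HR)
  std_hr: number;    // population standard deviation of HR
  rmssd: number;     // root mean square of successive HR differences
}

function computeHrFeatures(hr: number[]): HrWindowFeatures {
  if (hr.length < 2) {
    throw new Error("need at least two HR samples");
  }
  const n = hr.length;
  const mean = hr.reduce((sum, x) => sum + x, 0) / n;
  const variance = hr.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;

  // Absolute differences between consecutive samples
  const diffs: number[] = [];
  for (let i = 1; i < n; i++) {
    diffs.push(Math.abs(hr[i] - hr[i - 1]));
  }
  const meanDiff = diffs.reduce((s, d) => s + d, 0) / diffs.length;
  const meanSqDiff = diffs.reduce((s, d) => s + d * d, 0) / diffs.length;

  return {
    mean_hr: mean,
    mean_diff: meanDiff,
    max_diff: Math.max(...diffs),
    hr_range: Math.max(...hr) - Math.min(...hr),
    std_hr: Math.sqrt(variance),
    rmssd: Math.sqrt(meanSqDiff),
  };
}

// A single abrupt HR jump (e.g. a brief arousal) dominates max_diff
const f = computeHrFeatures([56, 57, 56, 62, 60, 58]);
console.log(f.max_diff, f.hr_range); // 6 6
```

Under this reading, `max_diff` reacts to one sharp jump while `mean_diff` averages it away, which would be consistent with `max_diff` separating wake better than `mean_diff`.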

### 3. Transition Probabilities (from 1,546 nights)

```
wake -> wake: 50.0%
wake -> light: 28.9%
wake -> rem: 11.5%
wake -> deep: 9.6%

deep -> light: 45.1%
deep -> wake: 52.9%
deep -> rem: 2.0% (rare!)

rem -> light: 45.8%
rem -> wake: 52.8%
rem -> deep: 1.4% (rare!)
```

**Usable constraints**:

- Deep rarely transitions directly to REM (2%)
- REM rarely transitions to deep (1.4%)
- Wake tends to stay wake (50%) or go to light (29%)
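One simple way to apply the two rare-transition constraints is to zero out the forbidden next stage given the previous stage and renormalize the remaining probabilities. This is a minimal sketch under that assumption; the names `applyTransitionConstraints` and `BLOCKED_NEXT` are hypothetical, and how the real classifier will enforce the constraints is not shown in these notes.

```typescript
// Hypothetical sketch: suppress the rare deep<->REM transitions by masking
// the forbidden stage and renormalizing. Illustrative only.
type Stage = "awake" | "light" | "deep" | "rem";
type StageProbs = Record<Stage, number>;

// Stage that should not directly follow each previous stage (2.0% / 1.4% in the data)
const BLOCKED_NEXT: Partial<Record<Stage, Stage>> = {
  deep: "rem",
  rem: "deep",
};

function applyTransitionConstraints(previous: Stage, probs: StageProbs): StageProbs {
  const blocked = BLOCKED_NEXT[previous];
  if (blocked === undefined) {
    return { ...probs }; // no constraint for this previous stage
  }
  const masked: StageProbs = { ...probs };
  masked[blocked] = 0;
  const total = masked.awake + masked.light + masked.deep + masked.rem;
  if (total === 0) {
    // All mass was on the blocked stage; fall back to the unconstrained estimate
    return { ...probs };
  }
  return {
    awake: masked.awake / total,
    light: masked.light / total,
    deep: masked.deep / total,
    rem: masked.rem / total,
  };
}

const next = applyTransitionConstraints("deep", { awake: 0.125, light: 0.375, deep: 0.25, rem: 0.25 });
console.log(next.rem, next.light); // 0 0.5
```

Hard zeroing is the most literal reading of "block"; multiplying by the empirical transition probability (0.02 or 0.014) instead would keep the rare transitions possible when the evidence for them is overwhelming.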

### 4. Threshold Sweep Results

For `max_diff` alone:

| Threshold | Wake Sensitivity | Wake Precision | Specificity (sleep) |
|-----------|------------------|----------------|---------------------|
| 3.0       | 31.4%            | 8.2%           | 80.6%               |
| 4.0       | 23.0%            | 10.6%          | 89.2%               |
| 5.0       | 17.9%            | 15.0%          | 94.4%               |

Trade-off: raising the threshold from 3.0 to 5.0 lowers wake sensitivity (31.4% to 17.9%) but nearly doubles precision (8.2% to 15.0%) and raises specificity (80.6% to 94.4%).
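A sweep like the one above can be reproduced with a confusion-matrix count per threshold. The sketch assumes each sample carries its `max_diff` value and a wake/not-wake Fitbit label, and that a sample is predicted awake when `max_diff >= threshold` (whether the original sweep used `>=` or `>` is not stated). `LabeledSample`, `SweepRow`, and `sweepThresholds` are hypothetical names, and the toy data is illustrative only.

```typescript
// Hypothetical sketch of the max_diff threshold sweep; "wake" is the positive class.
interface LabeledSample {
  maxDiff: number;
  isWake: boolean;
}

interface SweepRow {
  threshold: number;
  sensitivity: number; // TP / (TP + FN)
  precision: number;   // TP / (TP + FP)
  specificity: number; // TN / (TN + FP)
}

function sweepThresholds(samples: LabeledSample[], thresholds: number[]): SweepRow[] {
  const safeDiv = (a: number, b: number) => (b === 0 ? 0 : a / b);
  return thresholds.map((threshold) => {
    let tp = 0, fp = 0, tn = 0, fn = 0;
    for (const s of samples) {
      const predictedWake = s.maxDiff >= threshold;
      if (predictedWake && s.isWake) tp++;
      else if (predictedWake) fp++;
      else if (s.isWake) fn++;
      else tn++;
    }
    return {
      threshold,
      sensitivity: safeDiv(tp, tp + fn),
      precision: safeDiv(tp, tp + fp),
      specificity: safeDiv(tn, tn + fp),
    };
  });
}

// Toy data only; the real sweep ran over the merged Fitbit samples
const toy: LabeledSample[] = [
  { maxDiff: 6, isWake: true },
  { maxDiff: 3.5, isWake: true },
  { maxDiff: 4.5, isWake: false },
  { maxDiff: 2, isWake: false },
];
for (const row of sweepThresholds(toy, [3, 4, 5])) {
  console.log(row);
}
```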

## Recommendations

### Short-term (High Impact, Low Effort)

1. **Add `max_diff` as an awake feature** alongside `mean_diff`
   - Use a threshold of ~4.0 for high specificity
   - Expected: +5-10 percentage points of awake sensitivity

2. **Add transition constraints**
   - Block deep->rem and rem->deep transitions
   - Expected: fewer false REM classifications

3. **Time-bin-specific thresholds**
   - Lower the awake threshold at 30-60 min (sleep onset)
   - Lower the awake threshold at 270-330 min (natural wake time)
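Recommendations 1 and 3 combine naturally into a single rule: compare `max_diff` against a threshold that depends on minutes since sleep start. In this sketch `BASE_THRESHOLD` follows the ~4.0 recommendation, while `LENIENT_THRESHOLD` (3.0, taken from the sweep) and the inclusive window boundaries are assumptions, not values from the implemented classifier. All names are hypothetical.

```typescript
// Hypothetical sketch of a time-bin-specific awake rule using max_diff.
const BASE_THRESHOLD = 4.0;    // "~4.0" recommendation above
const LENIENT_THRESHOLD = 3.0; // assumed value, not from the source

// Windows (minutes since sleep start) where wake is more common; bounds treated as inclusive
const LENIENT_WINDOWS: Array<[number, number]> = [
  [30, 60],   // sleep onset
  [270, 330], // natural wake time
];

function awakeThreshold(minutesSinceSleepStart: number): number {
  const inLenientWindow = LENIENT_WINDOWS.some(
    ([start, end]) => minutesSinceSleepStart >= start && minutesSinceSleepStart <= end
  );
  return inLenientWindow ? LENIENT_THRESHOLD : BASE_THRESHOLD;
}

function isLikelyAwake(maxDiff: number, minutesSinceSleepStart: number): boolean {
  return maxDiff >= awakeThreshold(minutesSinceSleepStart);
}

console.log(isLikelyAwake(3.5, 45));  // true  (sleep-onset window, lenient threshold)
console.log(isLikelyAwake(3.5, 120)); // false (mid-night, base threshold)
```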

### Medium-term (Moderate Effort)

4. **Use `hr_range` and `std_hr` for REM discrimination**
   - REM has distinctly LOWER hr_range than pooled NREM (5.5 vs 10.8)
   - REM has LOWER std_hr than pooled NREM (1.55 vs 3.21)
   - Caveat: deep sleep is similarly low (4.8 and 1.31), so these features separate REM from light sleep rather than from deep

5. **Explore SpO2 data for REM**
   - REM has irregular breathing, which may show up as SpO2 variability
   - 1,882 SpO2 files available in Takeout

6. **Ensemble approach**
   - Combine the current CV-based REM detector with the new `max_diff` awake detector
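A minimal ensemble could let the `max_diff` awake detector override the REM detector. The sketch models the existing CV-based REM detector as an opaque REM probability because its real interface is not shown in these notes; `ensembleStage`, the 0.5 REM cut-off, and the three-way output are assumptions. Giving the awake detector priority is a design choice aimed at the goal of not starting playback while the user is awake.

```typescript
// Hypothetical ensemble sketch. The CV-based REM detector is represented only
// by its (assumed) REM probability output.
type EnsembleStage = "awake" | "rem" | "nrem";

interface EnsembleInputs {
  maxDiff: number;        // from the max_diff awake feature
  remProbability: number; // assumed output of the CV-based REM detector
}

const AWAKE_MAX_DIFF_THRESHOLD = 4.0;  // "~4.0" recommendation
const REM_PROBABILITY_THRESHOLD = 0.5; // assumed cut-off

function ensembleStage({ maxDiff, remProbability }: EnsembleInputs): EnsembleStage {
  if (maxDiff >= AWAKE_MAX_DIFF_THRESHOLD) {
    return "awake"; // awake detector wins over a concurrent REM call
  }
  return remProbability >= REM_PROBABILITY_THRESHOLD ? "rem" : "nrem";
}

console.log(ensembleStage({ maxDiff: 5.2, remProbability: 0.8 })); // awake
console.log(ensembleStage({ maxDiff: 1.8, remProbability: 0.8 })); // rem
```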

### Long-term (High Effort)

7. **Train on full Fitbit Takeout**
   - 104 sleep files, 6,128 HR files available
   - Potential for user-specific model fine-tuning

8. **LSTM/Sequential model**
   - Capture temporal patterns
   - Literature suggests +10-15% improvement

9. **Multi-modal features**
   - Temperature changes correlate with sleep cycles
   - Respiratory rate variability during REM

## Available Data Summary

| Source           | Files  | Resolution | Date Range |
| ---------------- | ------ | ---------- | ---------- |
| Sleep stages     | 104    | Per-stage  | 2017-2025  |
| Heart rate       | 6,128  | 5 seconds  | 2017-2025  |
| HRV (RMSSD)      | 1,869  | Daily      | 2020-2025  |
| Respiratory rate | ~1,800 | Daily      | 2020-2025  |
| SpO2             | 1,882  | 1 minute   | 2020-2025  |
| Temperature      | 3,671  | Monthly    | 2020-2025  |

## Next Steps

1. ~~Implement `max_diff` feature in TypeScript classifier~~ ✅ DONE
2. ~~Add transition probability constraints~~ (partial - time-specific thresholds added)
3. ~~Test with time-specific thresholds~~ ✅ DONE
4. Evaluate SpO2 integration feasibility

## Session Update - January 15, 2026 (continued)

### Implemented Changes

1. **`remConfidence` field added to HybridClassification**
   - Added to the `ClassificationResult3`, `SourceClassification`, and `HybridClassification` interfaces
   - Computed in `remOptimizedClassifier.ts` from: stage probability, time >70 min, REM window position, consecutive REM signals, REM propensity

2. **REM playback gating on confidence**
   - Added `MIN_REM_CONFIDENCE_FOR_PLAYBACK = 0.5` constant in `sleep.ts`
   - Modified `handleStageTransition()` to check `remConfidence >= 0.5` before triggering REM callbacks
   - Logs skipped REM detections with their confidence level

3. **Enhanced classifier features** (from earlier session)
   - Added `max_diff` for awake detection
   - Added `hr_range` for REM discrimination
   - Added time-specific thresholds (lenient at 30-60 min and 270-330 min)

### Build Output

- APK: `~/Desktop/dream-stream-release-remconf.apk` (85.7 MB)
- Build: Release variant
- Ready for overnight testing

### Expected Behavior

- REM detections with confidence < 0.5 are logged but do not trigger dream playback
- Should substantially reduce dream playback triggered by false-positive REM detections while the user is awake
- Console log format: `[Sleep] REM detected but confidence X.XX < 0.5, skipping playback`

## Files Created

- `/scripts/parse_fitbit_takeout.py` - Data parsing and analysis
- `/scripts/enhanced_classifier_v2.py` - New classifier prototype
- `/notes/fitbit_merged_training_data.json` - 10k merged samples

services/hybridClassifier.ts

Lines changed: 7 additions & 0 deletions
@@ -20,6 +20,7 @@ export interface StageProbabilities {
 export interface SourceClassification {
   probabilities: StageProbabilities;
   confidence: number;
+  remConfidence: number;
   available: boolean;
 }

@@ -29,6 +30,7 @@ export interface HybridClassification {
   fused: StageProbabilities;
   predictedStage: SleepStage;
   overallConfidence: number;
+  remConfidence: number;
 }

 type ClassifiableStage = 'awake' | 'light' | 'deep' | 'rem';
@@ -87,6 +89,7 @@ function classifyFromAudio(analysis: BreathingAnalysis | null): SourceClassifica
     return {
       probabilities: { awake: 0.25, light: 0.25, deep: 0.25, rem: 0.25 },
       confidence: 0,
+      remConfidence: 0,
       available: false,
     };
   }
@@ -143,6 +146,7 @@ function classifyFromAudio(analysis: BreathingAnalysis | null): SourceClassifica
   return {
     probabilities: normalizeProbabilities(probs),
     confidence: Math.min(1, confidence),
+    remConfidence: 0,
     available: true,
   };
 }
@@ -152,6 +156,7 @@ function classifyFromVitals(vitals: VitalsSnapshot | null): SourceClassification
     return {
       probabilities: { awake: 0.25, light: 0.25, deep: 0.25, rem: 0.25 },
       confidence: 0,
+      remConfidence: 0,
       available: false,
     };
   }
@@ -170,6 +175,7 @@ function classifyFromVitals(vitals: VitalsSnapshot | null): SourceClassification
   return {
     probabilities: normalizeProbabilities(probs),
     confidence: result.confidence,
+    remConfidence: result.remConfidence,
     available: true,
   };
 }
@@ -256,6 +262,7 @@ export async function classifyHybrid(
     fused,
     predictedStage,
     overallConfidence,
+    remConfidence: vitalsResult.remConfidence,
   };
 }
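As far as this diff shows, the fused `remConfidence` comes only from the vitals source: `classifyFromAudio()` always reports 0, `classifyFromVitals()` reports 0 when vitals are unavailable, and `classifyHybrid()` forwards `vitalsResult.remConfidence`. The simplified model below (hypothetical types and function name, not the project's code) makes the consequence explicit: in an audio-only session the value stays at 0, so the 0.5 playback gate in `sleep.ts` never opens.

```typescript
// Simplified, hypothetical model of how classifyHybrid() in this commit forwards
// remConfidence: only the vitals source contributes.
interface SourceResult {
  remConfidence: number;
  available: boolean;
}

const MIN_REM_CONFIDENCE_FOR_PLAYBACK = 0.5; // value added to sleep.ts in this commit

function fusedRemConfidence(vitals: SourceResult, _audio: SourceResult): number {
  // Mirrors `remConfidence: vitalsResult.remConfidence`; the audio source is ignored
  return vitals.remConfidence;
}

const vitalsMissing: SourceResult = { remConfidence: 0, available: false }; // classifyFromVitals(null)
const audioOnly: SourceResult = { remConfidence: 0, available: true };      // classifyFromAudio(...)

const conf = fusedRemConfidence(vitalsMissing, audioOnly);
console.log(conf >= MIN_REM_CONFIDENCE_FOR_PLAYBACK); // false
```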

services/remOptimizedClassifier.ts

Lines changed: 16 additions & 0 deletions
@@ -76,6 +76,7 @@ export interface ClassificationResult3 {
   stage: SleepStage3;
   stage4: SleepStage; // Mapped back to 4-class for compatibility
   confidence: number;
+  remConfidence: number;
   probabilities: Stage3Probabilities;
   temporalFeatures: TemporalFeatures;
   dataSource: 'vitals' | 'audio' | 'prediction' | 'hybrid';
@@ -1398,10 +1399,25 @@ export function classifyRemOptimized(
   previousStage = stage;
   previousProbabilities = probabilities;

+  // Compute REM-specific confidence for playback gating
+  let remConfidence = 0;
+  if (stage === 'rem') {
+    const baseRemConf = probabilities.rem;
+    const timeBonus = temporal.minutesSinceSleepStart > 70 ? 0.15 : 0;
+    const windowBonus = temporal.isInRemWindow ? 0.1 : 0;
+    const consecutiveBonus = Math.min(0.2, consecutiveRemSignals * 0.05);
+    const propensityBonus = temporal.remPropensity * 0.1;
+    remConfidence = Math.min(
+      1.0,
+      baseRemConf + timeBonus + windowBonus + consecutiveBonus + propensityBonus
+    );
+  }
+
   return {
     stage,
     stage4: to4Class(stage),
     confidence,
+    remConfidence,
     probabilities,
     temporalFeatures: temporal,
     dataSource,
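The `remConfidence` formula above can be restated as a pure function to see how the bonuses interact with the 0.5 playback gate. The flattened input shape `RemConfidenceInputs` and the name `computeRemConfidence` are hypothetical; the arithmetic follows the diff, and `remPropensity` is assumed to lie in [0, 1].

```typescript
// Standalone restatement of the remConfidence formula from classifyRemOptimized().
// The input shape is a hypothetical flattening of the classifier's local state.
interface RemConfidenceInputs {
  stage: string;                  // predicted 3-class stage
  remProbability: number;         // probabilities.rem
  minutesSinceSleepStart: number; // temporal.minutesSinceSleepStart
  isInRemWindow: boolean;         // temporal.isInRemWindow
  remPropensity: number;          // temporal.remPropensity, assumed in [0, 1]
  consecutiveRemSignals: number;
}

function computeRemConfidence(x: RemConfidenceInputs): number {
  if (x.stage !== "rem") {
    return 0; // only REM predictions receive a non-zero REM confidence
  }
  const timeBonus = x.minutesSinceSleepStart > 70 ? 0.15 : 0; // bonus after the 70-minute mark
  const windowBonus = x.isInRemWindow ? 0.1 : 0;
  const consecutiveBonus = Math.min(0.2, x.consecutiveRemSignals * 0.05); // saturates at 4 signals
  const propensityBonus = x.remPropensity * 0.1;
  return Math.min(1.0, x.remProbability + timeBonus + windowBonus + consecutiveBonus + propensityBonus);
}

// Same REM probability, different context
const late = computeRemConfidence({ stage: "rem", remProbability: 0.35, minutesSinceSleepStart: 80, isInRemWindow: true, remPropensity: 0.5, consecutiveRemSignals: 2 });
const early = computeRemConfidence({ stage: "rem", remProbability: 0.35, minutesSinceSleepStart: 40, isInRemWindow: false, remPropensity: 0.2, consecutiveRemSignals: 0 });
console.log(late.toFixed(2), early.toFixed(2)); // 0.75 0.37
```

With `remPropensity` at most 1, the bonuses alone can add up to 0.55, so a REM prediction made late in the night, inside the REM window, and after several consecutive signals clears the 0.5 gate regardless of `probabilities.rem`; an early, isolated REM call must rely almost entirely on its probability.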

services/sleep.ts

Lines changed: 16 additions & 4 deletions
@@ -828,15 +828,27 @@ function inferSleepStage(analysis: BreathingAnalysis): SleepStage {
   return 'light';
 }

-function handleStageTransition(previousStage: SleepStage, newStage: SleepStage): void {
+const MIN_REM_CONFIDENCE_FOR_PLAYBACK = 0.5;
+
+function handleStageTransition(
+  previousStage: SleepStage,
+  newStage: SleepStage,
+  remConfidence: number
+): void {
   const now = Date.now();
   stageHistory.push({ stage: newStage, timestamp: now });
   stageHistory = stageHistory.filter((s) => now - s.timestamp < 3600000);

   notifyStageHistoryChange();

   if (newStage === 'rem' && previousStage !== 'rem') {
-    remCallbacks.forEach((cb) => cb());
+    if (remConfidence >= MIN_REM_CONFIDENCE_FOR_PLAYBACK) {
+      remCallbacks.forEach((cb) => cb());
+    } else {
+      console.log(
+        `[Sleep] REM detected but confidence ${remConfidence.toFixed(2)} < ${MIN_REM_CONFIDENCE_FOR_PLAYBACK}, skipping playback`
+      );
+    }
   }

   if (previousStage === 'rem' && newStage !== 'rem') {
@@ -853,7 +865,7 @@ async function updateStageFromAudio(analysis: BreathingAnalysis): Promise<void>
   if (hybrid.predictedStage !== currentSession.currentStage) {
     const previousStage = currentSession.currentStage;
     updateSleepStage(hybrid.predictedStage);
-    handleStageTransition(previousStage, hybrid.predictedStage);
+    handleStageTransition(previousStage, hybrid.predictedStage, hybrid.remConfidence);
   }
 }

@@ -988,7 +1000,7 @@ export async function processVitalsUpdate(vitals: VitalsSnapshot): Promise<void>
     if (hybrid.predictedStage !== currentSession.currentStage) {
       const previousStage = currentSession.currentStage;
       updateSleepStage(hybrid.predictedStage);
-      handleStageTransition(previousStage, hybrid.predictedStage);
+      handleStageTransition(previousStage, hybrid.predictedStage, hybrid.remConfidence);
     }
   }
 }
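The gate in `handleStageTransition()` can be isolated as a side-effect-free decision, which makes the threshold easy to unit-test without registering callbacks. This is a hypothetical restatement (`remPlaybackGate` and `GateDecision` are not part of `sleep.ts`) that reproduces the same condition and log message.

```typescript
// Hypothetical, side-effect-free restatement of the gate in handleStageTransition().
type SleepStage = "awake" | "light" | "deep" | "rem";

const MIN_REM_CONFIDENCE_FOR_PLAYBACK = 0.5;

interface GateDecision {
  triggerPlayback: boolean;
  logMessage?: string; // set only when a REM entry is skipped
}

function remPlaybackGate(previous: SleepStage, next: SleepStage, remConfidence: number): GateDecision {
  // Only a transition *into* REM can start playback
  if (next !== "rem" || previous === "rem") {
    return { triggerPlayback: false };
  }
  if (remConfidence >= MIN_REM_CONFIDENCE_FOR_PLAYBACK) {
    return { triggerPlayback: true };
  }
  return {
    triggerPlayback: false,
    logMessage: `[Sleep] REM detected but confidence ${remConfidence.toFixed(2)} < ${MIN_REM_CONFIDENCE_FOR_PLAYBACK}, skipping playback`,
  };
}

console.log(remPlaybackGate("light", "rem", 0.42).logMessage);
// [Sleep] REM detected but confidence 0.42 < 0.5, skipping playback
```

Because the gate is evaluated only on entry into REM, and `handleStageTransition()` runs only when the predicted stage changes, a REM period that starts with low confidence will not trigger playback later even if its confidence rises, unless the stage first leaves REM and re-enters it.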
