Commit 6ea8a93

docs(spec): MacDive SQLite ZSAMPLES profile decoding design
Follow-up to the MacDive SQLite importer (PR #256) which emits profile: [] for every dive. Spec proposes a two-phase approach: investigation spike with a go/no-go gate, then decoder implementation if the spike confirms feasibility. Stacks on feature/macdive-sqlite.
1 parent abba5a1 commit 6ea8a93

1 file changed

Lines changed: 258 additions & 0 deletions

@@ -0,0 +1,258 @@
1+
# MacDive SQLite Profile Decoding — Design
2+
3+
**Status:** Draft
4+
**Author:** Eric Griffin
5+
**Created:** 2026-04-23
6+
**Context:** Continuation of the MacDive Import Robustness work (`docs/superpowers/specs/2026-04-21-macdive-import-design.md`). Milestone 3 of that plan (PR #256, `feature/macdive-sqlite`) deferred decoding of `ZDIVE.ZSAMPLES` — MacDive's proprietary profile-sample BLOB. This spec covers the follow-up that closes the gap.
7+
8+
## Problem
9+
10+
PR #256 lands MacDive SQLite import with all dive metadata (tags, critters, gear, events, tanks, gases, sites, buddies) but emits `profile: []` for every dive. Users importing their MacDive database currently cannot see time-series profile data (depth-over-time, temperature, tank pressure, ppO2, NDL) unless they also export UDDF separately and run a second import. For the ~65% of dives in a typical MacDive database that have `ZSAMPLES` data, we need the SQLite path to produce the same profile output the UDDF importer already produces.
11+
12+
The existing metadata-only path was a deliberate scope decision in PR #256: MacDive's `ZSAMPLES` format isn't bplist, and the initial probing (zlib / gzip / lzma at offsets 0/4/8/12) didn't yield a known wrapper. Decoding the format requires focused reverse-engineering work, which we now undertake.
13+
14+
## Goals
15+
16+
1. Decode `ZDIVE.ZSAMPLES` for every dive where the blob is non-null and the format is understood, producing the same `List<Map<String, dynamic>>` payload shape the UDDF and MacDive-native-XML importers already emit.
17+
2. Produce output that is sample-for-sample equivalent to the UDDF import for the same UUID, within documented tolerance.
18+
3. Degrade gracefully: per-dive decode failures become `ImportWarning`s, never abort the import. Metadata-only dives still land exactly as they do today.
19+
4. Ship the decoder behind the same test discipline the rest of the MacDive importer uses (unit + golden + gated real-sample regression).
20+
21+
## Non-Goals
22+
23+
- Decoding `ZDIVE.ZRAWDATA` (the raw dive-computer sensor dump). That's a separate fallback path, reserved for a future milestone if `ZSAMPLES` decoding proves infeasible or if we later want to cover the 83 sample-DB dives that have `ZSAMPLES` without a full-fidelity raw dump.
24+
- Modifying the `DiveProfilePoint` domain entity or the `dive_profiles` Drift table. The importer already projects `Map<String, dynamic>` → domain entity correctly.
25+
- UI work. Warnings surface through the existing import-wizard warning list with no visual changes.
26+
- Round-trip re-export to MacDive's format.
27+
- Solving the ~35% of dives in a typical MacDive database that have no sample data at all (manual entries, non-computer-synced dives).
28+
29+
## Approach summary
30+
31+
Two phases with an explicit gate:
32+
33+
- **Phase 1 (investigation spike):** Build throwaway scripts in `scripts/reverse_engineering/zsamples/` that extract paired corpora (ZSAMPLES blob + UDDF ground truth for the same dive UUID), probe the format, and score candidate decodings. Exits when either (a) a single decoder hypothesis scores ≥90% sample-accurate across the 350 ZSAMPLES-bearing dives in the sample database, or (b) the spike timebox (1-2 active days) expires without a viable hypothesis.
34+
- **Phase 2 (implementation):** If Phase 1 succeeded, implement `MacDiveSamplesDecoder` + `MacDiveSqliteSample` typed model, wire into the existing reader/mapper, ship tests at three levels. If Phase 1 failed, this spec is closed and a separate plan covers the ZRAWDATA fallback.
35+
36+
The rest of this document describes what each phase produces concretely.
37+
38+
## Phase 1 — Investigation spike
39+
40+
### Deliverables
41+
42+
1. `docs/import-formats/macdive-zsamples.md` — written format specification precise enough that a programmer unfamiliar with the investigation can implement the decoder from it alone.
43+
2. `scripts/reverse_engineering/zsamples/`: committed scripts, retained in case the format drifts in future MacDive releases.
44+
3. A go/no-go decision recorded as a commit to the plan file.
45+
46+
### Scripts
47+
48+
| File | Purpose |
|---|---|
| `extract_corpus.dart` | Reads `scripts/sample_data/MacDive.sqlite` + `scripts/sample_data/Apr 4 no iPad sync.uddf`, emits paired fixtures under `corpus/<uuid>.zsamples.bin` + `corpus/<uuid>.uddf.json`. UDDF → JSON projection uses the existing `UddfImportParser` so ground-truth data matches exactly what the production importer would produce. |
| `inspect.dart <fixture>` | Pretty-prints a ZSAMPLES blob: hex dump, candidate interpretations at each offset (u8/u16/u32/float LE and BE), repeating-group detection by offset stride, entropy-per-window graph. Human-driven exploration tool. |
| `compression_probe.dart <fixture>` | Attempts LZFSE, LZVN, LZ4, zstd, and Apple Archive decompression at every offset 0 through 64. PR #256 only tried zlib/gzip/lzma at offsets 0/4/8/12; this expands the search to cover Apple's native compression codecs, which are the plausible candidates given MacDive is a native macOS app. |
| `differ.dart <fixture> <hypothesis>` | Takes a Dart function `Uint8List → List<Sample>` as the hypothesis. Decodes, compares to UDDF ground truth. Returns a score: sample-count accuracy, timestamp RMSE, depth RMSE, temperature RMSE, and % samples within tolerance (exact timestamp, ±0.1m depth, ±0.5°C temp). |
| `batch_score.dart <hypothesis>` | Runs a hypothesis across every fixture in `corpus/`, reports per-dive scores plus an aggregate histogram. This is the go/no-go measurement. |
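The scoring that `differ.dart` is described as doing can be sketched in plain Dart. This is a minimal illustration under assumptions, not the spike's actual script: the `Sample` and `Score` classes, the `score` function, and the index-aligned pairing of decoded and ground-truth samples are all hypothetical names and choices introduced here. Only the tolerances (exact timestamp, ±0.1m depth, ±0.5°C temperature) come from the table above.

```dart
import 'dart:math' as math;

/// Hypothetical minimal sample shape, used only for this sketch.
class Sample {
  const Sample(this.timeSeconds, this.depthMeters, this.temperatureCelsius);
  final int timeSeconds;
  final double depthMeters;
  final double temperatureCelsius;
}

/// Hypothetical score record; the real differ may report more fields.
class Score {
  const Score(this.countRatio, this.depthRmse, this.fractionWithinTolerance);
  final double countRatio; // decoded count / ground-truth count
  final double depthRmse; // metres, over index-aligned pairs
  final double fractionWithinTolerance; // 0.0 .. 1.0 of ground-truth samples
}

/// Compares decoded samples to ground truth pairwise by index (an assumption;
/// a real differ might align by timestamp instead).
Score score(List<Sample> decoded, List<Sample> truth) {
  if (truth.isEmpty) return const Score(0.0, 0.0, 0.0);
  final n = math.min(decoded.length, truth.length);
  var squaredDepthError = 0.0;
  var within = 0;
  for (var i = 0; i < n; i++) {
    final d = decoded[i];
    final t = truth[i];
    final depthError = d.depthMeters - t.depthMeters;
    squaredDepthError += depthError * depthError;
    final ok = d.timeSeconds == t.timeSeconds &&
        depthError.abs() <= 0.1 &&
        (d.temperatureCelsius - t.temperatureCelsius).abs() <= 0.5;
    if (ok) within++;
  }
  return Score(
    decoded.length / truth.length,
    n == 0 ? 0.0 : math.sqrt(squaredDepthError / n),
    within / truth.length,
  );
}

void main() {
  const truth = [Sample(0, 0.0, 20.0), Sample(10, 3.0, 19.5)];
  const decoded = [Sample(0, 0.0, 20.0), Sample(10, 3.5, 19.5)];
  final s = score(decoded, truth);
  print('count ratio: ${s.countRatio}');
  print('depth RMSE: ${s.depthRmse}');
  print('within tolerance: ${s.fractionWithinTolerance}');
}
```

Dividing by the ground-truth count rather than the paired count means a hypothesis that decodes too few samples is penalised in the tolerance fraction as well as in the count ratio.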
55+
56+
### Ranked hypotheses (cheap → expensive)
57+
58+
1. **Apple compression wrapper.** Run `compression_probe.dart` against fixtures from both observed header variants (`0x19` and `0x9D`). If LZFSE/LZVN/LZ4 hits at any offset, the rest of the format is whatever lies under the wrapper — likely a simple repeating record. *(Expected: 20 minutes.)*
59+
2. **Fixed-width sample records after the 8-byte header.** Compute candidate strides from `(blob_size − 8) / expected_sample_count`, where expected count = `duration / sample_interval`. Search for clean integer divisors and byte patterns at that stride. `inspect.dart`'s entropy-per-window graph flags any strong periodicity. *(Expected: 1-2 hours.)*
60+
3. **Typed record stream (TLV-like).** Each sample is one or more `(tag, length, value)` records; the decoder walks the stream. Consistent with MacDive's likely data model of "depth + optional temp + optional pressure + optional event per timestamp." *(Expected: half a day.)*
61+
4. **Container-of-blocks (vendor-specific frames).** Second header byte becomes a protocol-family ID; each family decodes differently. libdivecomputer's per-vendor parsers are reference material for framing conventions. *(Remainder of timebox.)*
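The stride search in hypothesis 2 reduces to a small divisor check. The sketch below is illustrative only: `candidateStrides` and its `slack` parameter (which tolerates an expected sample count that is off by a few) are assumptions introduced here. The 8-byte header and the `(blob_size − 8) / expected_sample_count` formula are taken from the list above.

```dart
/// Hypothetical helper: returns every record width (in bytes) that divides the
/// blob body evenly for a sample count near the expected one.
List<int> candidateStrides({
  required int blobSize,
  required int expectedSamples,
  int headerSize = 8, // header length stated in hypothesis 2
  int slack = 2, // assumption: expected count may be off by a few samples
}) {
  final body = blobSize - headerSize;
  if (body <= 0 || expectedSamples <= 0) return const [];
  final strides = <int>{};
  for (var n = expectedSamples - slack; n <= expectedSamples + slack; n++) {
    // Only sample counts that divide the body cleanly give a usable stride.
    if (n > 0 && body % n == 0) strides.add(body ~/ n);
  }
  return strides.toList()..sort();
}

void main() {
  // Synthetic example: 8-byte header followed by 300 records of 12 bytes.
  print(candidateStrides(blobSize: 3608, expectedSamples: 300));
}
```

An empty result for most fixtures would be early evidence against fixed-width records and a reason to move on to the TLV hypothesis.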
62+
63+
### Validators
64+
65+
Beyond the differ's sample-by-sample comparison, per-dive aggregates on `ZDIVE` are cheap sanity checks:
66+
67+
- `ZMAXDEPTH` — must match `max(decoded.depth)` within ±0.1m.
68+
- `ZAVERAGEDEPTH` — must match mean within ±0.1m.
69+
- `ZTEMPHIGH` / `ZTEMPLOW` — must match decoded temp extremes within ±0.5°C.
70+
- `ZSAMPLEINTERVAL` — if constant-interval, must match `decoded[1].time - decoded[0].time`.
71+
- `ZTOTALDURATION` — must be `≥ decoded.last.time`.
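The depth checks in this list could look roughly like the following. It is a sketch under assumptions: `checkDepthAggregates` is a hypothetical helper, and the plain arithmetic mean is only comparable to `ZAVERAGEDEPTH` if samples are evenly spaced in time, which Phase 1 would need to confirm.

```dart
/// Hypothetical helper for the spike: checks decoded depths against the
/// per-dive aggregates stored on ZDIVE. Returns one message per mismatch.
List<String> checkDepthAggregates({
  required List<double> decodedDepths,
  required double zMaxDepth,
  required double zAverageDepth,
  double toleranceMeters = 0.1,
}) {
  if (decodedDepths.isEmpty) return const ['no decoded samples'];
  final maxDepth = decodedDepths.reduce((a, b) => a > b ? a : b);
  // Plain arithmetic mean: assumes a constant sample interval.
  final meanDepth =
      decodedDepths.reduce((a, b) => a + b) / decodedDepths.length;
  final problems = <String>[];
  if ((maxDepth - zMaxDepth).abs() > toleranceMeters) {
    problems.add('max depth $maxDepth vs ZMAXDEPTH $zMaxDepth');
  }
  if ((meanDepth - zAverageDepth).abs() > toleranceMeters) {
    problems.add('mean depth $meanDepth vs ZAVERAGEDEPTH $zAverageDepth');
  }
  return problems;
}

void main() {
  final problems = checkDepthAggregates(
    decodedDepths: [0.0, 10.0, 20.0, 10.0, 0.0],
    zMaxDepth: 20.0,
    zAverageDepth: 8.0,
  );
  print(problems.isEmpty ? 'aggregates agree' : problems.join('\n'));
}
```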
72+
73+
### Exit criteria
74+
75+
- **GO** — one hypothesis scores ≥90% sample-accurate across the 350-dive corpus; remaining failures are either attributable to a small number of distinguishable format variants (implementable in bounded time) or isolated outliers where we emit `profile: []` with a warning.
76+
- **NO-GO** — after 1-2 active days, best hypothesis scores <50% with no clear next hypothesis. Spec is closed. ZRAWDATA fallback planning begins in a new spec.
77+
- **ESCALATE** — unusual findings (cryptographic signatures, per-dive salting, identical profiles with non-identical bytes) trigger a conversation before further spend.
78+
79+
## Phase 2 — Implementation
80+
81+
### File layout
82+
83+
All new code lives under `lib/features/universal_import/data/services/`:
84+
85+
```
macdive_db_reader.dart            (exists) - calls the decoder, stores typed samples
macdive_dive_mapper.dart          (exists) - line ~334 stops emitting profile: []
macdive_raw_types.dart            (exists) - MacDiveRawDive gains `samples` field
macdive_samples_decoder.dart      (new) - public entry point, pure function
macdive_samples/                  (new; conditional)
  macdive_sqlite_sample.dart      - typed model, mirrors MacDiveXmlSample
  variants/                       - only if >1 format variant exists
    <variant>_decoder.dart
```
95+
96+
The `variants/` directory is created only if Phase 1 confirms multiple format families. A single-family decoder stays a single file.
97+
98+
### Typed model
99+
100+
`MacDiveSqliteSample` has field names and units identical to `MacDiveXmlSample` so downstream projection code doesn't branch on source:
101+
102+
```dart
class MacDiveSqliteSample {
  // Const constructor so the final fields can be initialized.
  const MacDiveSqliteSample({
    required this.time,
    this.depthMeters,
    this.pressureBar,
    this.temperatureCelsius,
    this.ppO2,
    this.ndlSeconds,
  });

  final Duration time;
  final double? depthMeters;
  final double? pressureBar; // tank pressure if present
  final double? temperatureCelsius;
  final double? ppO2; // bar
  final int? ndlSeconds;
  // Additional fields if Phase 1 reveals them: heartRate, setpoint, event markers.
}
```
113+
114+
Values are stored in SI canonical units. The caller reads `ZMETADATA.SystemOfUnits` and passes the unit system to the decoder, which delegates any imperial→SI conversion to the existing `MacDiveUnitConverter`.
115+
116+
### Decoder API
117+
118+
Pure function, no I/O, fully testable in isolation:
119+
120+
```dart
class MacDiveSamplesDecoder {
  const MacDiveSamplesDecoder();

  /// Returns decoded samples in SI canonical units.
  /// Throws MacDiveSamplesDecodeError if the blob is malformed or the header
  /// variant is unknown. Returns [] for a header-only blob (no sample body).
  List<MacDiveSqliteSample> decode(
    Uint8List blob, {
    required MacDiveUnitSystem units,
    required MacDiveUnitConverter converter,
  });
}

class MacDiveSamplesDecodeError implements Exception {
  final String reason; // e.g. "unknown header variant 0x9D"
  final int? offendingOffset; // for debug / warning surfaces
  const MacDiveSamplesDecodeError(this.reason, {this.offendingOffset});
}
```
140+
141+
### Reader integration
142+
143+
`MacDiveDbReader`'s per-dive loop decodes the blob immediately after reading it. Errors become per-dive warnings on the logbook:
144+
145+
```dart
try {
  final decoded = (samplesBlob == null)
      ? const <MacDiveSqliteSample>[]
      : decoder.decode(samplesBlob, units: units, converter: converter);
  dive = dive.copyWith(samples: decoded);
} on MacDiveSamplesDecodeError catch (e) {
  warnings.add(ImportWarning.sampleDecodeFailed(
    diveUuid: dive.uuid,
    reason: e.reason,
    offendingOffset: e.offendingOffset,
  ));
  dive = dive.copyWith(samples: const <MacDiveSqliteSample>[]);
}
```
160+
161+
`MacDiveRawDive` gains `samples: List<MacDiveSqliteSample>` (non-nullable, empty by default). The existing `samplesBlob: Uint8List?` field remains for diagnostic use during the ramp-up period; it can be removed once the decoder is proven.
162+
163+
### Mapper integration
164+
165+
`MacDiveDiveMapper._buildDiveMap()` replaces the current `map['profile'] = const <Map<String, dynamic>>[];` line with a projection of `dive.samples` into payload maps. The projection duplicates the roughly 10-line helper from `MacDiveXmlParser` (lines 279-291). The two are intentionally independent:
166+
167+
- Rationale: `MacDiveXmlSample` and `MacDiveSqliteSample` have identical shape today, but they represent different upstream formats and will evolve on different schedules. A shared projection would couple them.
168+
- Cost of duplication: ~10 lines of trivial mechanical code.
169+
- Benefit of duplication: each format owns its projection; changes to one don't risk the other.
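For orientation, a projection of this kind might look like the sketch below. It is not the helper from `MacDiveXmlParser`: the payload keys (`timestamp`, `depth`, `temperature`, `pressure`), the function name `sampleToPayload`, and the choice to omit absent fields rather than emit nulls are all assumptions made for illustration, and the class is a trimmed stand-in for the typed model in this spec.

```dart
/// Trimmed stand-in for the MacDiveSqliteSample model described in this spec.
class MacDiveSqliteSample {
  const MacDiveSqliteSample({
    required this.time,
    this.depthMeters,
    this.temperatureCelsius,
    this.pressureBar,
  });
  final Duration time;
  final double? depthMeters;
  final double? temperatureCelsius;
  final double? pressureBar;
}

/// Hypothetical projection. The real payload keys are defined by the existing
/// importers and may differ from the ones used here.
Map<String, dynamic> sampleToPayload(MacDiveSqliteSample s) => {
      'timestamp': s.time.inSeconds,
      // Assumption: optional fields are omitted when absent.
      if (s.depthMeters != null) 'depth': s.depthMeters,
      if (s.temperatureCelsius != null) 'temperature': s.temperatureCelsius,
      if (s.pressureBar != null) 'pressure': s.pressureBar,
    };

void main() {
  const samples = [
    MacDiveSqliteSample(time: Duration(seconds: 0), depthMeters: 0.0),
    MacDiveSqliteSample(
      time: Duration(seconds: 30),
      depthMeters: 12.5,
      temperatureCelsius: 18.0,
    ),
  ];
  final profile = samples.map(sampleToPayload).toList();
  print(profile);
}
```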
170+
171+
### Warning flow
172+
173+
New variant on `ImportWarning`:
174+
175+
```dart
ImportWarning.sampleDecodeFailed({
  required String diveUuid,
  required String reason,
  int? offendingOffset,
});
```
182+
183+
The wizard UI already renders `ImportWarning`s. Polish: if more than 10 warnings share the same `reason` field, collapse to one aggregated line (`"N dives: <reason>"`). Single-line helper, single test. Worth including because un-decodable blobs will cluster by format variant.
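The aggregation rule can be sketched as a grouping pass. This is an illustration under assumptions: `SampleDecodeWarning` is a trimmed, hypothetical stand-in for the `ImportWarning` variant, `summarizeWarnings` is an invented name, and the per-dive line format is made up. Only the threshold of 10 and the `"N dives: <reason>"` shape come from the paragraph above.

```dart
/// Hypothetical trimmed warning shape, used only for this sketch.
class SampleDecodeWarning {
  const SampleDecodeWarning(this.diveUuid, this.reason);
  final String diveUuid;
  final String reason;
}

/// Collapses warnings that share a reason once a group exceeds [threshold].
List<String> summarizeWarnings(
  List<SampleDecodeWarning> warnings, {
  int threshold = 10,
}) {
  // Map literals preserve insertion order, so output order is stable.
  final byReason = <String, List<SampleDecodeWarning>>{};
  for (final w in warnings) {
    byReason.putIfAbsent(w.reason, () => []).add(w);
  }
  final lines = <String>[];
  byReason.forEach((reason, group) {
    if (group.length > threshold) {
      lines.add('${group.length} dives: $reason');
    } else {
      for (final w in group) {
        lines.add('${w.diveUuid}: $reason');
      }
    }
  });
  return lines;
}

void main() {
  final warnings = [
    for (var i = 0; i < 11; i++)
      SampleDecodeWarning('dive-$i', 'unknown header variant'),
    const SampleDecodeWarning('dive-99', 'truncated body'),
  ];
  summarizeWarnings(warnings).forEach(print);
}
```

Grouping only helps when `reason` strings are identical, so per-dive details such as a byte offset probably belong in `offendingOffset` rather than in the reason text.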
184+
185+
## Testing
186+
187+
### Layer 1 — decoder unit tests
188+
189+
File: `test/features/universal_import/data/services/macdive_samples_decoder_test.dart`
190+
Fixtures: `test/fixtures/macdive_sqlite/zsamples_golden/` (committed, each ≤1KB, redacted if needed).
191+
192+
Coverage targets:
193+
- Each header variant Phase 1 surfaces (`0x19`, `0x9D`, any others).
194+
- Boundary conditions: zero samples (header-only), single sample, maximum observed blob size.
195+
- Unit handling: imperial-unit blob decoded with `MacDiveUnitSystem.imperial`, metric with `MacDiveUnitSystem.metric`.
196+
- Malformed input: truncated body, unknown header variant, garbage bytes after a valid header.
197+
198+
### Layer 2 — decoder golden tests
199+
200+
Same test file, separate group. Decode committed fixture → JSON → `expect(jsonEncode(decoded), matchesGoldenFile('...'))`. Catches any regression that changes byte interpretation.
201+
202+
Curated fixture set: one per header variant, one per combination of optional fields (pressure present/absent, ppO2 present/absent, etc.), one minimal. Target ≤10 fixtures, each ≤1KB.
203+
204+
**Redaction rule:** any field that could identify a user or computer (serial numbers, timestamps corresponding to real dives) is byte-patched to zero in the committed fixture.
205+
206+
### Layer 3 — real-sample regression (gated)
207+
208+
File: `test/features/universal_import/data/parsers/macdive_sqlite_real_sample_test.dart`
209+
210+
Pattern matches the existing `macdive_xml_real_sample_test.dart`:
211+
- Skipped in CI (no fixtures committed — user's dive log is private).
212+
- Runs locally with `flutter test --dart-define=MACDIVE_SQLITE_SAMPLE=/path/to/MacDive.sqlite --dart-define=MACDIVE_UDDF_SAMPLE=/path/to/sync.uddf --run-skipped --tags=real-data test/features/universal_import/data/parsers/macdive_sqlite_real_sample_test.dart`.
213+
214+
Assertions:
215+
- Every dive with non-null `ZSAMPLES` in the SQLite file produces either a decoded profile or a per-dive warning — no silent data loss.
216+
- For every dive UUID present in both SQLite and UDDF, decoded profiles match within tolerance: timestamp exact, depth ±0.1m, temperature ±0.5°C, sample count ≥ UDDF count × 0.95.
217+
- Total warnings count is bounded (e.g., <5% of dives with `ZSAMPLES`).
218+
219+
## Error handling
220+
221+
| Failure | Response |
|---|---|
| Blob is `null` | `samples = []`, no warning. Normal for manual dives. |
| Blob has known header but truncated body | Throw `MacDiveSamplesDecodeError("truncated at byte N")`, catch in reader, emit warning, `samples = []`. Dive metadata still imports. |
| Blob has unknown header variant | Throw `MacDiveSamplesDecodeError("unknown header variant 0x%02X")`, same recovery. |
| Decoded samples violate sanity bounds (depth < 0 or > 1000m, timestamps non-monotonic) | Decoder returns samples anyway. Post-decode validation in the mapper nulls out offending fields and emits a warning. |
227+
228+
A decode failure never halts the import or drops the dive record. Profile decoding is strictly additive.
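The post-decode validation in the last table row might be sketched as follows. Everything named here is hypothetical (`ProfilePoint`, `SanitizeResult`, `sanitize`, the warning strings), and two behaviours are assumptions rather than decisions from this spec: timestamps are only flagged when they go backwards, and a non-monotonic timestamp is reported but left in place because `time` is not nullable.

```dart
/// Hypothetical minimal point shape, used only for this sketch.
class ProfilePoint {
  const ProfilePoint(this.timeSeconds, this.depthMeters);
  final int timeSeconds;
  final double? depthMeters;
}

class SanitizeResult {
  const SanitizeResult(this.points, this.warnings);
  final List<ProfilePoint> points;
  final List<String> warnings;
}

/// Nulls out depths outside 0..1000 m and counts timestamps that go backwards.
SanitizeResult sanitize(List<ProfilePoint> points) {
  final cleaned = <ProfilePoint>[];
  var badDepths = 0;
  var backwardsTimestamps = 0;
  int? previousTime;
  for (final p in points) {
    var depth = p.depthMeters;
    if (depth != null && (depth < 0 || depth > 1000)) {
      depth = null; // keep the sample, drop only the offending field
      badDepths++;
    }
    if (previousTime != null && p.timeSeconds < previousTime) {
      backwardsTimestamps++;
    }
    previousTime = p.timeSeconds;
    cleaned.add(ProfilePoint(p.timeSeconds, depth));
  }
  final warnings = <String>[
    if (badDepths > 0) '$badDepths samples had out-of-range depth',
    if (backwardsTimestamps > 0)
      '$backwardsTimestamps samples had non-monotonic timestamps',
  ];
  return SanitizeResult(cleaned, warnings);
}

void main() {
  final result = sanitize(const [
    ProfilePoint(0, 0.0),
    ProfilePoint(10, 1500.0), // out of range, depth becomes null
    ProfilePoint(5, 12.0), // goes backwards in time
  ]);
  result.warnings.forEach(print);
}
```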
229+
230+
## Rollout
231+
232+
### Branch topology
233+
234+
```
main

├── feature/macdive-sqlite                   (PR #256, open)
│    └── feature/macdive-sqlite-profiles     (new; this spec)

└─── merges in order: #256 first, then profiles PR
```
242+
243+
The profiles PR targets `feature/macdive-sqlite`, not `main`, preserving review stackability. Once #256 merges, the profiles branch is rebased onto `main` and its PR is retargeted to `main`.
244+
245+
### Sequencing
246+
247+
1. Cut `feature/macdive-sqlite-profiles` from `feature/macdive-sqlite`.
248+
2. Phase 1 commits: scripts under `scripts/reverse_engineering/zsamples/`, format spec under `docs/import-formats/macdive-zsamples.md`, plan update recording the go/no-go decision.
249+
3. If GO: Phase 2 commits add the decoder, types, reader/mapper wiring, and tests.
250+
4. If NO-GO: spec closed, new spec initiated for ZRAWDATA fallback. Phase 1 artifacts remain in-repo for future reference.
251+
252+
### Plan alignment
253+
254+
Append a link to this spec at the tail of `docs/superpowers/plans/2026-04-21-macdive-sqlite-import.md` so future readers can find the continuation.
255+
256+
## Open questions
257+
258+
None material at spec-approval time. Phase 1 will surface any; they get resolved inline in `docs/import-formats/macdive-zsamples.md` before Phase 2 starts.
