|
| 1 | +# MacDive SQLite Profile Decoding — Design |
| 2 | + |
| 3 | +**Status:** Draft |
| 4 | +**Author:** Eric Griffin |
| 5 | +**Created:** 2026-04-23 |
| 6 | +**Context:** Continuation of the MacDive Import Robustness work (`docs/superpowers/specs/2026-04-21-macdive-import-design.md`). Milestone 3 of that plan (PR #256, `feature/macdive-sqlite`) deferred decoding of `ZDIVE.ZSAMPLES` — MacDive's proprietary profile-sample BLOB. This spec covers the follow-up that closes the gap. |
| 7 | + |
| 8 | +## Problem |
| 9 | + |
| 10 | +PR #256 lands MacDive SQLite import with all dive metadata (tags, critters, gear, events, tanks, gases, sites, buddies) but emits `profile: []` for every dive. Users importing their MacDive database currently cannot see time-series profile data (depth-over-time, temperature, tank pressure, ppO2, NDL) unless they also export UDDF separately and run a second import. For the ~65% of dives in a typical MacDive database that have `ZSAMPLES` data, we need the SQLite path to produce the same profile output the UDDF importer already produces. |
| 11 | + |
| 12 | +The existing metadata-only path was a deliberate scope decision in PR #256: MacDive's `ZSAMPLES` format isn't bplist, and the initial probing (zlib / gzip / lzma at offsets 0/4/8/12) didn't yield a known wrapper. Decoding the format requires focused reverse-engineering work, which we now undertake. |
| 13 | + |
| 14 | +## Goals |
| 15 | + |
| 16 | +1. Decode `ZDIVE.ZSAMPLES` for every dive where the blob is non-null and the format is understood, producing the same `List<Map<String, dynamic>>` payload shape the UDDF and MacDive-native-XML importers already emit. |
| 17 | +2. Produce output that is sample-for-sample equivalent to the UDDF import for the same UUID, within documented tolerance. |
| 18 | +3. Degrade gracefully: per-dive decode failures become `ImportWarning`s, never abort the import. Metadata-only dives still land exactly as they do today. |
| 19 | +4. Ship the decoder behind the same test discipline the rest of the MacDive importer uses (unit + golden + gated real-sample regression). |
| 20 | + |
| 21 | +## Non-Goals |
| 22 | + |
| 23 | +- Decoding `ZDIVE.ZRAWDATA` (the raw dive-computer sensor dump). That's a separate fallback path, reserved for a future milestone if `ZSAMPLES` decoding proves infeasible or if we later want to cover the 83 sample-DB dives that have `ZSAMPLES` without a full-fidelity raw dump. |
| 24 | +- Modifying the `DiveProfilePoint` domain entity or the `dive_profiles` Drift table. The importer already projects `Map<String, dynamic>` → domain entity correctly. |
| 25 | +- UI work. Warnings surface through the existing import-wizard warning list with no visual changes. |
| 26 | +- Round-trip re-export to MacDive's format. |
| 27 | +- Solving the ~35% of dives in a typical MacDive database that have no sample data at all (manual entries, non-computer-synced dives). |
| 28 | + |
| 29 | +## Approach summary |
| 30 | + |
| 31 | +Two phases with an explicit gate: |
| 32 | + |
| 33 | +- **Phase 1 (investigation spike):** Build throwaway scripts in `scripts/reverse_engineering/zsamples/` that extract paired corpora (ZSAMPLES blob + UDDF ground truth for the same dive UUID), probe the format, and score candidate decodings. Exits when either (a) a single decoder hypothesis scores ≥90% sample-accurate across the 350 ZSAMPLES-bearing dives in the sample database, or (b) the spike timebox (1-2 active days) expires without a viable hypothesis. |
| 34 | +- **Phase 2 (implementation):** If Phase 1 succeeded, implement `MacDiveSamplesDecoder` + `MacDiveSqliteSample` typed model, wire into the existing reader/mapper, ship tests at three levels. If Phase 1 failed, this spec is closed and a separate plan covers the ZRAWDATA fallback. |
| 35 | + |
| 36 | +The rest of this document describes what each phase produces concretely. |
| 37 | + |
| 38 | +## Phase 1 — Investigation spike |
| 39 | + |
| 40 | +### Deliverables |
| 41 | + |
| 42 | +1. `docs/import-formats/macdive-zsamples.md` — written format specification precise enough that a programmer unfamiliar with the investigation can implement the decoder from it alone. |
| 43 | +2. `scripts/reverse_engineering/zsamples/` — committed scripts, retained for future format drifts. |
| 44 | +3. A go/no-go decision recorded as a commit to the plan file. |
| 45 | + |
| 46 | +### Scripts |
| 47 | + |
| 48 | +| File | Purpose | |
| 49 | +|---|---| |
| 50 | +| `extract_corpus.dart` | Reads `scripts/sample_data/MacDive.sqlite` + `scripts/sample_data/Apr 4 no iPad sync.uddf`, emits paired fixtures under `corpus/<uuid>.zsamples.bin` + `corpus/<uuid>.uddf.json`. UDDF → JSON projection uses the existing `UddfImportParser` so ground-truth data matches exactly what the production importer would produce. | |
| 51 | +| `inspect.dart <fixture>` | Pretty-prints a ZSAMPLES blob: hex dump, candidate interpretations at each offset (u8/u16/u32/float LE and BE), repeating-group detection by offset stride, entropy-per-window graph. Human-driven exploration tool. | |
| 52 | +| `compression_probe.dart <fixture>` | Attempts LZFSE, LZVN, LZ4, zstd, and Apple Archive decompression at every offset 0 through 64. PR #256 only tried zlib/gzip/lzma at offsets 0/4/8/12; this expands the search to cover Apple's native compression codecs, which are the plausible candidates given MacDive is a native macOS app. | |
| 53 | +| `differ.dart <fixture> <hypothesis>` | Takes a Dart function `Uint8List → List<Sample>` as the hypothesis. Decodes, compares to UDDF ground truth. Returns a score: sample-count accuracy, timestamp RMSE, depth RMSE, temperature RMSE, and % samples within tolerance (exact timestamp, ±0.1m depth, ±0.5°C temp). | |
| 54 | +| `batch_score.dart <hypothesis>` | Runs a hypothesis across every fixture in `corpus/`, reports per-dive scores plus an aggregate histogram. This is the go/no-go measurement. | |
| 55 | + |
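The scoring that `differ.dart` performs could look roughly like the sketch below. It covers only a subset of the listed metrics (sample-count accuracy, depth RMSE, and the fraction of samples within tolerance), pairs samples by index, and uses a hypothetical `Sample` shape and `scoreHypothesis` name that do not come from the existing codebase.

```dart
import 'dart:math' as math;

/// Hypothetical sample shape used only for scoring.
class Sample {
  const Sample(this.timeSeconds, this.depthMeters, this.temperatureCelsius);
  final int timeSeconds;
  final double depthMeters;
  final double? temperatureCelsius;
}

/// Aggregate comparison of one decoded dive against its UDDF ground truth.
class Score {
  const Score(
      this.sampleCountAccuracy, this.depthRmse, this.fractionWithinTolerance);
  final double sampleCountAccuracy;
  final double depthRmse;
  final double fractionWithinTolerance;
}

Score scoreHypothesis(List<Sample> decoded, List<Sample> truth) {
  if (truth.isEmpty) {
    final match = decoded.isEmpty ? 1.0 : 0.0;
    return Score(match, 0.0, match);
  }
  // Assumption: samples are compared pairwise by index.
  final paired = math.min(decoded.length, truth.length);
  var squaredDepthError = 0.0;
  var withinTolerance = 0;
  for (var i = 0; i < paired; i++) {
    final depthError = decoded[i].depthMeters - truth[i].depthMeters;
    squaredDepthError += depthError * depthError;
    final dt = decoded[i].temperatureCelsius;
    final tt = truth[i].temperatureCelsius;
    // A missing temperature only matches a missing temperature.
    final tempOk =
        (dt != null && tt != null) ? (dt - tt).abs() <= 0.5 : dt == tt;
    // Tolerances from the spec: exact timestamp, ±0.1 m depth, ±0.5 °C temp.
    if (decoded[i].timeSeconds == truth[i].timeSeconds &&
        depthError.abs() <= 0.1 &&
        tempOk) {
      withinTolerance++;
    }
  }
  return Score(
    paired / math.max(decoded.length, truth.length),
    paired == 0 ? double.infinity : math.sqrt(squaredDepthError / paired),
    withinTolerance / truth.length,
  );
}
```

Dividing the within-tolerance count by the ground-truth length means a hypothesis that decodes too few samples is penalised rather than rewarded.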
| 56 | +### Ranked hypotheses (cheap → expensive) |
| 57 | + |
| 58 | +1. **Apple compression wrapper.** Run `compression_probe.dart` against fixtures from both observed header variants (`0x19` and `0x9D`). If LZFSE/LZVN/LZ4 hits at any offset, the rest of the format is whatever lies under the wrapper — likely a simple repeating record. *(Expected: 20 minutes.)* |
| 59 | +2. **Fixed-width sample records after the 8-byte header.** Compute candidate strides from `(blob_size − 8) / expected_sample_count`, where expected count = `duration / sample_interval`. Search for clean integer divisors and byte patterns at that stride. `inspect.dart`'s entropy-per-window graph flags any strong periodicity. *(Expected: 1-2 hours.)* |
| 60 | +3. **Typed record stream (TLV-like).** Each sample is one or more `(tag, length, value)` records; the decoder walks the stream. Consistent with MacDive's likely data model of "depth + optional temp + optional pressure + optional event per timestamp." *(Expected: half a day.)* |
| 61 | +4. **Container-of-blocks (vendor-specific frames).** Second header byte becomes a protocol-family ID; each family decodes differently. libdivecomputer's per-vendor parsers are reference material for framing conventions. *(Remainder of timebox.)* |
| 62 | + |
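For hypothesis 2, the stride search can be sketched as follows. The function name `candidateStrides` and the `slack` parameter (which lets the real sample count differ from `duration / sample_interval` by a sample or two) are assumptions for illustration, not part of the spec.

```dart
/// Returns record widths (in bytes) that divide the blob body evenly for a
/// sample count within ±[slack] of durationSeconds / sampleIntervalSeconds.
List<int> candidateStrides({
  required int blobSize,
  required int durationSeconds,
  required int sampleIntervalSeconds,
  int headerSize = 8, // the 8-byte header named in hypothesis 2
  int slack = 2, // hypothetical allowance for off-by-a-few sample counts
}) {
  if (sampleIntervalSeconds <= 0 || blobSize <= headerSize) return const [];
  final body = blobSize - headerSize;
  final expected = durationSeconds ~/ sampleIntervalSeconds;
  final strides = <int>{};
  for (var count = expected - slack; count <= expected + slack; count++) {
    // Only clean integer divisors are interesting as fixed record widths.
    if (count > 0 && body % count == 0) strides.add(body ~/ count);
  }
  return strides.toList()..sort();
}
```

A stride that keeps reappearing across many dives with different durations would be strong evidence for a fixed-width record layout.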
| 63 | +### Validators |
| 64 | + |
| 65 | +Beyond the differ's sample-by-sample comparison, per-dive aggregates on `ZDIVE` are cheap sanity checks: |
| 66 | + |
| 67 | +- `ZMAXDEPTH` — must match `max(decoded.depth)` within ±0.1m. |
| 68 | +- `ZAVERAGEDEPTH` — must match mean within ±0.1m. |
| 69 | +- `ZTEMPHIGH` / `ZTEMPLOW` — must match decoded temp extremes within ±0.5°C. |
| 70 | +- `ZSAMPLEINTERVAL` — if constant-interval, must match `decoded[1].time - decoded[0].time`. |
| 71 | +- `ZTOTALDURATION` — must be `≥ decoded.last.time`. |
| 72 | + |
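A minimal sketch of the aggregate checks above, covering depth, interval, and duration. It assumes metric values, that `ZSAMPLEINTERVAL` and `ZTOTALDURATION` are in seconds, and that the mean is a plain arithmetic mean over samples; `DecodedPoint` and `checkAggregates` are hypothetical names.

```dart
import 'dart:math' as math;

/// Hypothetical decoded sample: elapsed seconds and depth in metres.
class DecodedPoint {
  const DecodedPoint(this.timeSeconds, this.depthMeters);
  final int timeSeconds;
  final double depthMeters;
}

/// Returns the names of the ZDIVE columns the decoded profile disagrees with.
List<String> checkAggregates(
  List<DecodedPoint> decoded, {
  required double zMaxDepth,
  required double zAverageDepth,
  required int zSampleInterval,
  required int zTotalDuration,
}) {
  if (decoded.isEmpty) return ['no decoded samples'];
  final failures = <String>[];
  final depths = decoded.map((p) => p.depthMeters);
  final maxDepth = depths.reduce(math.max);
  final meanDepth = depths.reduce((a, b) => a + b) / decoded.length;
  if ((maxDepth - zMaxDepth).abs() > 0.1) failures.add('ZMAXDEPTH');
  if ((meanDepth - zAverageDepth).abs() > 0.1) failures.add('ZAVERAGEDEPTH');
  // The interval check only applies when there are at least two samples.
  if (decoded.length > 1 &&
      decoded[1].timeSeconds - decoded[0].timeSeconds != zSampleInterval) {
    failures.add('ZSAMPLEINTERVAL');
  }
  if (zTotalDuration < decoded.last.timeSeconds) failures.add('ZTOTALDURATION');
  return failures;
}
```

An empty result means the hypothesis is consistent with the per-dive aggregates; it does not replace the sample-by-sample comparison.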
| 73 | +### Exit criteria |
| 74 | + |
| 75 | +- **GO** — one hypothesis scores ≥90% sample-accurate across the 350-dive corpus; remaining failures are either attributable to a small number of distinguishable format variants (implementable in bounded time) or isolated outliers where we emit `profile: []` with a warning. |
| 76 | +- **NO-GO** — after 1-2 active days, best hypothesis scores <50% with no clear next hypothesis. Spec is closed. ZRAWDATA fallback planning begins in a new spec. |
| 77 | +- **ESCALATE** — unusual findings (cryptographic signatures, per-dive salting, identical profiles with non-identical bytes) trigger a conversation before further spend. |
| 78 | + |
| 79 | +## Phase 2 — Implementation |
| 80 | + |
| 81 | +### File layout |
| 82 | + |
| 83 | +All new code lives under `lib/features/universal_import/data/services/`: |
| 84 | + |
| 85 | +``` |
| 86 | +macdive_db_reader.dart (exists) — calls the decoder, stores typed samples |
| 87 | +macdive_dive_mapper.dart (exists) — line ~334 stops emitting profile: [] |
| 88 | +macdive_raw_types.dart (exists) — MacDiveRawDive gains `samples` field |
| 89 | +macdive_samples_decoder.dart (new) — public entry point, pure function |
| 90 | +macdive_samples/ (new; conditional) |
| 91 | + macdive_sqlite_sample.dart — typed model, mirrors MacDiveXmlSample |
| 92 | + variants/ — only if >1 format variant exists |
| 93 | + <variant>_decoder.dart |
| 94 | +``` |
| 95 | + |
| 96 | +The `variants/` directory is created only if Phase 1 confirms multiple format families. A single-family decoder stays a single file. |
| 97 | + |
| 98 | +### Typed model |
| 99 | + |
| 100 | +`MacDiveSqliteSample` has field names and units identical to `MacDiveXmlSample` so downstream projection code doesn't branch on source: |
| 101 | + |
| 102 | +```dart |
| 103 | +class MacDiveSqliteSample { |
| 104 | + final Duration time; |
| 105 | + final double? depthMeters; |
| 106 | + final double? pressureBar; // tank pressure if present |
| 107 | + final double? temperatureCelsius; |
| 108 | + final double? ppO2; // bar |
| 109 | + final int? ndlSeconds; |
| 110 | + // Additional fields if Phase 1 reveals them: heartRate, setpoint, event markers. |
| 111 | +} |
| 112 | +``` |
| 113 | + |
| 114 | +Values are stored in SI canonical units. The caller reads `ZMETADATA.SystemOfUnits` and passes it to the decoder as `units`; the decoder delegates any imperial→SI conversion to the existing `MacDiveUnitConverter`. |
| 115 | + |
| 116 | +### Decoder API |
| 117 | + |
| 118 | +Pure function, no I/O, fully testable in isolation: |
| 119 | + |
| 120 | +```dart |
| 121 | +class MacDiveSamplesDecoder { |
| 122 | + const MacDiveSamplesDecoder(); |
| 123 | + |
| 124 | + /// Returns decoded samples in SI canonical units. |
| 125 | + /// Throws MacDiveSamplesDecodeError if the blob is malformed or the header |
| 126 | + /// variant is unknown. Returns [] for a header-only blob (no sample body). |
| 127 | + List<MacDiveSqliteSample> decode( |
| 128 | + Uint8List blob, { |
| 129 | + required MacDiveUnitSystem units, |
| 130 | + required MacDiveUnitConverter converter, |
| 131 | + }); |
| 132 | +} |
| 133 | + |
| 134 | +class MacDiveSamplesDecodeError implements Exception { |
| 135 | +  final String reason;          // e.g. "unknown header variant 0xNN" |
| 136 | + final int? offendingOffset; // for debug / warning surfaces |
| 137 | + const MacDiveSamplesDecodeError(this.reason, {this.offendingOffset}); |
| 138 | +} |
| 139 | +``` |
| 140 | + |
| 141 | +### Reader integration |
| 142 | + |
| 143 | +`MacDiveDbReader`'s per-dive loop decodes the blob immediately after reading it. Errors become per-dive warnings on the logbook: |
| 144 | + |
| 145 | +```dart |
| 146 | +try { |
| 147 | + final decoded = (samplesBlob == null) |
| 148 | + ? const <MacDiveSqliteSample>[] |
| 149 | + : decoder.decode(samplesBlob, units: units, converter: converter); |
| 150 | + dive = dive.copyWith(samples: decoded); |
| 151 | +} on MacDiveSamplesDecodeError catch (e) { |
| 152 | + warnings.add(ImportWarning.sampleDecodeFailed( |
| 153 | + diveUuid: dive.uuid, |
| 154 | + reason: e.reason, |
| 155 | + offendingOffset: e.offendingOffset, |
| 156 | + )); |
| 157 | + dive = dive.copyWith(samples: const <MacDiveSqliteSample>[]); |
| 158 | +} |
| 159 | +``` |
| 160 | + |
| 161 | +`MacDiveRawDive` gains `samples: List<MacDiveSqliteSample>` (non-nullable, empty by default). The existing `samplesBlob: Uint8List?` field remains for diagnostic use during the ramp-up period; it can be removed once the decoder is proven. |
| 162 | + |
| 163 | +### Mapper integration |
| 164 | + |
| 165 | +`MacDiveDiveMapper._buildDiveMap()` replaces the current `map['profile'] = const <Map<String, dynamic>>[];` line with a projection of `dive.samples` into payload maps. The projection duplicates the 10-line helper from `MacDiveXmlParser` (lines 279-291). The two are intentionally independent: |
| 166 | + |
| 167 | +- Rationale: `MacDiveXmlSample` and `MacDiveSqliteSample` have identical shape today, but they represent different upstream formats and will evolve on different schedules. A shared projection would couple them. |
| 168 | +- Cost of duplication: ~10 lines of trivial mechanical code. |
| 169 | +- Benefit of duplication: each format owns its projection; changes to one don't risk the other. |
| 170 | + |
| 171 | +### Warning flow |
| 172 | + |
| 173 | +New variant on `ImportWarning`: |
| 174 | + |
| 175 | +```dart |
| 176 | +ImportWarning.sampleDecodeFailed({ |
| 177 | + required String diveUuid, |
| 178 | + required String reason, |
| 179 | + int? offendingOffset, |
| 180 | +}); |
| 181 | +``` |
| 182 | + |
| 183 | +The wizard UI already renders `ImportWarning`s. Polish: if more than 10 warnings share the same `reason` field, collapse to one aggregated line (`"N dives: <reason>"`). Single-line helper, single test. Worth including because un-decodable blobs will cluster by format variant. |
| 184 | + |
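The aggregation rule could be sketched like this. `DecodeWarning` stands in for the real `ImportWarning` variant, and the per-dive line format (`"<uuid>: <reason>"`) is an assumption; only the `"N dives: <reason>"` form and the threshold of 10 come from the text above.

```dart
/// Hypothetical stand-in for ImportWarning.sampleDecodeFailed.
class DecodeWarning {
  const DecodeWarning(this.diveUuid, this.reason);
  final String diveUuid;
  final String reason;
}

/// Collapses groups of more than [threshold] warnings that share a reason
/// into one aggregated line; smaller groups keep one line per dive.
List<String> collapseWarnings(
  List<DecodeWarning> warnings, {
  int threshold = 10,
}) {
  // Map literals preserve insertion order, so the output order is stable.
  final byReason = <String, List<DecodeWarning>>{};
  for (final warning in warnings) {
    byReason.putIfAbsent(warning.reason, () => []).add(warning);
  }
  final lines = <String>[];
  byReason.forEach((reason, group) {
    if (group.length > threshold) {
      lines.add('${group.length} dives: $reason');
    } else {
      for (final warning in group) {
        lines.add('${warning.diveUuid}: $reason');
      }
    }
  });
  return lines;
}
```

Grouping on the exact `reason` string only works if the decoder keeps per-dive details such as byte offsets out of `reason` and in `offendingOffset` instead.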
| 185 | +## Testing |
| 186 | + |
| 187 | +### Layer 1 — decoder unit tests |
| 188 | + |
| 189 | +File: `test/features/universal_import/data/services/macdive_samples_decoder_test.dart` |
| 190 | +Fixtures: `test/fixtures/macdive_sqlite/zsamples_golden/` (committed, each ≤1KB, redacted if needed). |
| 191 | + |
| 192 | +Coverage targets: |
| 193 | +- Each header variant Phase 1 surfaces (`0x19`, `0x9D`, any others). |
| 194 | +- Boundary conditions: zero samples (header-only), single sample, maximum observed blob size. |
| 195 | +- Unit handling: imperial-unit blob decoded with `MacDiveUnitSystem.imperial`, metric with `MacDiveUnitSystem.metric`. |
| 196 | +- Malformed input: truncated body, unknown header variant, garbage bytes after a valid header. |
| 197 | + |
| 198 | +### Layer 2 — decoder golden tests |
| 199 | + |
| 200 | +Same test file, separate group. Decode committed fixture → JSON → `expect(jsonEncode(decoded), matchesGoldenFile('...'))`. Catches any regression that changes byte interpretation. |
| 201 | + |
| 202 | +Curated fixture set: one per header variant, one per combination of optional fields (pressure present/absent, ppO2 present/absent, etc.), one minimal. Target ≤10 fixtures, each <1KB. |
| 203 | + |
| 204 | +**Redaction rule:** any field that could identify a user or computer (serial numbers, timestamps corresponding to real dives) is byte-patched to zero in the committed fixture. |
| 205 | + |
| 206 | +### Layer 3 — real-sample regression (gated) |
| 207 | + |
| 208 | +File: `test/features/universal_import/data/parsers/macdive_sqlite_real_sample_test.dart` |
| 209 | + |
| 210 | +Pattern matches the existing `macdive_xml_real_sample_test.dart`: |
| 211 | +- Skipped in CI (no fixtures committed — user's dive log is private). |
| 212 | +- Runs locally with `flutter test --dart-define=MACDIVE_SQLITE_SAMPLE=/path/to/MacDive.sqlite --dart-define=MACDIVE_UDDF_SAMPLE=/path/to/sync.uddf --run-skipped --tags=real-data test/features/universal_import/data/parsers/macdive_sqlite_real_sample_test.dart`. |
| 213 | + |
| 214 | +Assertions: |
| 215 | +- Every dive with non-null `ZSAMPLES` in the SQLite file produces either a decoded profile or a per-dive warning — no silent data loss. |
| 216 | +- For every dive UUID present in both SQLite and UDDF, decoded profiles match within tolerance: timestamp exact, depth ±0.1m, temperature ±0.5°C, sample count ≥ UDDF count × 0.95. |
| 217 | +- Total warnings count is bounded (e.g., <5% of dives with `ZSAMPLES`). |
| 218 | + |
| 219 | +## Error handling |
| 220 | + |
| 221 | +| Failure | Response | |
| 222 | +|---|---| |
| 223 | +| Blob is `null` | `samples = []`, no warning. Normal for manual dives. | |
| 224 | +| Blob has known header but truncated body | Throw `MacDiveSamplesDecodeError("truncated at byte N")`, catch in reader, emit warning, `samples = []`. Dive metadata still imports. | |
| 225 | +| Blob has unknown header variant | Throw `MacDiveSamplesDecodeError("unknown header variant 0x%02X")`, same recovery. | |
| 226 | +| Decoded samples violate sanity bounds (depth < 0 or > 1000m, timestamps non-monotonic) | Decoder returns samples anyway. Post-decode validation in the mapper nulls out offending fields and emits a warning. | |
| 227 | + |
| 228 | +A decode failure never halts the import or drops the dive record. Profile decoding is strictly additive. |
| 229 | + |
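The post-decode validation in the last table row might look like the sketch below. It assumes an out-of-range depth is nulled while a backwards timestamp is only reported, because the spec does not say which field a non-monotonic timestamp should null; `ProfilePoint`, `ValidationResult`, and `validateSamples` are hypothetical names.

```dart
/// Hypothetical minimal view of a decoded sample.
class ProfilePoint {
  const ProfilePoint(this.time, this.depthMeters);
  final Duration time;
  final double? depthMeters;
}

class ValidationResult {
  const ValidationResult(this.points, this.warnings);
  final List<ProfilePoint> points;
  final List<String> warnings;
}

/// Nulls depths outside 0..1000 m and flags timestamps that go backwards.
ValidationResult validateSamples(List<ProfilePoint> samples) {
  final cleaned = <ProfilePoint>[];
  final warnings = <String>[];
  Duration? previous;
  for (var i = 0; i < samples.length; i++) {
    final sample = samples[i];
    var depth = sample.depthMeters;
    if (depth != null && (depth < 0 || depth > 1000)) {
      warnings.add('sample $i: depth $depth m out of bounds');
      depth = null; // keep the sample, drop only the offending field
    }
    if (previous != null && sample.time < previous) {
      // Assumption: report only; the timestamp itself is left untouched.
      warnings.add('sample $i: timestamp not monotonic');
    }
    previous = sample.time;
    cleaned.add(ProfilePoint(sample.time, depth));
  }
  return ValidationResult(cleaned, warnings);
}
```

Keeping the sample and dropping only the bad field preserves any other channels (temperature, pressure) recorded at the same timestamp.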
| 230 | +## Rollout |
| 231 | + |
| 232 | +### Branch topology |
| 233 | + |
| 234 | +``` |
| 235 | +main |
| 236 | + │ |
| 237 | + ├── feature/macdive-sqlite (PR #256, open) |
| 238 | + │ └── feature/macdive-sqlite-profiles (new; this spec) |
| 239 | + │ |
| 240 | + └─── merges in order: #256 first, then profiles PR |
| 241 | +``` |
| 242 | + |
| 243 | +The profiles PR targets `feature/macdive-sqlite`, not `main`, preserving review stackability. When #256 merges, this branch rebases onto `main` and re-targets. |
| 244 | + |
| 245 | +### Sequencing |
| 246 | + |
| 247 | +1. Cut `feature/macdive-sqlite-profiles` from `feature/macdive-sqlite`. |
| 248 | +2. Phase 1 commits: scripts under `scripts/reverse_engineering/zsamples/`, format spec under `docs/import-formats/macdive-zsamples.md`, plan update recording the go/no-go decision. |
| 249 | +3. If GO: Phase 2 commits add the decoder, types, reader/mapper wiring, and tests. |
| 250 | +4. If NO-GO: spec closed, new spec initiated for ZRAWDATA fallback. Phase 1 artifacts remain in-repo for future reference. |
| 251 | + |
| 252 | +### Plan alignment |
| 253 | + |
| 254 | +Append a link to this spec at the tail of `docs/superpowers/plans/2026-04-21-macdive-sqlite-import.md` so future readers can find the continuation. |
| 255 | + |
| 256 | +## Open questions |
| 257 | + |
| 258 | +None material at spec-approval time. Phase 1 will surface any; they get resolved inline in `docs/import-formats/macdive-zsamples.md` before Phase 2 starts. |