Commit eea760b

feat(import): MacDive SQLite import (Milestone 3 of 4)
Adds direct import from MacDive's Core Data SQLite database, the most complete MacDive metadata path. Captures the same entity set as MacDive XML (dives, sites, buddies, tags, and gear inventory), plus per-dive tank/gas mix linkage from ZTANKANDGAS and per-dive equipmentRefs linking dives to their gear. Cross-format deduplication via source UUIDs means re-importing the same dives via MacDive UDDF/XML/SQLite won't create duplicates.

## What landed

- `ImportFormat.macdiveSqlite` with `MacDive (SQLite)` source override.
- `_detectFormat` extended: SQLite → Shearwater check → MacDive check → generic. A single SQLite probe is reused across flavors to avoid doubling temp-file I/O on large DBs.
- `BPlistDecoder` (Apple binary plist v00) in `lib/core/utils/bplist/`: supports dict, array, string (ASCII + UTF-16BE), int (1/2/4/8-byte; 16-byte ints are read best-effort as their low 64 bits), real, bool, null, data, date, and NSKeyedArchiver UID markers (1..4 bytes per spec).
- `MacDiveDbReader`: schema validation (required tables enforced up front; optional tables read safely), typed row graph, filters null-FK tombstones in ZTANKANDGAS.
- `MacDiveDiveMapper`: joins raw rows into a unified `ImportPayload` matching M2's key conventions. Emits per-dive `equipmentRefs` so `UddfEntityImporter.equipmentIdMapping` can link gear back to dives. Gear maps require `name` and carry a stable `uddfId` (MacDive gear UUID, with name fallback).
- `MacDiveSqliteParser`: `ImportParser` implementation wrapping reader + mapper.
- `ImportDuplicateChecker`: first-pass exact match on `source_uuid` via a new `DiveRepository.getSourceUuidByDiveId({diverId})` helper that optionally scopes the UUID map to one diver.

## Known limitations

- `ZDIVE.ZSAMPLES` (MacDive's proprietary profile-sample BLOB) is NOT decoded. The format isn't bplist and doesn't match common compressions. Users who want profile time-series data should use MacDive UDDF import (M1). M3 is the rich-metadata path; UDDF is the sample-data path.
- Critters (marine-life sightings), dive events, service records, and certifications are read into the typed row graph but not yet emitted into the import payload. This is tracked as follow-up work (the unified importer lacks entity types for critters/events/service records).

## Test coverage

- Synthetic-DB tests: reader, mapper, parser (`macdive_db_reader_test`, `macdive_dive_mapper_test`, `macdive_sqlite_parser_test`).
- BPlist decoder golden tests (Python-plistlib fixtures + real MacDive ZTIMEZONE BLOB). Regression tests for 1/2/4-byte UID indexes.
- Gated real-sample regression against a 6.7MB MacDive.sqlite (enable via `MACDIVE_SQLITE_REAL_SAMPLE_PATH`): 540 dives / 373 sites / 33 buddies / 39 tags / 32 gear, equipmentRefs resolution verified.
- New `dive_repository_new_methods_test` coverage for scoped and unscoped `getSourceUuidByDiveId`.

Closes #179.
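The detector chain described in the commit message (SQLite → Shearwater check → MacDive check → generic, with one shared probe) could look roughly like the sketch below. It is not the commit's `_detectFormat`: the `DetectedFormat` enum, the `probeTables` callback, and the Shearwater table names are assumptions made for illustration. Only the four MacDive table names come from the plan's status notes.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Hypothetical stand-in for the importer's format enum.
enum DetectedFormat { notSqlite, shearwaterSqlite, macdiveSqlite, genericSqlite }

/// The 16-byte header that starts every SQLite 3 database file.
final Uint8List sqliteMagic =
    Uint8List.fromList(ascii.encode('SQLite format 3') + [0]);

bool hasSqliteMagic(Uint8List header) {
  if (header.length < sqliteMagic.length) return false;
  for (var i = 0; i < sqliteMagic.length; i++) {
    if (header[i] != sqliteMagic[i]) return false;
  }
  return true;
}

/// Classifies a file from its header bytes and a table listing.
/// [probeTables] stands in for "open the DB once and list its tables";
/// it runs at most once, so both flavor checks share a single probe.
DetectedFormat detectFormat(
  Uint8List header,
  Set<String> Function() probeTables,
) {
  if (!hasSqliteMagic(header)) return DetectedFormat.notSqlite;
  final tables = probeTables();
  // Assumed Shearwater marker tables (not confirmed by the source).
  if (tables.containsAll({'dive_details', 'log_data'})) {
    return DetectedFormat.shearwaterSqlite;
  }
  // Required MacDive tables, as listed in the plan's status notes.
  if (tables.containsAll({'ZDIVE', 'ZDIVESITE', 'ZGAS', 'ZTANKANDGAS'})) {
    return DetectedFormat.macdiveSqlite;
  }
  return DetectedFormat.genericSqlite;
}

void main() {
  var probes = 0;
  final format = detectFormat(sqliteMagic, () {
    probes++;
    return {'ZDIVE', 'ZDIVESITE', 'ZGAS', 'ZTANKANDGAS'};
  });
  print('$format after $probes probe(s)');
}
```

Passing the probe as a callback keeps the expensive step lazy: a file without the SQLite magic never triggers it.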
1 parent 14e24af commit eea760b

37 files changed

Lines changed: 3319 additions & 9 deletions

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -19,6 +19,15 @@ All notable changes to Submersion are documented in this file.
 internal units at the reader boundary.
 - **MacDive (XML) source override** in the import wizard's detected-source
 dropdown, alongside the existing MacDive (CSV) option.
+- **MacDive SQLite import.** Direct import from MacDive's Core Data
+SQLite database. Captures the same entity set as MacDive XML (dives,
+sites, buddies, tags, and gear inventory) plus per-dive tank and gas
+mix linkage drawn from the `ZTANKANDGAS` join table. Cross-format
+deduplication via source UUIDs: if you've already imported the same
+dives via MacDive UDDF or XML, re-importing from SQLite won't create
+duplicates.
+- **MacDive (SQLite) source override** in the import wizard's
+detected-source dropdown, alongside MacDive (CSV) and MacDive (XML).
 - Cross-format import deduplication: stable per-dive UUIDs from MacDive,
 Shearwater Cloud, Subsurface SSRF, and generic UDDF are now preserved on
 the `dive_data_sources` sidecar. Re-importing the same dives in a
@@ -42,6 +51,22 @@ All notable changes to Submersion are documented in this file.
 parsed but not yet persisted to the profile samples table. A future
 milestone will wire them through, likely via the dive-events table.

+### Known limitations
+
+- Profile samples (depth/time-series data) are NOT imported from
+MacDive SQLite. MacDive stores sample data in `ZDIVE.ZSAMPLES`
+using a proprietary binary format that isn't publicly documented
+and isn't standard bplist or any common compression. Users who
+need time-series profile data should use the MacDive UDDF import
+path instead, which decodes MacDive's UDDF profile correctly.
+- The MacDive SQLite path reads critters (marine-life sightings),
+dive events, service records, and certifications into its typed
+row graph, but these are not yet emitted into the import payload:
+the unified importer doesn't have entity types for critters,
+events, or service records, and certification emission is scoped
+for a follow-up. For now, only the dive/site/buddy/tag/gear subset
+is persisted.
+

 ## 1.4.6 (2026-04-22)
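The cross-format deduplication in the changelog entry comes down to an exact match on stored source UUIDs. The sketch below shows that first pass under stated assumptions: `IncomingDive` and `matchBySourceUuid` are hypothetical names, and the `sourceUuidByDiveId` map is assumed to have the shape the commit message gives for `getSourceUuidByDiveId` (existing dive id to source UUID). The real `ImportDuplicateChecker` may differ.

```dart
/// Hypothetical incoming record; the real payload is a map with more keys.
class IncomingDive {
  final String label;
  final String? sourceUuid;
  const IncomingDive(this.label, this.sourceUuid);
}

/// First-pass exact match. Returns incoming label -> existing dive id for
/// each incoming dive whose source UUID is already stored.
Map<String, String> matchBySourceUuid(
  List<IncomingDive> incoming,
  Map<String, String> sourceUuidByDiveId,
) {
  // Invert once so each incoming dive becomes a constant-time lookup.
  final diveIdByUuid = {
    for (final e in sourceUuidByDiveId.entries) e.value: e.key,
  };
  final matches = <String, String>{};
  for (final dive in incoming) {
    final uuid = dive.sourceUuid;
    if (uuid == null || uuid.isEmpty) continue; // left for later passes
    final existingId = diveIdByUuid[uuid];
    if (existingId != null) matches[dive.label] = existingId;
  }
  return matches;
}

void main() {
  final existing = {'dive-1': 'UUID-A', 'dive-2': 'UUID-B'};
  final incoming = [
    const IncomingDive('sqlite#1', 'UUID-B'), // already imported via XML
    const IncomingDive('sqlite#2', 'UUID-C'), // new dive
    const IncomingDive('sqlite#3', null), // no UUID to match on
  ];
  print(matchBySourceUuid(incoming, existing));
}
```

Because the match key is the UUID and not the file format, the same dive arriving via UDDF, XML, or SQLite resolves to one existing row.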

docs/superpowers/plans/2026-04-21-macdive-sqlite-import.md

Lines changed: 42 additions & 1 deletion
@@ -2,7 +2,13 @@

 > **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking.

-**Goal:** Import directly from MacDive's Core Data SQLite database (`MacDive.sqlite`). This is the most complete path: tags, critters, events, service records, full relationship graph, and per-dive profile samples encoded as Apple binary plists.
+**Goal:** Import directly from MacDive's Core Data SQLite database (`MacDive.sqlite`). Rich METADATA path: tags, critters, events, service records, full relationship graph, certifications, gear inventory.
+
+**Scope adjustment (discovered during Task 4):** `ZDIVE.ZSAMPLES` is NOT bplist: MacDive uses a proprietary binary format (entropy 7.85 bits/byte, all 256 byte values present; either bit-packed+delta-encoded or compressed with a non-standard algorithm). Tried zlib/gzip/lzma at offsets 0/4/8/12; nothing works. Reverse-engineering MacDive's sample format is out of M3 scope.
+
+**Consequence:** M3 imports dive metadata only. `profile: []` is emitted for every dive. Users who want profile time-series data can use M1's UDDF import instead (which decodes profiles correctly from MacDive UDDF exports). SQLite import becomes the "rich metadata" path; UDDF remains the "sample data" path.
+
+`ZDIVE.ZTIMEZONE` IS bplist (NSKeyedArchiver format with UID markers), handled correctly by the bplist decoder from Tasks 1-3.

 **Architecture:** Hand-rolled `BPlistDecoder` (binary plist v00) for decoding MacDive's BLOB columns (`ZRAWDATA`, `ZSAMPLES`, `ZTIMEZONE`). New `MacDiveDbReader` modeled on `ShearwaterDbReader` validates the schema and produces typed raw rows. New `MacDiveDiveMapper` joins the rows (dive ↔ site, dive ↔ buddy, dive ↔ tank ↔ gas, dive ↔ tag, dive ↔ critter) and maps them to a unified `ImportPayload`. Pipeline wiring mirrors Shearwater Cloud.

@@ -14,6 +20,41 @@

 ---

+## Milestone 3 Status: COMPLETE
+
+- All 14 tasks landed; bplist decoder (Tasks 1-4) shipped with UID
+support after real-sample probing revealed NSKeyedArchiver format
+in ZTIMEZONE.
+- ZSAMPLES profile-sample decoding **descoped**: MacDive uses a
+proprietary binary format (entropy 7.85 bits/byte; not bplist; not
+zlib/gzip/lzma at any reasonable offset). Users wanting profile
+samples should use M1's UDDF import. M3 is the "rich metadata"
+path; UDDF remains the "sample data" path.
+- New `ImportFormat.macdiveSqlite` + source override. Detector
+chain: SQLite magic → Shearwater check → MacDive check → generic.
+- `MacDiveDbReader` validates schema (ZDIVE + ZDIVESITE + ZGAS +
+ZTANKANDGAS) and returns typed row graph keyed by PK; junctions
+as dive_pk → related_pks maps. Filters null-FK tombstones in
+ZTANKANDGAS discovered on real data.
+- `MacDiveDiveMapper` reuses M2's `MacDiveUnitConverter` and
+`MacDiveValueMapper`. Dive map keys match M2's `MacDiveXmlParser`
+exactly so the same `UddfEntityImporter` downstream consumes
+both sources uniformly.
+- `ImportDuplicateChecker` now short-circuits on `source_uuid`
+when incoming dives match existing `dive_data_sources.source_uuid`.
+Separate parameter map keeps the `Dive` entity unchanged
+(multi-source-per-dive semantics preserved).
+- Gated `@Tags(['real-data'])` test asserts against user's real
+6.7MB DB: 540 dives, 373 sites, 33 buddies, 39 tags, 32 gear,
+all with sourceUuid populated. Profile always empty per descope.
+- Full test suite passes.
+
+Next: M4 (Photos) extends M2's XML parser and M3's SQLite reader
+to emit `imageRefs` on payloads and adds a photo-linking wizard
+step.
+
+---
+
 ## File Structure

 | File | Role | New / Modified |
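The plan's scope adjustment rests on two byte-level measurements of the `ZSAMPLES` BLOB: Shannon entropy near 8 bits/byte and all 256 byte values present. A minimal sketch of how such figures can be computed follows. `byteEntropy` and `usesAllByteValues` are hypothetical helper names, not code from the commit, and the input in `main` is synthetic, not MacDive data.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Shannon entropy of [bytes] in bits per byte, from 0.0 to 8.0.
/// Values close to 8 suggest compressed, encrypted, or bit-packed data.
double byteEntropy(Uint8List bytes) {
  if (bytes.isEmpty) return 0.0;
  final counts = List<int>.filled(256, 0);
  for (final b in bytes) {
    counts[b]++;
  }
  var entropy = 0.0;
  for (final c in counts) {
    if (c == 0) continue;
    final p = c / bytes.length;
    entropy -= p * (math.log(p) / math.ln2); // log base 2
  }
  return entropy;
}

/// True when each of the 256 possible byte values occurs at least once.
bool usesAllByteValues(Uint8List bytes) => bytes.toSet().length == 256;

void main() {
  // Synthetic input: every byte value exactly four times.
  final uniform = Uint8List.fromList(
    List<int>.generate(1024, (i) => i % 256),
  );
  print(byteEntropy(uniform).toStringAsFixed(2));
  print(usesAllByteValues(uniform));
}
```

High entropy alone does not identify the encoding. It only rules out plain structured formats such as bplist, which is consistent with the plan falling back to trial decompression before descoping.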
Lines changed: 223 additions & 0 deletions
@@ -0,0 +1,223 @@
+import 'dart:convert';
+import 'dart:typed_data';
+
+import 'package:submersion/core/utils/bplist/bplist_object.dart';
+
+/// Decoder for Apple binary property list v00 ("bplist00"). Supports
+/// the subset of types observed in MacDive Core Data BLOBs: null,
+/// bool, int (1/2/4/8-byte; 16-byte ints are truncated to their low
+/// 64 bits as a best effort, since MacDive hasn't been seen to emit
+/// them), real (4/8-byte IEEE754), string (ASCII + UTF-16BE), data,
+/// date, dict, array, and NSKeyedArchiver UID refs. Sets throw
+/// FormatException.
+class BPlistDecoder {
+  final Uint8List _bytes;
+  final int _offsetIntSize;
+  final int _objectRefSize;
+  final int _offsetTableOffset;
+
+  BPlistDecoder._(
+    this._bytes, {
+    required int offsetIntSize,
+    required int objectRefSize,
+    required int offsetTableOffset,
+  }) : _offsetIntSize = offsetIntSize,
+       _objectRefSize = objectRefSize,
+       _offsetTableOffset = offsetTableOffset;
+
+  /// Entry point: decode [bytes] into the root [BPlistObject].
+  /// Throws [FormatException] on a bad header or unsupported markers
+  /// (sets); corrupt offsets surface as [RangeError] from the byte reads.
+  static BPlistObject decode(Uint8List bytes) {
+    if (bytes.length < 8 + 32) {
+      throw const FormatException('bplist00 stream too short');
+    }
+    // Magic: bytes 0..7 must equal "bplist00".
+    const magic = [0x62, 0x70, 0x6C, 0x69, 0x73, 0x74, 0x30, 0x30];
+    for (var i = 0; i < 8; i++) {
+      if (bytes[i] != magic[i]) {
+        throw const FormatException('not a bplist00 stream');
+      }
+    }
+
+    final trailer = bytes.length - 32;
+    final offsetIntSize = bytes[trailer + 6];
+    final objectRefSize = bytes[trailer + 7];
+    final topObjectIndex = _readBigEndianInt(bytes, trailer + 16, 8);
+    final offsetTableOffset = _readBigEndianInt(bytes, trailer + 24, 8);
+
+    final decoder = BPlistDecoder._(
+      bytes,
+      offsetIntSize: offsetIntSize,
+      objectRefSize: objectRefSize,
+      offsetTableOffset: offsetTableOffset,
+    );
+    return decoder._readObject(topObjectIndex);
+  }
+
+  int _offsetOfObject(int index) {
+    final pos = _offsetTableOffset + index * _offsetIntSize;
+    return _readBigEndianInt(_bytes, pos, _offsetIntSize);
+  }
+
+  BPlistObject _readObject(int index) {
+    final offset = _offsetOfObject(index);
+    final marker = _bytes[offset];
+    final type = marker >> 4;
+    final info = marker & 0x0F;
+
+    switch (type) {
+      case 0x0: // singletons
+        return switch (info) {
+          0x0 => const BPlistNull(),
+          0x8 => const BPlistBool(false),
+          0x9 => const BPlistBool(true),
+          _ => throw FormatException(
+            'unknown bplist singleton marker 0x${marker.toRadixString(16)}',
+          ),
+        };
+
+      case 0x1: // int
+        // info bits 0..3 encode log2(byteCount): 0->1, 1->2, 2->4, 3->8, 4->16.
+        // 16-byte ints are decoded as a best-effort truncation to the low
+        // 64 bits: Dart's `int` is 64-bit on native platforms, and the
+        // `(value << 8) | byte` accumulator in `_readBigEndianInt` silently
+        // drops the high bytes once the value overflows, which for a big-
+        // endian read keeps exactly the low 64 bits of the source integer.
+        // MacDive has not been seen to emit 16-byte ints in practice.
+        final byteCount = 1 << info;
+        return BPlistInt(_readBigEndianInt(_bytes, offset + 1, byteCount));
+
+      case 0x2: // real
+        final byteCount = 1 << info;
+        return BPlistReal(_readBigEndianReal(_bytes, offset + 1, byteCount));
+
+      case 0x3: // date: always 8-byte big-endian IEEE754
+        return BPlistDate(_readBigEndianReal(_bytes, offset + 1, 8));
+
+      case 0x4: // data
+        final li = _readLenAndStart(offset, info);
+        return BPlistData(
+          Uint8List.sublistView(_bytes, li.start, li.start + li.length),
+        );
+
+      case 0x5: // ASCII string
+        final li = _readLenAndStart(offset, info);
+        return BPlistString(
+          ascii.decode(
+            _bytes.sublist(li.start, li.start + li.length),
+            allowInvalid: true,
+          ),
+        );
+
+      case 0x6: // UTF-16BE string; length is char count
+        final li = _readLenAndStart(offset, info);
+        return BPlistString(_decodeUtf16Be(_bytes, li.start, li.length));
+
+      case 0xA: // array
+        final li = _readLenAndStart(offset, info);
+        final refs = <int>[];
+        for (var i = 0; i < li.length; i++) {
+          refs.add(
+            _readBigEndianInt(
+              _bytes,
+              li.start + i * _objectRefSize,
+              _objectRefSize,
+            ),
+          );
+        }
+        return BPlistArray(refs.map(_readObject).toList(growable: false));
+
+      case 0x8: // UID: NSKeyedArchiver reference (1..4 bytes; info encodes byteCount - 1)
+        final byteCount = info + 1;
+        return BPlistUID(_readBigEndianInt(_bytes, offset + 1, byteCount));
+
+      case 0xD: // dict
+        final li = _readLenAndStart(offset, info);
+        final keys = <int>[];
+        final values = <int>[];
+        for (var i = 0; i < li.length; i++) {
+          keys.add(
+            _readBigEndianInt(
+              _bytes,
+              li.start + i * _objectRefSize,
+              _objectRefSize,
+            ),
+          );
+        }
+        for (var i = 0; i < li.length; i++) {
+          values.add(
+            _readBigEndianInt(
+              _bytes,
+              li.start + (li.length + i) * _objectRefSize,
+              _objectRefSize,
+            ),
+          );
+        }
+        final map = <String, BPlistObject>{};
+        for (var i = 0; i < li.length; i++) {
+          final key = _readObject(keys[i]);
+          if (key is! BPlistString) {
+            throw FormatException(
+              'bplist dict key at index $i is not a string',
+            );
+          }
+          map[key.value] = _readObject(values[i]);
+        }
+        return BPlistDict(map);
+
+      default:
+        throw FormatException(
+          'unsupported bplist marker 0x${marker.toRadixString(16)}',
+        );
+    }
+  }
+
+  /// Decodes a length field that appears inline in a marker byte or, for
+  /// info == 0x0F, as a trailing integer marker followed by the int bytes.
+  /// Returns the length and the start offset of the payload data.
+  _LenAndStart _readLenAndStart(int markerOffset, int info) {
+    if (info != 0x0F) {
+      return _LenAndStart(info, markerOffset + 1);
+    }
+    final intMarker = _bytes[markerOffset + 1];
+    if ((intMarker >> 4) != 0x1) {
+      throw FormatException(
+        'expected int marker after 0x0F length, got 0x${intMarker.toRadixString(16)}',
+      );
+    }
+    final intByteCount = 1 << (intMarker & 0x0F);
+    final length = _readBigEndianInt(_bytes, markerOffset + 2, intByteCount);
+    return _LenAndStart(length, markerOffset + 2 + intByteCount);
+  }
+
+  static int _readBigEndianInt(Uint8List bytes, int offset, int size) {
+    var value = 0;
+    for (var i = 0; i < size; i++) {
+      value = (value << 8) | bytes[offset + i];
+    }
+    return value;
+  }
+
+  static double _readBigEndianReal(Uint8List bytes, int offset, int size) {
+    final bd = ByteData.sublistView(bytes, offset, offset + size);
+    if (size == 4) return bd.getFloat32(0, Endian.big);
+    if (size == 8) return bd.getFloat64(0, Endian.big);
+    throw FormatException('unsupported bplist real size: $size');
+  }
+
+  static String _decodeUtf16Be(Uint8List bytes, int start, int charCount) {
+    final units = <int>[];
+    for (var i = 0; i < charCount; i++) {
+      final offset = start + i * 2;
+      units.add((bytes[offset] << 8) | bytes[offset + 1]);
+    }
+    return String.fromCharCodes(units);
+  }
+}
+
+class _LenAndStart {
+  final int length;
+  final int start;
+  const _LenAndStart(this.length, this.start);
+}
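The decoder above reads everything it needs from the final 32 bytes of the stream. To make that trailer layout concrete, the standalone sketch below builds the smallest bplist00 stream that holds a single `true` object and reads its trailer back using the same byte positions as `decode`. `BPlistTrailer`, `readTrailer`, `readBigEndian`, and `buildTrueBPlist` are hypothetical names written for this illustration and are not part of the commit.

```dart
import 'dart:typed_data';

/// Trailer fields of a bplist00 stream (the last 32 bytes).
class BPlistTrailer {
  final int offsetIntSize; // byte 6: width of each offset-table entry
  final int objectRefSize; // byte 7: width of each object reference
  final int numObjects; // bytes 8..15
  final int topObjectIndex; // bytes 16..23
  final int offsetTableOffset; // bytes 24..31
  const BPlistTrailer({
    required this.offsetIntSize,
    required this.objectRefSize,
    required this.numObjects,
    required this.topObjectIndex,
    required this.offsetTableOffset,
  });
}

int readBigEndian(Uint8List bytes, int offset, int size) {
  var value = 0;
  for (var i = 0; i < size; i++) {
    value = (value << 8) | bytes[offset + i];
  }
  return value;
}

BPlistTrailer readTrailer(Uint8List bytes) {
  if (bytes.length < 8 + 32) {
    throw const FormatException('bplist00 stream too short');
  }
  final t = bytes.length - 32;
  return BPlistTrailer(
    offsetIntSize: bytes[t + 6],
    objectRefSize: bytes[t + 7],
    numObjects: readBigEndian(bytes, t + 8, 8),
    topObjectIndex: readBigEndian(bytes, t + 16, 8),
    offsetTableOffset: readBigEndian(bytes, t + 24, 8),
  );
}

/// Header, one `true` singleton, a one-entry offset table, then the trailer.
Uint8List buildTrueBPlist() {
  final trailer = Uint8List(32);
  trailer[6] = 1; // offset-table entries are 1 byte wide
  trailer[7] = 1; // object refs are 1 byte wide
  trailer[15] = 1; // numObjects = 1
  trailer[31] = 9; // offset table starts at byte 9
  return Uint8List.fromList([
    ...'bplist00'.codeUnits, // bytes 0..7: magic
    0x09, // byte 8: object 0, singleton `true`
    0x08, // byte 9: offset table, object 0 lives at byte 8
    ...trailer,
  ]);
}

void main() {
  final bytes = buildTrueBPlist();
  final t = readTrailer(bytes);
  final rootOffset =
      readBigEndian(bytes, t.offsetTableOffset, t.offsetIntSize);
  print('root marker: 0x${bytes[rootOffset].toRadixString(16)}');
}
```

The stream is 42 bytes long, just above the `8 + 32` minimum that `decode` enforces, so a fixture like this can be handy when exercising the header and trailer checks in isolation.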
