fix(sqle,admin): preserve legacy encoding on ALTER + add safe schema-encoding-drift heal toolkit#11126
Open
realies wants to merge 1 commit into
Open
Conversation
7dd44f4 to
b54cd10
Compare
…a safe schema-encoding-drift heal toolkit UseAdaptiveEncoding became the default in v2.0.7 (dolthub#11017), surfacing a class of encoding-drift bugs where a TEXT/BLOB/JSON column's persisted encoding tag and its on-disk row data disagree, causing `panic: invalid hash length: N` on read and write. - Preserve the legacy storage encoding across ALTER TABLE ... MODIFY (and the rewrite/ADD/DROP path), extended to JSON + geometry families, so a 1.x-written StringAddr/BytesAddr column is not silently re-tagged adaptive. - Add `dolt admin schema-encoding-drift` (check/repair/recover-rows/migrate-adaptive) to detect and safely heal already-drifted columns. `repair` is hardened to refuse heterogeneous/genuine-adaptive/dolt_ignore'd columns; `migrate-adaptive` is the forward heal; dolt_ignore'd tables are force-inlined (zero out-of-line refs) so the working-set persist cannot dangle; the ambiguous 0x00-leading-20-byte decision uses an authoritative read that fails loud on a read error. See the PR description for the full root-cause analysis, design rationale, and test plan.
b54cd10 to
02564b0
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fix adaptive-encoding drift: preserve legacy encoding on ALTER, and add a safe
schema-encoding-driftheal toolkitSummary
UseAdaptiveEncodingbecame the default in v2.0.7 (#11017). This surfaced a class ofencoding-drift bugs where a TEXT/BLOB/JSON column's persisted encoding tag and its
on-disk row data disagree, producing
panic: invalid hash length: Non read and on write.This PR:
ALTER TABLE ... MODIFYso a 1.x-writtenStringAddr/BytesAddrcolumn is not silently re-tagged adaptive while its rows remainraw 20-byte hashes.
dolt admin schema-encoding-drifttoolkit to detect and safely heal columnsthat already drifted, including a forward
migrate-adaptiveheal.repairflipcould strand genuine adaptive rows under a legacy schema tag.
repair/recover-rows/checkare hardened, andmigrate-adaptiveis the safe forward path.Background
In v2.0.7+,
blobStringType.Encoding()(TEXT) /varBinaryType.Encoding()(BLOB) consult theglobal
UseAdaptiveEncodingflag when a column's per-fieldencis the zero value. A column whosein-memory
TypeInfowas rebuilt from a SQL type (rather than deserialized from disk, whichpins
enc) therefore serialized as the adaptive variant even though its rows were still inlegacy raw-hash (
StringAddrEnc/BytesAddrEnc) form. The adaptive reader then parses a raw20-byte hash as
[varint length][address]and crashes inhash.New.The reverse mismatch is just as damaging and is the focus of most of this PR (see Root cause).
Root cause
1. ALTER MODIFY re-tagged surviving columns (encoder side is fine)
ToDoltCol/ToDoltSchemarebuild columns viaFromSqlType, which returns aTypeInfowith
enc == 0. On an in-placeMODIFY(or a rewrite ALTER), the surviving TEXT/BLOBcolumns were re-serialized with
enc == 0→ theUseAdaptiveEncodingfallback flipped theirpersisted tag to adaptive while their rows were untouched.
Note: the steady-state INSERT/UPDATE value encoder is not implicated — it builds the
value
TupleDescriptorfrom the deserialized schema, whoseencis pinned bydeserializeColumnsviaWithEncoding. The fix belongs at the schema-write boundary (ALTER),not the writer.
2. The heal command could cause corruption:
repair's schema-only flipThe drift toolkit's
repairsubcommand flips a column's persisted tag from its adaptivevariant back to the legacy sibling without rewriting row data ("data unchanged"). That is
only safe if every row is already in legacy form. On a column that still holds genuine
adaptive content, the flip strands every adaptive row under a legacy schema tag, producing
two failure modes:
GetStringAddr → GetAddr → hash.Newruns on the adaptive-inline payload(e.g. a 59-byte
[0x00]<utf8>value) and panicsinvalid hash length: 59.countAddresseswalks everylegacy-AddrEnc field of every row in the node and calls
hash.Newon each. A strandednon-20-byte adaptive field panics there — so a single stranded row poisons all writes to
the table.
The gap that let
repairflip such a column: its scan classified a non-emptyadaptive-inline value (small
[0x00]<content>, not spilled out-of-band) as merely"ambiguous" rather than "definitive adaptive". A column with both a legacy witness and inline
adaptive content therefore fell through to the flip branch. A
[0x00]-leading 20-bytevalue (inline content of 19 bytes) is additionally ambiguous with a legacy hash that happens
to start
0x00, and was silently misread as a bogus address — a quiet corruption, not aloud panic.
3.
dolt_ignore'd tables: out-of-band adaptive values dangleA
dolt_ignore'd table's data lives only in the working set; its content chunks are neverrooted in a commit, and GC can reclaim never-committed chunks. So an out-of-band (addressed)
adaptive value on such a table references a chunk that isn't durable: persisting the working
set dangling-faults on the un-rooted chunk, and GC would reclaim it even if it persisted.
Forward-migrating these tables therefore has to avoid out-of-line refs entirely.
Changes
Preserve legacy encoding across ALTER
MODIFY: inherit the existing column's encoding onto the freshly-builtTypeInfowhen the column kind is unchanged.
preserveSurvivingColumnEncodingsre-pins every surviving TEXT/BLOBcolumn's encoding after
ToDoltSchema. Extended to JSON + the geometry families.dolt admin schema-encoding-drifttoolkitcheck— multi-database, full-payload scan; classifies each row's field bytes(legacy raw-hash / adaptive-inline / adaptive-addressed / empty); reports drift and hints the
correct heal command.
repair— schema-only tag flip, now guarded by a full-payload scan: a non-emptynon-20-byte adaptive-inline value is counted as definitive adaptive;
repairrefuses anycolumn with both legacy and adaptive rows (heterogeneous), any genuine-adaptive column, and
any
dolt_ignore'd table. It only flips a column proven homogeneously legacy (or, with--include-empty, the all-empty bucket).recover-rows— row-by-row migration to canonical legacy form for committed tables; refusesdolt_ignore'd tables.migrate-adaptive(new) — the forward heal: rewrites legacy rows to canonical adaptive,keeps already-adaptive rows, flips the schema to the adaptive sibling. Row data + schema move
together (a commit for committed tables; a working-set persist for
dolt_ignore'd ones).Content is preserved; supports
--dry-run.Force-inline for
dolt_ignore'd tables (the persist fix)migrate-adaptivetargets adolt_ignore'd table it force-inlines every value ofevery out-of-band-capable column (String/Bytes/JSON/geometry) in the table — zero out-of-line refs — so the values are
self-contained, GC-immune, and persistable without a dangling ref. An in-code invariant
asserts zero addressed refs before persisting (fail-fast), and the command prints a
storage report:
storage: N values inline, M addressed (M MUST be 0 for a dolt_ignore'd table).Authoritative reads fail loud on transient errors
legacy-vs-inline.
readContentAuthoritativenow returns(content, present, err)anddistinguishes a genuine absence (recovered empty-chunk → treat as inline) from a
transient read error (propagated). The migrate path no longer risks inlining a present
legacy row's 20 address bytes as content because a read momentarily failed.
migrate-adaptivepath uses thisReadBytes-attempt; the legacy-directionrecover-rowsresolver intentionally keeps thechunkStore.Hascheck, since switching it isnot behavior-equivalent there —
Haspropagates a transient chunkstore error that theReadBytes-attempt would swallow, and on the table's own (complete) chunkstoreHas⟺ReadBytes-success are consistent.Why these choices
UseAdaptiveEncodingfallback in the type getter — that would regressthe v2.0.7 product default for newly-created tables. The encoding decision belongs at the
schema-write boundary.
dolt_ignore'd tables — committed tables can hold out-of-bandadaptive values safely (their chunks are rooted). Inlining is required precisely where chunks
are never committed.
repairrefuses rather than guesses — a schema-only flip cannot be safe on a column withreal content;
migrate-adaptive/recover-rows(which rewrite rows) are the safe heals.Testing
Go regressions (each fails on the pre-fix code, passes on this PR):
repairmust refuse a legacy-witness + adaptive-inline heterogeneous column (theinline-stranding shape that the first-witness scan missed).
repairmust refuse a 20-byte0x00-leading inline value coexisting with a legacy witness(the quiet-misread shape).
migrate-adaptiveheals a stranded column byte-identically; reading the previously-panickingrows returns their exact content.
migrate-adaptiveon adolt_ignore'd table reports0 addressed(force-inline invariant).resolveFieldToAdaptivepropagates a transient read error during 0x00-lead disambiguationinstead of swallowing it as inline.
ALTER MODIFY preserves encodingandrecover-rowsheterogeneous /sibling-AddrEnc cases.
bats (cross-version compat suite): a panic-shape assertion (
invalid hash length/panic recovered/runtime errormust not leak into output) on the legacy-TEXT/BLOB read path; aCLI
migrate-adaptiveheal +repairrefusal; and thedolt_ignore'd-table force-inlinestorage-report assertion (
0 addressed).Validated end-to-end on a production dolt database: every affected column read back
byte-identical after the heal (content hashes diffed pre/post), zero genuinely-missing chunks,
zero out-of-line refs on the ignored tables, and both the read and the write/flush panics
resolved.
Minor
dolt admin schema-encoding-drift --helpdoes not list its subcommands (check,repair,recover-rows,migrate-adaptive); they work when invoked but aren't discoverable. Worth aone-line registration fix so the subcommand list renders.