Keyless table semantics diverge from Dolt — adopt cardinality-based row identity #804

@timsehn

Summary

Doltlite's handling of keyless tables (tables created without a PRIMARY KEY) doesn't match Dolt's. Dolt uses content-addressed row identity with cardinality counters, so cross-branch merges Just Work. Doltlite keeps SQLite's default auto-assigned rowid, so cross-branch inserts collide on rowid and produce phantom merge conflicts.

This is acknowledged today in src/build.c's convertToWithoutRowidTable comment:

Tables with no primary key at all keep sqlite's default rowid behavior — they work for normal queries but their inserts can collide on rowid across branches, so they're not suitable for version control. We don't reject them: sqlite-parity tests rely on keyless tables, and the failure mode for cross-branch use is loud enough (phantom merge conflicts) that users will figure it out without an upfront error.

That's a defensible position for an MVP, but it leaves doltlite incompatible with Dolt's keyless semantics — which matters as soon as someone tries to push/pull a doltlite database to/from Dolt or expects the two engines to agree on what a keyless merge should look like.

How Dolt does it

Per the Dolt docs and the dolt source:

  1. Row identity = the row's content. No hidden rowid. The bytes of all columns hash to a single key.
  2. Duplicates are tracked via a cardinality counter. Inserting the same row twice doesn't create two entries — it increments the counter on the single hash-keyed entry. SELECT COUNT(*) reads back the counters.
  3. Merges are 3-way over cardinalities. Conflict tables expose base_cardinality, our_cardinality, and their_cardinality for each row-hash. The merge algorithm:
    • identical content + same counter delta → no conflict, sum the deltas
    • both sides increment the same row → no conflict, sum counters
    • one side deletes (decrements) while the other increments → conflict surfaced via the three cardinality columns

Two branches each inserting ('Alice', 30) results in a single row with cardinality = 2 after merge. Two branches inserting different rows give two separate hash entries, no conflict. The "phantom conflict on rowid 1 vs rowid 1" failure mode is impossible by construction.
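The merge rules above can be modeled in a few lines. This is only a sketch of the rules as summarized in this issue, not Dolt's actual code: `row_key` and `merge_cardinalities` are hypothetical names, SHA-256 over `repr(row)` stands in for whatever hashing Dolt really uses, and the both-sides-decrement case (which the rules above don't cover) is assumed to sum the deltas and clamp at zero.

```python
import hashlib


def row_key(row):
    """Content-derived key: a hash of the row's column values.

    Assumption: SHA-256 over repr(row) is a stand-in, not Dolt's real encoding.
    """
    return hashlib.sha256(repr(row).encode("utf-8")).hexdigest()


def merge_cardinalities(base, ours, theirs):
    """3-way merge over {row_key: cardinality} maps.

    Returns (merged, conflicts). A missing key means cardinality 0.
    """
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(key, 0), ours.get(key, 0), theirs.get(key, 0)
        d_ours, d_theirs = o - b, t - b
        if d_ours * d_theirs < 0:
            # One side decremented while the other incremented: conflict,
            # reported with the three cardinalities.
            conflicts[key] = {
                "base_cardinality": b,
                "our_cardinality": o,
                "their_cardinality": t,
            }
            continue
        # Otherwise sum the deltas. Clamping at 0 is an assumption for the
        # both-sides-decrement case.
        count = max(b + d_ours + d_theirs, 0)
        if count > 0:
            merged[key] = count
    return merged, conflicts


alice = row_key(("Alice", 30))
bob = row_key(("Bob", 25))

# Both branches insert the same row: one entry, cardinality 2, no conflict.
both_insert = merge_cardinalities({}, {alice: 1}, {alice: 1})

# Branches insert different rows: two entries, no conflict.
different_rows = merge_cardinalities({}, {alice: 1}, {bob: 1})

# Ours deletes the row while theirs adds another copy: conflict.
delete_vs_insert = merge_cardinalities({alice: 1}, {}, {alice: 2})
```

Because the key is derived from content, two branches can never disagree about what a given key means; the only thing left to merge is a count.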

How doltlite does it today

  1. Row identity = auto-assigned rowid (SQLite's hidden integer rowid column; a keyless table has no INTEGER PRIMARY KEY to alias it).
  2. Duplicates are separate rows, each with its own rowid.
  3. Merges are 3-way over rowids. Two branches both insert "Alice"; both get rowid 1; the merge sees rowid 1 = Alice (A) vs rowid 1 = Alice (B) and either trivially dedupes (if exactly equal) or conflicts (if anything differs). Two branches inserting different rows at the same auto-rowid hit the phantom conflict.

The on-disk representation is a prolly tree keyed by SQLite's standard rowid; none of Dolt's cardinality-counter machinery exists.
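For contrast, here is a toy model of the rowid-keyed behavior described above. It is a simplified illustration, not doltlite's merge code: `insert_rowid` and `merge_by_rowid` are hypothetical names, and a table is modeled as a plain `{rowid: row}` dict with rowids assigned as max + 1.

```python
def insert_rowid(table, row):
    """Store a row under the next auto-assigned rowid (max + 1, starting at 1)."""
    rowid = max(table, default=0) + 1
    table[rowid] = row
    return rowid


def merge_by_rowid(base, ours, theirs):
    """3-way merge over {rowid: row} maps. Returns (merged, conflicts)."""
    merged, conflicts = {}, {}
    for rowid in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(rowid), ours.get(rowid), theirs.get(rowid)
        if o == t:
            value = o      # both sides agree (includes the trivial dedupe)
        elif o == b:
            value = t      # only theirs changed this rowid
        elif t == b:
            value = o      # only ours changed this rowid
        else:
            conflicts[rowid] = (b, o, t)  # both changed it differently
            continue
        if value is not None:
            merged[rowid] = value
    return merged, conflicts


# Two branches start from an empty table and insert unrelated rows.
ours, theirs = {}, {}
insert_rowid(ours, ("Alice", 30))
insert_rowid(theirs, ("Bob", 25))
phantom = merge_by_rowid({}, ours, theirs)

# Two branches insert the identical row: the merge keeps one copy, not two.
same_ours, same_theirs = {}, {}
insert_rowid(same_ours, ("Alice", 30))
insert_rowid(same_theirs, ("Alice", 30))
deduped = merge_by_rowid({}, same_ours, same_theirs)
```

In this model the unrelated inserts conflict purely because both landed on rowid 1, and the identical inserts collapse to a single row where the cardinality model would report two.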

Why this matters

  • Doltlite ↔ Dolt interop. Anyone pushing a doltlite database to a Dolt remote (or vice versa) on a schema with keyless tables will hit semantic divergence: the same merge produces different conflict shapes depending on which engine ran it.
  • Cross-branch use within doltlite. Users following the SQLite-parity ergonomics (no PRIMARY KEY declaration, just insert) will hit phantom conflicts on their first merge. The current "users will figure it out from the error" stance is fine for a beta but not great for a 1.0 story.
  • Documentation gap. Doltlite's README sells itself as a SQLite-compatible drop-in with Git-like version control. Keyless tables are the one place where those two stories don't compose cleanly today.

Proposed direction (research/design needed)

Adopt Dolt's content-addressed cardinality model for keyless tables. Concretely:

  1. At CREATE TABLE time (in convertToWithoutRowidTable and friends), recognize tables whose flags do not have TF_HasPrimaryKey set and route them to a new keyless storage path instead of falling through to SQLite's rowid pager.
  2. New storage layout: each row's serialized content hashes to its key; a per-row cardinality counter lives alongside the value. Probably reuses sortkey/blake3 infrastructure.
  3. INSERT semantics: hash the row, increment the cardinality (or set to 1 for a new hash).
  4. DELETE semantics: decrement; remove the entry at 0.
  5. SELECT semantics: iterate hash-keyed rows, expanding by cardinality (COUNT(*) reads counter sums).
  6. Merge: extend the 3-way merge logic to compare cardinality deltas. Surface base_cardinality / our_cardinality / their_cardinality in dolt_conflicts_<table> the same way Dolt does.
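Steps 2 through 5 can be prototyped as an in-memory model before touching the storage layer. The sketch below is an assumption-laden illustration, not a proposed implementation: `KeylessTable` and its methods are hypothetical names, `hashlib.blake2b` over `repr(row)` stands in for the sortkey/blake3 infrastructure, and `delete_one` removes a single copy per call, which may not match how a SQL DELETE with a WHERE clause should behave.

```python
import hashlib


class KeylessTable:
    """Toy content-addressed table: hash(row) -> [row, cardinality]."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def _key(row):
        # Assumption: blake2b over repr(row) is a placeholder for the real
        # row serialization and hash.
        return hashlib.blake2b(repr(row).encode("utf-8"), digest_size=16).digest()

    def insert(self, row):
        """Increment the counter, creating the entry at 1 for a new hash."""
        entry = self._entries.setdefault(self._key(row), [row, 0])
        entry[1] += 1

    def delete_one(self, row):
        """Decrement the counter; drop the entry when it reaches 0."""
        key = self._key(row)
        entry = self._entries.get(key)
        if entry is None:
            return False
        entry[1] -= 1
        if entry[1] == 0:
            del self._entries[key]
        return True

    def select(self):
        """Yield each row once per unit of cardinality."""
        for row, cardinality in self._entries.values():
            for _ in range(cardinality):
                yield row

    def count(self):
        """COUNT(*) equivalent: the sum of all counters."""
        return sum(cardinality for _, cardinality in self._entries.values())


table = KeylessTable()
table.insert(("Alice", 30))
table.insert(("Alice", 30))  # same content: bumps the counter to 2
table.insert(("Bob", 25))
rows_before = sorted(table.select())
count_before = table.count()
removed = table.delete_one(("Alice", 30))
count_after = table.count()
```

One consequence worth noting for the design discussion: scan order follows the hash-keyed entries rather than insertion order, so any code or test that relies on rowid ordering of keyless tables would observe a difference.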

Alternative (smaller scope): keep the current rowid-based storage but at least reject keyless tables at CREATE TABLE time with a clear error message, telling users to add a PRIMARY KEY (or pass a flag to opt into the SQLite-parity rowid behavior with documented merge limitations). That gives parity-by-erroring-out without doing the cardinality work — but breaks the SQLite-parity tests that the existing comment calls out as the reason we accept keyless tables today.

Out of scope here

  • WITHOUT ROWID tables, INTEGER PRIMARY KEY tables, and tables with a declared PRIMARY KEY all work correctly today — only the no-PK case is affected.
  • Virtual tables, temp tables, views, and internal sqlite_* tables are explicitly skipped by convertToWithoutRowidTable and shouldn't change.

References

  • src/build.c: convertToWithoutRowidTable, especially the comment block around the keyless skip
  • src/record_codec.h: DoltliteColInfo and the aColToRec[] comment (mentions the "identity for rowid-aliased and keyless tables" case)
  • Dolt's keyless table impl in dolthub/dolt — search for "keyless" / "cardinality"
