Skip to content

InsertRecords: union columns + emit DEFAULT for missing, instead of rejecting mixed-shape batches #53

@klaidliadon

Description

@klaidliadon

Problem

StatementBuilder.InsertRecords rejects batches where rows produce different column sets. After #50 added column-shape validation, any heterogeneous batch fails at build time:

type Order struct {
    Name      string    `db:"name"`
    Tags      []string  `db:"tags,omitzero"`
    CreatedAt time.Time `db:"created_at,omitempty"`
}
records := []Order{
    {Name: "first"},                                  // cols = [name]
    {Name: "second", Tags: []string{"foo"}},          // cols = [name, tags]
    {Name: "third",  CreatedAt: time.Now()},          // cols = [created_at, name]
}
DB.SQL.InsertRecords(records, "orders")
// err: "record 1 columns [name tags] differ from record 0 columns [name]"

This is a real footgun for any batch insert that mixes ,omitzero / ,omitempty fields. Workarounds today are all bad: loop with InsertRecord per row (kills batching), set "missing" fields to non-zero just to force emission (defeats the tag), or hand-write SQL with sq.Expr.

Why fail-fast was the conservative initial answer

Squirrel's InsertBuilder freezes the column list from row 0 and emits later rows' value tuples without re-checking width. Mixed-shape batches produced malformed multi-row SQL that only surfaced as a Postgres parse error at exec time. #50 caught it earlier with a clean build-time error.

But fail-fast is restrictive. Real callers will hit it.

Proposed fix

Union the columns across all rows; emit sq.Expr("DEFAULT") for any row that didn't include a given column. PostgreSQL accepts DEFAULT in any VALUES position:

INSERT INTO orders (created_at, name, tags) VALUES
  (DEFAULT, 'first',  DEFAULT),
  (DEFAULT, 'second', '{foo}'),
  (NOW(),   'third',  DEFAULT);

Every "missing" column corresponds to a tag (,omitzero / ,omitempty) where the user explicitly opted into DB-default semantics. DEFAULT is the correct fill.

Sketch

In builder.go InsertRecords:

  1. Pass 1: walk records, compute per-row (cols, vals) via Map. Build colSet := union of all cols across rows. Convert to sorted slice as allCols.
  2. Pass 2: for each row, pad to allCols shape: for any col in allCols not in the row's cols, substitute sq.Expr("DEFAULT").
  3. Emit insert.Columns(allCols...).Values(paddedRow0...).Values(paddedRow1...)....

Memory cost: holding all per-row (cols, vals) slices through pass 1. Not a concern for realistic batch sizes.

Edge cases:

  • All rows have identical shape → behavior unchanged from current.
  • Row's tagged-but-non-zero column appears in some rows but not others → tracked correctly by the union.
  • Column with no DB default missed from a row → PG returns NULL (if nullable) or constraint error (if NOT NULL) — same as if the user had emitted DEFAULT directly. PG-native, no special masking.

Out of scope

  • Mixed shapes where a column appears in some rows with no tag and absent in others (impossible — tags determine emission). Tagged columns are the only path to "absent in some rows."
  • DEFAULT in InsertRecord (single-row). Tracked in #51, addressed in #52 via InsertDefaults.
  • Multi-row DEFAULT VALUES (i.e. every column DEFAULT for every row). PG doesn't support multi-row DEFAULT VALUES; the union-with-DEFAULT approach above doesn't change that.

Tests

  • Unit: heterogeneous batch with three different shapes → verify INSERT (allCols...) VALUES ..., ..., ... with DEFAULT in the right slots.
  • Unit: existing uniform-shape batches → unchanged.
  • Integration: round-trip a heterogeneous batch against PG, verify each row landed with the expected mix of caller values and DB defaults.

Replaces current build-time error

The record N columns ... differ from record 0 columns ... error from #50 goes away. The fix is strictly more permissive — anything that built before still builds; cases that previously errored now produce valid SQL.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions