Add ODBC-native row-wise array fetch for fast bulk retrieval #511

Merged: christianparpart merged 6 commits into master from feature/odbc-native-fast-fetch on Jun 23, 2026
@christianparpart christianparpart commented Jun 22, 2026


Customers on high-latency links wait the longest when retrieving many rows because each row is its own SQLFetch round-trip — fetching 1000 rows costs 1000 round-trips. This adds the read-side counterpart of the existing CreateAll/UpdateAll batch-write path: Query<Record>().All()/Range() (and two-record JOIN tuples) now bind result columns row-wise directly into the caller's std::vector<Record> storage and pull whole row blocks per SQLFetchScroll, collapsing the round-trips from N to ~ceil(N/depth). Values land in place (zero-copy); results are byte-identical to the per-row path.

The fast path is transparent and gated — a record qualifies when every result column is row-bindable (the same SqlRowBindableColumn set the write side uses: primitives, date/time/datetime, numeric, char fixed-capacity strings, and non-numeric optionals of those) and the driver supports row-array fetching. Anything else (growable strings/binary, GUID, variant) transparently falls back to the unchanged per-row path. Verified against SQLite, SQL Server 2022 and PostgreSQL.

Changes

  • SqlStatement::FetchAllRowWise: new low-level primitive mirroring ExecuteBatchNativeRowWise. It sets SQL_ATTR_ROW_BIND_TYPE to sizeof(Record) and SQL_ATTR_ROW_ARRAY_SIZE to the block depth, grows and rebinds the destination per block, clamps the depth to a memory budget, and restores single-row statement state via a Finally guard on every exit (including exceptions). Nullable columns use an over-allocated row-strided NULL indicator; optionals are pre-engaged and reset on NULL. Char fixed strings bind SQL_C_CHAR inline with a per-row length/trim fixup.
  • SqlConnection::SupportsNativeRowArrayFetch and RoundTripsNarrowTextByteExact — driver capability checks kept on the connection. The latter carves out PostgreSQL (whose psqlODBC transcodes SQL_C_CHAR through the client codepage), so records carrying a fixed-capacity string fall back to the per-row wide path there rather than risk mangling non-ASCII bytes.
  • DataMapper retrieval wiring — compile-time eligibility (CanRowWiseFetchRecord/CanRowWiseFetchTuple plus the narrow-text carve-out) and the ReadResults branch for both single-record and two-record tuple result sets.
  • RowWiseFetchTests — covers fixed/nullable/temporal/fixed-string columns, NULL/empty/full-capacity values, multi-block boundaries, empty results, Where/Range, statement reuse and the std::string fallback, asserting via a block-fetch-counting logger that the fast path actually ran (and that fixed-string records fall back on PostgreSQL). A hidden opt-in benchmark ([.rowwisefetchbench]) reproduces the comparison below.
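
To make the row-wise layout concrete, here is a minimal sketch of the addressing that row-wise binding relies on: with SQL_ATTR_ROW_BIND_TYPE set to the record size, the buffer for a given column in row r of the rowset sits at the bound base address plus r times that size. No ODBC driver is involved; `Record`, `RowWiseAddress`, `ClampDepth` and `SimulateBlockFetch` are hypothetical names for illustration, and the clamping policy is an assumption rather than the one FetchAllRowWise uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record type; any trivially copyable, standard-layout struct would do.
struct Record
{
    std::int32_t id;
    double price;
};

// Row-wise binding: the buffer for a column in row `rowIndex` of the rowset lives at
// base + rowIndex * rowSize + columnOffset, where rowSize is the value given to
// SQL_ATTR_ROW_BIND_TYPE (sizeof(Record) in the description above).
inline std::byte* RowWiseAddress(std::byte* base, std::size_t rowIndex, std::size_t rowSize, std::size_t columnOffset)
{
    return base + (rowIndex * rowSize) + columnOffset;
}

// Assumed clamping policy: never more rows per block than fit the byte budget,
// and never fewer than one.
constexpr std::size_t ClampDepth(std::size_t requestedDepth, std::size_t rowSize, std::size_t budgetBytes)
{
    return std::max<std::size_t>(1, std::min(requestedDepth, budgetBytes / rowSize));
}

// Stand-in for one block fetch: fills `count` rows in place, starting at index `first`,
// writing through the same strided addresses a driver would use.
inline void SimulateBlockFetch(std::vector<Record>& rows, std::size_t first, std::size_t count)
{
    assert(first + count <= rows.size());
    auto* base = reinterpret_cast<std::byte*>(rows.data() + first);
    for (std::size_t row = 0; row < count; ++row)
    {
        auto const id = static_cast<std::int32_t>(first + row);
        auto const price = 0.5 * static_cast<double>(first + row);
        std::memcpy(RowWiseAddress(base, row, sizeof(Record), offsetof(Record, id)), &id, sizeof(id));
        std::memcpy(RowWiseAddress(base, row, sizeof(Record), offsetof(Record, price)), &price, sizeof(price));
    }
}
```

Because the destination is the caller's own vector, nothing is copied after the block lands; growing the vector between blocks is what makes a rebind necessary, since the base address may change.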

Performance

The structural win is round-trip count: for 200k rows the row-wise path issues ~196 SQLFetchScroll calls instead of 200,000 SQLFetch calls (~1000x fewer round-trips). Measured end-to-end (release build, median of 5 reps, Query<>().All() vs the per-row fallback, both materializing the same std::vector<Record>):

| Backend | Transport | Rows | per-row SQLFetch | row-wise block fetch | Speedup |
|---|---|---|---|---|---|
| SQLite | in-process | 500k | 643 ms | 541 ms | 1.19x |
| PostgreSQL | localhost TCP | 200k | 167 ms | 109 ms | 1.53x |
| SQL Server 2022 | localhost TCP | 200k | 134 ms | 67 ms | 2.01x |

The speedup scales with per-round-trip transport cost. In-process SQLite has no socket, so its ~1.2x is purely reduced per-call CPU/ODBC overhead — the floor, and a latency-independent constant. Over a localhost socket (sub-millisecond RTT) it is already 1.5-2x.

On a high-latency link the round-trip term dominates: wall-clock ≈ round-trips × RTT, so the per-row path pays ~N × RTT while the block path pays ~ceil(N/depth) × RTT, and the speedup approaches the effective array depth (up to ~1000x for narrow records, less for wide rows that clamp to a smaller depth, and divided by whatever the driver already prefetches per fetch). For example, modelled at 50 ms RTT for 100k rows: ~83 min (per-row) vs ~5 s (block). The local numbers above are a conservative lower bound demonstrating the mechanism; the round-trip-count reduction is what carries the win to the WAN case.
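
The latency model in the paragraph above can be written down directly. This is a sketch of the arithmetic only, assuming a depth of 1000 for the block path and ignoring driver-side prefetch and per-call CPU cost; `RoundTrips` and `ModelledWallClock` are hypothetical helper names, not part of the library.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Fetch round-trips needed for `rows` rows at a given array depth: ceil(rows / depth).
// A depth of 1 models the per-row SQLFetch path.
constexpr std::uint64_t RoundTrips(std::uint64_t rows, std::uint64_t depth)
{
    assert(depth > 0);
    return (rows + depth - 1) / depth;
}

// Modelled wall-clock when latency dominates: round-trips multiplied by the RTT.
constexpr std::chrono::milliseconds ModelledWallClock(std::uint64_t rows,
                                                      std::uint64_t depth,
                                                      std::chrono::milliseconds rtt)
{
    return rtt * static_cast<std::chrono::milliseconds::rep>(RoundTrips(rows, depth));
}

// 100k rows at 50 ms RTT: 5,000,000 ms (about 83 min) per row vs 5,000 ms (5 s) per block.
static_assert(ModelledWallClock(100'000, 1, std::chrono::milliseconds { 50 })
              == std::chrono::milliseconds { 5'000'000 });
static_assert(ModelledWallClock(100'000, 1000, std::chrono::milliseconds { 50 })
              == std::chrono::milliseconds { 5'000 });
```

The ratio of the two results is the depth itself, which is why the speedup is bounded by the effective array depth once the RTT term dominates.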

…eval

Query<Record>().All()/Range() (and two-record JOIN tuples) now bind result
columns row-wise directly into the caller's std::vector<Record> storage and
pull whole row blocks per SQLFetchScroll round-trip, instead of one SQLFetch
per row. This is the read-side mirror of the CreateAll/UpdateAll batch-write
path and collapses ODBC round-trips from N to ~ceil(N/depth) — the win on
high-latency links where receiving e.g. 1000 rows previously cost 1000 fetches.

The path is transparent and gated: a record qualifies when every result column
is row-bindable (the write-side SqlRowBindableColumn set: primitives, date/
time/datetime, numeric, char fixed-capacity strings, and non-numeric optionals
of those) and the driver supports row-array fetching. Anything else (growable
strings/binary, GUID, variant) falls back to the unchanged per-row path with
byte-identical results. Values land in place (zero-copy); nullable columns use
an over-allocated row-strided NULL indicator, and optionals are pre-engaged and
reset on NULL. Char fixed strings bind SQL_C_CHAR inline with a per-row
length/trim fixup; on PostgreSQL (psqlODBC transcodes SQL_C_CHAR through the
client codepage) records carrying one fall back to the per-row wide path,
gated by the new SqlConnection::RoundTripsNarrowTextByteExact capability so the
server-type decision stays on the connection.

- SqlStatement::FetchAllRowWise + BindRowWiseValue/FinalizeRowWiseOutputColumn:
  SQL_ATTR_ROW_BIND_TYPE = sizeof(Record), grow-and-rebind per block, memory-
  budget-clamped depth, and a Finally guard restoring single-row state on every
  exit (incl. exceptions).
- SqlConnection::SupportsNativeRowArrayFetch / RoundTripsNarrowTextByteExact.
- DataMapper eligibility (CanRowWiseFetchRecord/CanRowWiseFetchTuple + narrow-
  text carve-out) and the ReadResults wiring for both single and tuple results.
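
As an illustration of what a compile-time eligibility check of this kind can look like, here is a minimal sketch. It is not the library's CanRowWiseFetchRecord: `FixedString`, `IsRowBindable` and `AllColumnsRowBindable` are hypothetical names, the column list is passed explicitly instead of being derived from a record, and the date/time/numeric types and the narrow-text carve-out are not modelled.

```cpp
#include <array>
#include <cstddef>
#include <optional>
#include <string>
#include <type_traits>

// Hypothetical fixed-capacity string with inline storage (no heap pointer to chase).
template <std::size_t Capacity>
struct FixedString
{
    std::array<char, Capacity + 1> data {};
    std::size_t length = 0;
};

template <typename T>
struct IsFixedString: std::false_type {};

template <std::size_t Capacity>
struct IsFixedString<FixedString<Capacity>>: std::true_type {};

// A column is row-bindable in this sketch when its bytes live inline in the record.
template <typename T>
struct IsRowBindable: std::bool_constant<std::is_arithmetic_v<T> || IsFixedString<T>::value> {};

// Optionals qualify when the wrapped type does.
template <typename T>
struct IsRowBindable<std::optional<T>>: IsRowBindable<T> {};

// A result set is eligible only if every column is row-bindable.
template <typename... Columns>
inline constexpr bool AllColumnsRowBindable = (IsRowBindable<Columns>::value && ...);

static_assert(AllColumnsRowBindable<int, double, std::optional<int>, FixedString<8>>);
// A growable std::string column forces the fallback to the per-row path.
static_assert(!AllColumnsRowBindable<int, std::string>);
```

Resolving eligibility at compile time means an ineligible record costs nothing at run time: the per-row branch is the only one instantiated for it.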

Tested against sqlite3, mssql2022 and postgres: new RowWiseFetchTests cover
fixed/nullable/temporal/fixed-string types, NULL/empty/full-capacity values,
multi-block boundaries, empty results, Where/Range, statement reuse and the
std::string fallback, asserting via a block-fetch-counting logger that the
fast path actually ran (and that fixed-string records fall back on PostgreSQL).

Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
Christian Parpart added 3 commits June 22, 2026 11:13
A hidden ([.rowwisefetchbench]) benchmark times the shipped row-wise block
fetch (Query<>().All()) against a faithful reproduction of the per-row fallback
over a large dataset, materializing the same vector. Tunable via the
ROWFETCH_BENCH_ROWS env var (default 500'000).

Measured (release, median of 5 reps):
  - SQLite in-process : ~1.2x (per-SQLFetch overhead only; no socket)
  - PostgreSQL (TCP)  : ~1.5x
  - SQL Server (TCP)  : ~2.0x

The win grows with per-round-trip transport cost: in-process SQLite gains
little, while a localhost socket already yields 1.5-2x. For 200k rows the
row-wise path issues ~196 SQLFetchScroll calls vs 200k SQLFetch calls (~1000x
fewer round-trips), so on a high-latency link, where wall-clock is dominated
by RTT * round-trips, the speedup approaches the array depth.

Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
- RowWiseFetchTests: parenthesize depth*2+7 (readability-math-missing-parentheses)
  and use std::cmp_equal for the depth/total spot-checks
  (modernize-use-integer-sign-comparison).
- Doc comments: doxygen cannot resolve @ref to private members or concepts, so
  the public comments now use @c for SqlRowBindableColumn, FetchAllRowWise,
  BindRowWiseOutputColumn, FinalizeRowWiseOutputColumn and
  RoundTripsNarrowTextByteExact; also fix a stale @ref to a renamed method.

Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
…ch loops

Classic result iteration (while(cursor.FetchRow()){ GetColumn<T>(i) },
bound output columns, SqlRowIterator<T>, SqlVariantRowCursor) issued one
SQLFetch per row -- one network round-trip per row, which dominates
wall-clock for large result sets on TCP backends. The recent row-wise
array fetch only sped up the materializing DataMapper paths (All/Range/
First); the lazy cursor loops were left on the per-row path and cannot be
rewritten across all client code.

Back the classic cursor transparently with the existing RowArrayCursor:
on the first fetch of an eligible result set the statement arms a block
buffer (SQL_ATTR_ROW_ARRAY_SIZE) and serves FetchRow()/GetColumn<T>() and
bound-column scatters from it, cutting round-trips from N to
ceil(N/depth). On by default; depth is a single connection-level knob
(SqlConnection::SetDefaultPrefetchDepth, default PrefetchDepthDefault =
1000; <= 1 disables). Capability-gated by SupportsNativeRowArrayFetch().
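
The idea of serving a per-row loop from a block buffer can be sketched without ODBC. The following is a simulation under stated assumptions, not the RowArrayCursor implementation: `FakeResultSet` and `BlockBackedCursor` are hypothetical names, rows are single integers, and a short block is treated as the end of the result set (an exact multiple of the depth costs one extra, empty fetch in this model).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a server-side result set of `totalRows` integers.
// Every FetchBlock call counts as one round-trip.
class FakeResultSet
{
  public:
    explicit FakeResultSet(std::size_t totalRows): _totalRows { totalRows } {}

    // Delivers up to `depth` rows into `block` and returns how many were delivered.
    std::size_t FetchBlock(std::vector<int>& block, std::size_t depth)
    {
        ++_roundTrips;
        auto const count = std::min(depth, _totalRows - _nextRow);
        block.clear();
        for (std::size_t i = 0; i < count; ++i)
            block.push_back(static_cast<int>(_nextRow + i));
        _nextRow += count;
        return count;
    }

    [[nodiscard]] std::size_t RoundTrips() const noexcept { return _roundTrips; }

  private:
    std::size_t _totalRows;
    std::size_t _nextRow = 0;
    std::size_t _roundTrips = 0;
};

// Keeps the classic FetchRow()/GetColumn() shape, but refills from a block buffer.
class BlockBackedCursor
{
  public:
    BlockBackedCursor(FakeResultSet& source, std::size_t depth):
        _source { source }, _depth { std::max<std::size_t>(depth, 1) } {}

    bool FetchRow()
    {
        if (_next == _block.size())
        {
            if (_sawLastBlock)
                return false;
            auto const count = _source.FetchBlock(_block, _depth);
            _sawLastBlock = count < _depth; // a short block means nothing is left
            _next = 0;
            if (count == 0)
                return false;
        }
        assert(_next < _block.size());
        _current = _block[_next++];
        return true;
    }

    [[nodiscard]] int GetColumn() const noexcept { return _current; }

  private:
    FakeResultSet& _source;
    std::size_t _depth;
    std::vector<int> _block;
    std::size_t _next = 0;
    bool _sawLastBlock = false;
    int _current = 0;
};
```

A caller's `while (cursor.FetchRow()) { cursor.GetColumn(); }` loop stays unchanged; only the number of calls into the source drops, which is the property a block-fetch-counting hook can assert on.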

Eligibility is restricted to fixed-width numeric, temporal and GUID
columns, whose block reconstruction is byte-identical to the per-row
binder on every backend. Result sets carrying character/text, NUMERIC,
TIME, binary or LOB columns transparently stay on the per-row path
(faithful materialization of those is not uniform across backends: MSSQL
returns narrow text in the client codepage, SQLite's dynamic typing
reports unreliable text sizes). A new SqlLogger::OnFetchBlock hook makes
the round-trip reduction observable/testable.

Performance: round-trips drop ~1000x at depth 1000 for eligible sets;
no regression on the per-row path (no allocation when disabled/ineligible).
Risk: an active cursor reads ahead up to one block (a few MB, budget-
clamped); the connection knob is the global escape hatch.

Tested: sqlite3, mssql2022 (Docker), postgres (Docker 16.4) -- full suite
shows no regression vs the pre-change baseline; new [prefetch] suite green
on all three. Build clean under clangcl-debug (PEDANTIC /WX).

Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jun 22, 2026
Christian Parpart added 2 commits June 22, 2026 18:44
…g-tidy, docs, style

Address the CI matrix failures from the block-prefetch commit:

- Cross-type read regression (the PostgreSQL/Windows dbtool failures): reading a
  prefetched numeric/temporal/GUID column as a string (GetColumn<std::string>,
  as dbtool's generic `exec` printer does) returned an empty string because
  ConvertCell only rendered character-bound cells. RenderCellAsUtf8 now formats
  every bound type to text (integers byte-identical to the driver; floating/
  temporal/GUID via std::formatter), matching the per-row SQLGetData(SQL_C_CHAR)
  behaviour. Adds a [prefetch] regression test for the all-numeric-read-as-text
  case.
- clang-tidy (-warnings-as-errors): split is moot — fixed at source. Test file:
  math-missing-parentheses, integer-sign-comparison, nested conditional operator,
  std::move on trivially-copyable fixed strings, unchecked optional access.
  Header: unused-lambda-capture (explicit this-> on the member call). ConvertCell
  was also split into per-category helpers to stay under the cognitive-complexity
  threshold.
- Doc coverage (doxygen): @ref PrefetchDepthDefault -> @c (it is a value, not a
  ref target) in SqlConnection.hpp and SqlConnectInfo.hpp; drop the @param naming
  an unnamed parameter on SqlLogger::OnFetchBlock (described in the brief instead).
- C++ style (clang-format-22): restore the single-line empty deleter lambda.

Verified: clangcl-debug builds clean; [prefetch] suite green on sqlite3,
mssql2022 (Docker), postgres (Docker); dbtool `exec` renders numeric columns.

Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
The block-prefetch GUID test dereferenced the TryParse result after a Catch2
REQUIRE, which clang-tidy's bugprone-unchecked-optional-access does not track as
a guard (-warnings-as-errors). Add an explicit `if (has_value())` so the optional
access is statically checked while keeping the REQUIRE as the failure signal.

Signed-off-by: Christian Parpart <c.parpart@lastrada.net>

@Yaraslaut Yaraslaut left a comment


Thanks a lot for the improvement

@christianparpart christianparpart merged commit 01db167 into master Jun 23, 2026
29 checks passed
@christianparpart christianparpart deleted the feature/odbc-native-fast-fetch branch June 23, 2026 06:49

Labels

Core API Data Mapper documentation tests
