Add ODBC-native row-wise array fetch for fast bulk retrieval #511
Merged
…eval
Query<Record>().All()/Range() (and two-record JOIN tuples) now bind result columns row-wise directly into the caller's std::vector<Record> storage and pull whole row blocks per SQLFetchScroll round-trip, instead of one SQLFetch per row. This is the read-side mirror of the CreateAll/UpdateAll batch-write path and collapses ODBC round-trips from N to ~ceil(N/depth). That is the win on high-latency links, where receiving e.g. 1000 rows previously cost 1000 fetches.
The path is transparent and gated: a record qualifies when every result column is row-bindable (the write-side SqlRowBindableColumn set: primitives, date/time/datetime, numeric, char fixed-capacity strings, and non-numeric optionals of those) and the driver supports row-array fetching. Anything else (growable strings/binary, GUID, variant) falls back to the unchanged per-row path with byte-identical results.
Values land in place (zero-copy); nullable columns use an over-allocated row-strided NULL indicator, and optionals are pre-engaged and reset on NULL. Char fixed strings bind SQL_C_CHAR inline with a per-row length/trim fixup; on PostgreSQL (psqlODBC transcodes SQL_C_CHAR through the client codepage) records carrying one fall back to the per-row wide path, gated by the new SqlConnection::RoundTripsNarrowTextByteExact capability so the server-type decision stays on the connection.
- SqlStatement::FetchAllRowWise + BindRowWiseValue/FinalizeRowWiseOutputColumn: SQL_ATTR_ROW_BIND_TYPE = sizeof(Record), grow-and-rebind per block, memory-budget-clamped depth, and a Finally guard restoring single-row state on every exit (incl. exceptions).
- SqlConnection::SupportsNativeRowArrayFetch / RoundTripsNarrowTextByteExact.
- DataMapper eligibility (CanRowWiseFetchRecord/CanRowWiseFetchTuple + narrow-text carve-out) and the ReadResults wiring for both single and tuple results.
Tested against sqlite3, mssql2022 and postgres: new RowWiseFetchTests cover fixed/nullable/temporal/fixed-string types, NULL/empty/full-capacity values, multi-block boundaries, empty results, Where/Range, statement reuse and the std::string fallback, asserting via a block-fetch-counting logger that the fast path actually ran (and that fixed-string records fall back on PostgreSQL).
Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
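The row-wise binding this commit describes rests on the ODBC rule that, with SQL_ATTR_ROW_BIND_TYPE set to the struct size, the driver writes row r of a column at the bound address plus r times that size. The following is a minimal sketch of that addressing and of the grow-and-rebind loop, simulated without any ODBC driver. `Record`, `FakeDriverFetchBlock`, `ClampDepth` and `FetchAll` are hypothetical names for illustration and are not the library's implementation; the memory budget passed to `ClampDepth` is likewise an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical record: every column is fixed-width, so it is row-bindable.
struct Record
{
    std::int32_t id;
    double price;
};

// Clamp the requested block depth so that one block stays within a memory budget.
// The budget is chosen by the caller; real values are not taken from the library.
inline std::size_t ClampDepth(std::size_t requested, std::size_t rowSize, std::size_t budgetBytes)
{
    assert(rowSize > 0);
    std::size_t const maxRows = std::max<std::size_t>(1, budgetBytes / rowSize);
    return std::min(requested, maxRows);
}

// Stand-in for the driver: fills `count` rows through raw column pointers,
// advancing `stride` bytes per row. This mirrors the address rule ODBC applies
// when SQL_ATTR_ROW_BIND_TYPE = sizeof(Record).
inline void FakeDriverFetchBlock(std::byte* idBase, std::byte* priceBase, std::size_t stride,
                                 std::size_t firstRow, std::size_t count)
{
    for (std::size_t r = 0; r < count; ++r)
    {
        auto const id = static_cast<std::int32_t>(firstRow + r);
        double const price = 0.5 * static_cast<double>(firstRow + r);
        std::memcpy(idBase + (r * stride), &id, sizeof id);
        std::memcpy(priceBase + (r * stride), &price, sizeof price);
    }
}

// Grow-and-rebind loop: resize the destination, point the "bindings" at the
// first new element, and let the driver fill one block in place (no copies).
inline std::vector<Record> FetchAll(std::size_t totalRows, std::size_t depth, std::size_t& blockFetches)
{
    assert(depth > 0);
    std::vector<Record> out;
    blockFetches = 0;
    for (std::size_t done = 0; done < totalRows;)
    {
        std::size_t const n = std::min(depth, totalRows - done);
        out.resize(done + n); // may reallocate, which is why the base is recomputed per block
        auto* base = reinterpret_cast<std::byte*>(out.data() + done);
        FakeDriverFetchBlock(base + offsetof(Record, id), base + offsetof(Record, price),
                             sizeof(Record), done, n);
        ++blockFetches;
        done += n;
    }
    return out;
}
```

Calling `FetchAll(2500, 1000, fetches)` fills 2500 records in three block fetches. A real implementation would additionally bind a row-strided NULL indicator per nullable column, which this sketch leaves out.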
added 3 commits
June 22, 2026 11:13
A hidden ([.rowwisefetchbench]) benchmark times the shipped row-wise block fetch (Query<>().All()) against a faithful reproduction of the per-row fallback over a large dataset, materializing the same vector. Tunable via the ROWFETCH_BENCH_ROWS env var (default 500'000).
Measured (release, median of 5 reps):
- SQLite in-process: ~1.2x (per-SQLFetch overhead only; no socket)
- PostgreSQL (TCP): ~1.5x
- SQL Server (TCP): ~2.0x
The win grows with per-round-trip transport cost: in-process gains little, while a localhost socket already yields 1.5-2x. For 200k rows the row-wise path issues ~196 SQLFetchScroll calls vs 200k SQLFetch calls (~1000x fewer round-trips), so on a high-latency link, where wall-clock is dominated by RTT * round-trips, the speedup approaches the array depth.
Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
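The "median of 5 reps" methodology above can be sketched as a small timing helper. This is not the benchmark shipped in the commit; `Median` and `MedianRuntimeMs` are hypothetical names, and the sketch only assumes that each repetition runs the same workload.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <utility>
#include <vector>

// Median of a sample set (upper median for even counts).
inline double Median(std::vector<double> samples)
{
    assert(!samples.empty());
    auto const mid = samples.begin() + static_cast<std::ptrdiff_t>(samples.size() / 2);
    std::nth_element(samples.begin(), mid, samples.end());
    return *mid;
}

// Runs `run` `reps` times and returns the median wall-clock time in milliseconds.
// The median is less sensitive to a single slow outlier than the mean.
template <typename Fn>
double MedianRuntimeMs(Fn&& run, int reps = 5)
{
    using Millis = std::chrono::duration<double, std::milli>;
    std::vector<double> samples;
    for (int i = 0; i < reps; ++i)
    {
        auto const start = std::chrono::steady_clock::now();
        run();
        samples.push_back(Millis(std::chrono::steady_clock::now() - start).count());
    }
    return Median(std::move(samples));
}
```

A speedup figure like the ones listed would then be the ratio of the per-row median to the block-fetch median, each measured with the same row count.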
- RowWiseFetchTests: parenthesize depth*2+7 (readability-math-missing-parentheses) and use std::cmp_equal for the depth/total spot-checks (modernize-use-integer-sign-comparison).
- Doc comments: doxygen cannot resolve @ref to private members or concepts, so the public comments now use @c for SqlRowBindableColumn, FetchAllRowWise, BindRowWiseOutputColumn, FinalizeRowWiseOutputColumn and RoundTripsNarrowTextByteExact; also fix a stale @ref to a renamed method.
Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
…ch loops
Classic result iteration (while(cursor.FetchRow()){ GetColumn<T>(i) },
bound output columns, SqlRowIterator<T>, SqlVariantRowCursor) issued one
SQLFetch per row -- one network round-trip per row, which dominates
wall-clock for large result sets on TCP backends. The recent row-wise
array fetch only sped up the materializing DataMapper paths (All/Range/
First); the lazy cursor loops were left on the per-row path and cannot be
rewritten across all client code.
Back the classic cursor transparently with the existing RowArrayCursor:
on the first fetch of an eligible result set the statement arms a block
buffer (SQL_ATTR_ROW_ARRAY_SIZE) and serves FetchRow()/GetColumn<T>() and
bound-column scatters from it, cutting round-trips from N to
ceil(N/depth). On by default; depth is a single connection-level knob
(SqlConnection::SetDefaultPrefetchDepth, default PrefetchDepthDefault =
1000; <= 1 disables). Capability-gated by SupportsNativeRowArrayFetch().
Eligibility is restricted to fixed-width numeric, temporal and GUID
columns, whose block reconstruction is byte-identical to the per-row
binder on every backend. Result sets carrying character/text, NUMERIC,
TIME, binary or LOB columns transparently stay on the per-row path
(faithful materialization of those is not uniform across backends: MSSQL
returns narrow text in the client codepage, SQLite's dynamic typing
reports unreliable text sizes). A new SqlLogger::OnFetchBlock hook makes
the round-trip reduction observable/testable.
Performance: round-trips drop ~1000x at depth 1000 for eligible sets;
no regression on the per-row path (no allocation when disabled/ineligible).
Risk: an active cursor reads ahead up to one block (a few MB, budget-
clamped); the connection knob is the global escape hatch.
Tested: sqlite3, mssql2022 (Docker), postgres (Docker 16.4) -- full suite
shows no regression vs the pre-change baseline; new [prefetch] suite green
on all three. Build clean under clangcl-debug (PEDANTIC /WX).
Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
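The block-backed cursor described in this commit can be illustrated with a driver-free simulation: FetchRow() serves rows from a local block and only goes back to the server when the block is used up. `FakeServer` and `PrefetchingCursor` are hypothetical stand-ins, not the library's RowArrayCursor, and the round-trip counter plays the role that the SqlLogger::OnFetchBlock hook plays in the tests.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a result set living on the server.
struct FakeServer
{
    std::vector<std::int64_t> rows;
    std::size_t next = 0;
    std::size_t roundTrips = 0;

    // One simulated round-trip: delivers up to `depth` rows into `block`.
    std::size_t FetchBlock(std::vector<std::int64_t>& block, std::size_t depth)
    {
        ++roundTrips;
        std::size_t const n = std::min(depth, rows.size() - next);
        auto const first = rows.begin() + static_cast<std::ptrdiff_t>(next);
        block.assign(first, first + static_cast<std::ptrdiff_t>(n));
        next += n;
        return n;
    }
};

// Cursor with the classic FetchRow()/GetColumn() shape, backed by a block buffer.
class PrefetchingCursor
{
  public:
    PrefetchingCursor(FakeServer& server, std::size_t depth):
        _server { server }, _depth { std::max<std::size_t>(1, depth) }
    {
    }

    bool FetchRow()
    {
        if (_armed && _pos + 1 < _block.size())
        {
            ++_pos; // served from the local block: no round-trip
            return true;
        }
        if (_exhausted)
            return false;
        std::size_t const n = _server.FetchBlock(_block, _depth);
        _armed = true;
        _pos = 0;
        _exhausted = n < _depth; // a short block means the result set has ended
        return n > 0;
    }

    std::int64_t GetColumn() const
    {
        assert(_armed && _pos < _block.size());
        return _block[_pos];
    }

  private:
    FakeServer& _server;
    std::size_t _depth;
    std::vector<std::int64_t> _block;
    std::size_t _pos = 0;
    bool _armed = false;
    bool _exhausted = false;
};
```

In this simulation, iterating 2500 rows at depth 1000 costs three round-trips, while depth 1 degenerates to one round-trip per row plus a final empty fetch. When the row count is an exact multiple of the depth, the sketch also needs one extra fetch to detect the end of the result set.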
added 2 commits
June 22, 2026 18:44
…g-tidy, docs, style
Address the CI matrix failures from the block-prefetch commit:
- Cross-type read regression (the PostgreSQL/Windows dbtool failures): reading a prefetched numeric/temporal/GUID column as a string (GetColumn<std::string>, as dbtool's generic `exec` printer does) returned an empty string because ConvertCell only rendered character-bound cells. RenderCellAsUtf8 now formats every bound type to text (integers byte-identical to the driver; floating/temporal/GUID via std::formatter), matching the per-row SQLGetData(SQL_C_CHAR) behaviour. Adds a [prefetch] regression test for the all-numeric-read-as-text case.
- clang-tidy (-warnings-as-errors): split is moot; fixed at source. Test file: math-missing-parentheses, integer-sign-comparison, nested conditional operator, std::move on trivially-copyable fixed strings, unchecked optional access. Header: unused-lambda-capture (explicit this-> on the member call). ConvertCell was also split into per-category helpers to stay under the cognitive-complexity threshold.
- Doc coverage (doxygen): @ref PrefetchDepthDefault -> @c (it is a value, not a ref target) in SqlConnection.hpp and SqlConnectInfo.hpp; drop the @param naming an unnamed parameter on SqlLogger::OnFetchBlock (described in the brief instead).
- C++ style (clang-format-22): restore the single-line empty deleter lambda.
Verified: clangcl-debug builds clean; [prefetch] suite green on sqlite3, mssql2022 (Docker), postgres (Docker); dbtool `exec` renders numeric columns.
Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
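The cross-type read fix amounts to rendering every bound cell type as text rather than only character cells. Below is a minimal sketch of that idea, assuming a prefetched cell is held as a variant of its bound C type. `Cell` and `RenderCellAsText` are hypothetical and cover only integers and strings; the actual RenderCellAsUtf8 also handles floating-point, temporal and GUID values.

```cpp
#include <cstdint>
#include <string>
#include <type_traits>
#include <variant>

// Hypothetical prefetched cell: the block buffer keeps it in its bound C type.
using Cell = std::variant<std::int64_t, std::string>;

// Renders any bound cell as text, so that a string read of a numeric column
// yields its decimal representation instead of an empty string.
inline std::string RenderCellAsText(Cell const& cell)
{
    return std::visit(
        [](auto const& value) -> std::string {
            using T = std::decay_t<decltype(value)>;
            if constexpr (std::is_same_v<T, std::string>)
                return value; // character-bound cell: already text
            else
                return std::to_string(value); // integer cell: decimal rendering
        },
        cell);
}
```

With a visitor like this, adding a new bound type to the variant fails to compile until a rendering branch handles it, which makes a silently empty result harder to reintroduce.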
The block-prefetch GUID test dereferenced the TryParse result after a Catch2 REQUIRE, which clang-tidy's bugprone-unchecked-optional-access does not track as a guard (-warnings-as-errors). Add an explicit `if (has_value())` so the optional access is statically checked while keeping the REQUIRE as the failure signal.
Signed-off-by: Christian Parpart <c.parpart@lastrada.net>
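The guard pattern from this commit can be sketched without Catch2. `TryParseInt` is a hypothetical parser standing in for the GUID TryParse, and a plain `assert` stands in for REQUIRE only to keep the example dependency-free.

```cpp
#include <cassert>
#include <charconv>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical parser standing in for the GUID TryParse used by the test.
inline std::optional<int> TryParseInt(std::string_view text)
{
    int value = 0;
    auto const* const end = text.data() + text.size();
    auto const [ptr, ec] = std::from_chars(text.data(), end, value);
    if (ec != std::errc {} || ptr != end)
        return std::nullopt;
    return value;
}

inline int RequireParsed(std::string_view text)
{
    auto const parsed = TryParseInt(text);
    assert(parsed.has_value()); // stand-in for REQUIRE: the runtime failure signal
    if (parsed.has_value())     // explicit guard that static analysis can follow
        return *parsed;
    return 0; // not reached while the assertion above holds
}
```

The check looks redundant at runtime, but it keeps the dereference inside a branch whose condition the analyzer tracks.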
Yaraslaut
approved these changes
Jun 23, 2026
Yaraslaut
left a comment
Member
Thanks a lot for the improvement
Customers on high-latency links wait the longest when retrieving many rows because each row is its own `SQLFetch` round-trip: fetching 1000 rows costs 1000 round-trips. This adds the read-side counterpart of the existing `CreateAll`/`UpdateAll` batch-write path: `Query<Record>().All()`/`Range()` (and two-record JOIN tuples) now bind result columns row-wise directly into the caller's `std::vector<Record>` storage and pull whole row blocks per `SQLFetchScroll`, collapsing the round-trips from N to ~`ceil(N/depth)`. Values land in place (zero-copy); results are byte-identical to the per-row path.
The fast path is transparent and gated: a record qualifies when every result column is row-bindable (the same `SqlRowBindableColumn` set the write side uses: primitives, date/time/datetime, numeric, char fixed-capacity strings, and non-numeric optionals of those) and the driver supports row-array fetching. Anything else (growable strings/binary, GUID, variant) transparently falls back to the unchanged per-row path. Verified against SQLite, SQL Server 2022 and PostgreSQL.
Changes
- `SqlStatement::FetchAllRowWise`: new low-level primitive mirroring `ExecuteBatchNativeRowWise`. It sets `SQL_ATTR_ROW_BIND_TYPE = sizeof(Record)` + `SQL_ATTR_ROW_ARRAY_SIZE`, grows-and-rebinds the destination per block, clamps depth to a memory budget, and restores single-row statement state via a `Finally` guard on every exit (including exceptions). Nullable columns use an over-allocated row-strided NULL indicator; optionals are pre-engaged and reset on NULL. Char fixed strings bind `SQL_C_CHAR` inline with a per-row length/trim fixup.
- `SqlConnection::SupportsNativeRowArrayFetch` and `RoundTripsNarrowTextByteExact`: driver capability checks kept on the connection. The latter carves out PostgreSQL (whose psqlODBC transcodes `SQL_C_CHAR` through the client codepage), so records carrying a fixed-capacity string fall back to the per-row wide path there rather than risk mangling non-ASCII bytes.
- `DataMapper` retrieval wiring: compile-time eligibility (`CanRowWiseFetchRecord`/`CanRowWiseFetchTuple` plus the narrow-text carve-out) and the `ReadResults` branch for both single-record and two-record tuple result sets.
- `RowWiseFetchTests`: covers fixed/nullable/temporal/fixed-string columns, NULL/empty/full-capacity values, multi-block boundaries, empty results, `Where`/`Range`, statement reuse and the `std::string` fallback, asserting via a block-fetch-counting logger that the fast path actually ran (and that fixed-string records fall back on PostgreSQL). A hidden opt-in benchmark (`[.rowwisefetchbench]`) reproduces the comparison below.
Performance
The structural win is round-trip count: for 200k rows the row-wise path issues ~196 `SQLFetchScroll` calls instead of 200,000 `SQLFetch` calls (~1000x fewer round-trips). Measured end-to-end (release build, median of 5 reps, `Query<>().All()` vs the per-row fallback, both materializing the same `std::vector<Record>`), the speedup over the per-row `SQLFetch` path is:
- SQLite (in-process): ~1.2x
- PostgreSQL (TCP): ~1.5x
- SQL Server (TCP): ~2.0x
The speedup scales with per-round-trip transport cost. In-process SQLite has no socket, so its ~1.2x is purely reduced per-call CPU/ODBC overhead: the floor, and a latency-independent constant. Over a localhost socket (sub-millisecond RTT) it is already 1.5-2x.
On a high-latency link the round-trip term dominates: wall-clock ≈ round-trips × RTT, so the per-row path pays ~N × RTT while the block path pays ~`ceil(N/depth)` × RTT, and the speedup approaches the effective array depth (up to ~1000x for narrow records, less for wide rows that clamp to a smaller depth, and divided by whatever the driver already prefetches per fetch). For example, modelled at 50 ms RTT for 100k rows: ~83 min (per-row) vs ~5 s (block). The local numbers above are a conservative lower bound demonstrating the mechanism; the round-trip-count reduction is what carries the win to the WAN case.
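The latency model in the last paragraph can be checked with a few lines of integer arithmetic. This is a sketch of the model only, not a measurement. `BlockFetchCount` and `ModelledWallClock` are hypothetical helper names, and a depth of 1000 is assumed for the 50 ms example because it reproduces the ~5 s figure.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Round-trips needed to deliver `rows` rows at `depth` rows per block: ceil(rows / depth).
inline std::size_t BlockFetchCount(std::size_t rows, std::size_t depth)
{
    assert(depth > 0);
    return (rows + depth - 1) / depth;
}

// Modelled wall-clock when round-trips dominate: round-trips x RTT.
// Per-call CPU cost and transfer time are deliberately ignored.
inline std::chrono::milliseconds ModelledWallClock(std::size_t roundTrips, std::chrono::milliseconds rtt)
{
    return rtt * static_cast<std::chrono::milliseconds::rep>(roundTrips);
}
```

For 100k rows at 50 ms RTT, the per-row path models to 100,000 × 50 ms = 5000 s (about 83 min), and the block path at depth 1000 to 100 × 50 ms = 5 s. The ~196 calls quoted for 200k rows would correspond to an effective depth slightly above 1000 (for instance, `BlockFetchCount(200000, 1024)` is 196), though the description does not state the depth used.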