Summary
When an explicit transaction inserts rows into a Fuse table and COMMIT has to retry because another session committed to the same table first, the final table data is correct, but system.statistics.stats_row_count can remain stale.
This is observable with SQL: after the retried commit, count(*) and system.statistics.actual_row_count report 115 rows, while system.statistics.stats_row_count still reports 105 rows.
Reproduction
Observed on a local standalone debug build:
databend-query v1.2.919-nightly-9ee8544ffd(rust-1.94.0-nightly-2026-06-14T09:04:25.972757091Z)
Setup:
DROP DATABASE IF EXISTS txn_stats_repro;
CREATE DATABASE txn_stats_repro;
USE txn_stats_repro;
SET enable_table_snapshot_stats = 1;
CREATE TABLE t(a INT);
INSERT INTO t SELECT number::int FROM numbers(100);
ANALYZE TABLE t;
SELECT column_name, stats_row_count, actual_row_count, distinct_count, min, max
FROM system.statistics
WHERE database = 'txn_stats_repro' AND table = 't'
ORDER BY column_name;
Baseline output:
column_name stats_row_count actual_row_count distinct_count min max
a 100 100 100 0 99
Session A:
USE txn_stats_repro;
SET enable_table_snapshot_stats = 1;
BEGIN;
INSERT INTO t SELECT (1000 + number)::int FROM numbers(10);
SELECT count(*) AS count_seen_in_a_before_commit FROM t;
Session A sees the uncommitted insert:
count_seen_in_a_before_commit
110
Session B:
USE txn_stats_repro;
SET enable_table_snapshot_stats = 1;
INSERT INTO t SELECT (2000 + number)::int FROM numbers(5);
SELECT count(*) AS count_seen_in_b_after_commit FROM t;
SELECT column_name, stats_row_count, actual_row_count, distinct_count, min, max
FROM system.statistics
WHERE database = 'txn_stats_repro' AND table = 't'
ORDER BY column_name;
Session B output:
count_seen_in_b_after_commit
105
column_name stats_row_count actual_row_count distinct_count min max
a 105 105 105 0 2004
Back in Session A:
COMMIT;
SELECT count(*) AS count_after_a_commit FROM t;
SELECT column_name, stats_row_count, actual_row_count, distinct_count, min, max
FROM system.statistics
WHERE database = 'txn_stats_repro' AND table = 't'
ORDER BY column_name;
Actual output:
count_after_a_commit
115
column_name stats_row_count actual_row_count distinct_count min max
a 105 115 115 0 2004
Expected output:
column_name stats_row_count actual_row_count
a 115 115
Notes
The query log file shows Session A's COMMIT going through the retry path:
retry::commit: Computing segments diff for table 25 (base: 1 segments, txn: 2 segments)
retry::commit: try_rebuild_req: update_failed_tbls=[... statistics: TableStatistics { number_of_rows: 105, ... }]
After the commit, table metadata has the correct table row count (TableStatistics { number_of_rows: 115, ... }), but the snapshot statistics row count remains stale, and that stale value is what system.statistics.stats_row_count exposes.
A likely cause is that the retry path only reads inserted rows from the multi-table-insert-specific transaction buffer:
- src/query/storages/fuse/src/retry/commit.rs: try_rebuild_req reads ctx.txn_mgr().multi_table_insert_rows() and only updates additional_stats_meta.row_count / HLL when that map contains rows for the table.
- src/query/storages/common/session/src/transaction.rs: multi_table_insert_rows is only populated through add_multi_table_insert_rows.
- Current callers of add_multi_table_insert_rows are in the multi-table insert commit path, so ordinary INSERT inside an explicit transaction has no entry even though its transaction-generated CommitMeta already contains insert_rows and HLL data.
This makes the explicit transaction retry path depend on a multi-table-insert-only side channel for snapshot statistics. The retry merge should use the transaction-generated commit/snapshot metadata as the source of truth for the inserted rows and column statistics, not only TxnManager::multi_table_insert_rows.
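The suspected behavior and the suggested direction can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: SnapshotStats, TxnCommitMeta, rebuild_with_side_channel and rebuild_with_commit_meta are hypothetical names invented for illustration and are not Databend's actual types or functions. The numbers (105 rows after Session B, 10 rows inserted by Session A, table id 25) come from the reproduction and log above.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified stand-in for the snapshot-level statistics
/// whose row count is surfaced as system.statistics.stats_row_count.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SnapshotStats {
    row_count: u64,
}

/// Hypothetical stand-in for the metadata generated by the explicit
/// transaction; per the notes above, it already carries the inserted rows.
struct TxnCommitMeta {
    table_id: u64,
    insert_rows: u64,
}

/// Models the suspected current behavior: the rebuilt stats only grow when
/// the multi-table-insert side channel has an entry for the table.
fn rebuild_with_side_channel(
    latest: SnapshotStats,
    meta: &TxnCommitMeta,
    multi_table_insert_rows: &HashMap<u64, u64>,
) -> SnapshotStats {
    match multi_table_insert_rows.get(&meta.table_id) {
        // Entry present (multi-table insert path): row count is merged.
        Some(rows) => SnapshotStats {
            row_count: latest.row_count + *rows,
        },
        // No entry (ordinary INSERT in an explicit transaction): stats stay stale.
        None => latest,
    }
}

/// Models the suggested direction: use the transaction-generated metadata
/// as the source of truth for the inserted row count.
fn rebuild_with_commit_meta(latest: SnapshotStats, meta: &TxnCommitMeta) -> SnapshotStats {
    SnapshotStats {
        row_count: latest.row_count + meta.insert_rows,
    }
}
```

In this model, calling rebuild_with_side_channel with an empty map, a latest row count of 105 and insert_rows of 10 leaves the row count at 105, which matches the stale stats_row_count seen above. rebuild_with_commit_meta with the same inputs yields the expected 115. A real fix would also need to merge the HLL / column statistics, which this sketch leaves out.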