
Commit 10b447c

ORCA: strip trailing spaces from bpchar payload in stats LINT mapping
PG's bpchareq treats trailing spaces as insignificant: a char(50) MCV stored as 'Books' is space-padded to 50 bytes in pg_statistic, while a query constant ``i_category='Books'`` arrives unpadded. In ExtractLintValueFromDatum the two therefore produce different sort-key prefixes and thus different LINTs, so CDatumGenericGPDB::StatsAreEqual (via IDatum::StatsAreEqual on the LINT mapping) misses every MCV bucket. ORCA then falls back to MinRows = 1, and downstream plans go wildly wrong: for example, TPC-DS sf=5 Q33 estimates 1 row for ``i_category='Books'`` (actual 5331), which then cascades into a 14M-row IndexNL.

Trim trailing ASCII spaces from the bpchar payload before feeding it to the locale sort-key transform. Only the LINT path is affected; the round-tripped varlena bytes are preserved, so bpchar constants in INSERT/SELECT plans still carry their declared padding.
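To see why the padding defeats the MCV match, here is a minimal sketch that assumes a LINT is simply the first 8 bytes of a C-collation key packed big-endian into a 64-bit integer. This is a simplification; ORCA's real mapping goes through the locale sort-key transform. `LintFromPrefix`, `TrimTrailingSpaces` and `DemonstrateBpcharLint` are hypothetical helpers for illustration, not ORCA or GPDB APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical stand-in for a sort-key-prefix LINT: pack the first 8 bytes
// of the key (zero-filled when the key is shorter) into a uint64_t.
std::uint64_t LintFromPrefix(std::string_view key)
{
    std::uint64_t lint = 0;
    for (std::size_t i = 0; i < 8; ++i)
    {
        unsigned char byte = i < key.size() ? static_cast<unsigned char>(key[i]) : 0;
        lint = (lint << 8) | byte;
    }
    return lint;
}

// Drop trailing ASCII spaces, mirroring bpchar's comparison semantics.
std::string_view TrimTrailingSpaces(std::string_view s)
{
    std::size_t len = s.size();
    while (len > 0 && s[len - 1] == ' ')
        --len;
    return s.substr(0, len);
}

// Checks the before/after behaviour described in the commit message.
void DemonstrateBpcharLint()
{
    // char(50) MCV as stored (space-padded) in pg_statistic.
    const std::string stored = "Books" + std::string(45, ' ');
    // Unpadded query constant, as in i_category='Books'.
    const std::string_view constant = "Books";

    // Untrimmed: prefixes "Books   " and "Books\0\0\0" differ, so the LINTs differ.
    assert(LintFromPrefix(stored) != LintFromPrefix(constant));
    // Trimmed: both reduce to "Books" and map to the same LINT.
    assert(LintFromPrefix(TrimTrailingSpaces(stored)) ==
           LintFromPrefix(TrimTrailingSpaces(constant)));
}
```

Calling `DemonstrateBpcharLint()` passes both assertions. Without trimming, any value shorter than the key prefix picks up space bytes on the stored side and zero bytes on the constant side, so the two can never compare equal.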
1 parent 92bba5f commit 10b447c

1 file changed

Lines changed: 16 additions & 0 deletions

src/backend/gpopt/translate/CTranslatorScalarToDXL.cpp

@@ -2659,6 +2659,22 @@ CTranslatorScalarToDXL::ExtractLintValueFromDatum(const IMDType *md_type,
 		payload = (const BYTE *) VARDATA_ANY((const void *) bytes);
 		payload_len = (ULONG) VARSIZE_ANY_EXHDR((const void *) bytes);
 		is_varlena_string = true;
+
+		// PG bpchareq treats trailing spaces as insignificant: a
+		// char(50) MCV stored as 'Books' is space-padded to 50 bytes,
+		// while a query constant ``i_category='Books'`` arrives
+		// unpadded. Without trimming, the two produce different
+		// sort-key prefixes (and hence different LINTs), so
+		// StatsAreEqual misses every MCV bucket and the filter falls
+		// back to MinRows -- e.g. TPC-DS sf=5 Q33 estimating 1 row
+		// for ``i_category='Books'`` (actual 5331).
+		if (mdid->Equals(&CMDIdGPDB::m_mdid_bpchar) ||
+			(base_mdid->IsValid() &&
+			 base_mdid->Equals(&CMDIdGPDB::m_mdid_bpchar)))
+		{
+			while (payload_len > 0 && payload[payload_len - 1] == ' ')
+				payload_len--;
+		}
 	}
 
 	// For non-C collation on varlena strings, run the payload through
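The guard in the patch fires for bpchar itself and for a domain whose base type is bpchar. It shrinks only `payload_len`, never the buffer. Below is a standalone sketch of that logic under stated assumptions. `TypeInfo`, `IsBpcharLike` and `LintPayloadLength` are hypothetical stand-ins for the `mdid`/`base_mdid` pair and the inline loop. OID 1042 is PostgreSQL's built-in `bpchar` type and 0 is `InvalidOid`.

```cpp
#include <cstddef>
#include <cstdint>

// PostgreSQL's built-in OID for bpchar (BPCHAROID in pg_type).
constexpr std::uint32_t kBpcharOid = 1042;
// PostgreSQL's InvalidOid.
constexpr std::uint32_t kInvalidOid = 0;

// Hypothetical stand-in for the (mdid, base_mdid) pair: the type's OID plus
// the base type's OID when the type is a domain (kInvalidOid otherwise).
struct TypeInfo
{
    std::uint32_t oid;
    std::uint32_t base_oid;
};

// True for bpchar itself or for a domain defined over bpchar. The validity
// check mirrors base_mdid->IsValid() in the patch.
bool IsBpcharLike(const TypeInfo &type)
{
    return type.oid == kBpcharOid ||
           (type.base_oid != kInvalidOid && type.base_oid == kBpcharOid);
}

// Length of the payload to feed to the sort-key transform. Only the length
// shrinks; the caller's buffer is never modified, so the padded bytes stay
// intact for any caller that round-trips the original datum.
std::size_t LintPayloadLength(const char *payload, std::size_t len,
                              const TypeInfo &type)
{
    if (IsBpcharLike(type))
    {
        while (len > 0 && payload[len - 1] == ' ')
            --len;
    }
    return len;
}
```

Computing an adjusted length instead of rewriting the bytes is what keeps the change confined to the LINT path. Non-bpchar types such as `varchar` (OID 1043) keep their full length, because trailing spaces are significant for them.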
