Fix RawStmt stmt_len with Unicode statements by alexmac · Pull Request #194 · lelit/pglast

alexmac · 2026-05-25T21:34:50Z

RawStmt.stmt_len was being passed through the byte-offset-to-string-index mapper as if it were an absolute offset, but PostgreSQL reports it as a length relative to stmt_location. That works by accident for plain ASCII, where byte offsets and string indexes coincide. In multi-statement SQL with Unicode earlier in the string, however, the length comes out too short, because it is converted from the bare value stmt_len instead of from the real byte end position, stmt_location + stmt_len.

The fix converts both byte positions, start and end, into Python string indexes, then subtracts them. So stmt_len stays a character length that can be used with sql[stmt_location:stmt_location + stmt_len], even when earlier statements contain multibyte characters like €.
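
A minimal sketch of the idea in plain Python, not the actual pglast code: `byte_to_char_index`, `char_span`, and `buggy_char_span` are hypothetical helpers written for illustration, and the byte values are computed by hand from the sample string rather than taken from the parser. It assumes UTF-8 input and offsets that fall on character boundaries.

```python
def byte_to_char_index(sql: str, byte_offset: int) -> int:
    """Map a UTF-8 byte offset into `sql` to a Python string index.

    Hypothetical helper: assumes the offset lies on a character boundary.
    """
    return len(sql.encode("utf-8")[:byte_offset].decode("utf-8"))


def char_span(sql: str, stmt_location: int, stmt_len: int) -> tuple[int, int]:
    """Convert both byte positions (start and end), then subtract."""
    start = byte_to_char_index(sql, stmt_location)
    end = byte_to_char_index(sql, stmt_location + stmt_len)
    return start, end - start


def buggy_char_span(sql: str, stmt_location: int, stmt_len: int) -> tuple[int, int]:
    """The mistake described above: the length is mapped as if it were an absolute offset."""
    return byte_to_char_index(sql, stmt_location), byte_to_char_index(sql, stmt_len)


sql = "select '€'; select 'abc'; select 1"
target = "select 'abc'"

# Hand-computed byte values for the second statement (illustrative only).
char_start = sql.index(target)
byte_start = len(sql[:char_start].encode("utf-8"))
byte_len = len(target.encode("utf-8"))

start, length = char_span(sql, byte_start, byte_len)
fixed_slice = sql[start:start + length]

bad_start, bad_length = buggy_char_span(sql, byte_start, byte_len)
buggy_slice = sql[bad_start:bad_start + bad_length]
```

Here the fixed conversion yields a slice equal to the whole second statement, while the buggy one comes up two characters short, because the three-byte € sits inside the first `stmt_len` bytes of the string.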

codecov-commenter · 2026-05-26T06:33:15Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (v7@deb6263). Learn more about missing BASE report.
❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@          Coverage Diff          @@
##             v7     #194   +/-   ##
=====================================
  Coverage      ?   99.55%           
=====================================
  Files         ?       22           
  Lines         ?     7259           
  Branches      ?        0           
=====================================
  Hits          ?     7227           
  Misses        ?       32           
  Partials      ?        0

Fix RawStmt stmt_len with Unicode statements

8ebbf85

alexmac force-pushed the alex/fix-rawstmt-stmt-len-unicode branch from 78c75fe to 8ebbf85 on May 28, 2026 01:50

alexmac mentioned this pull request May 28, 2026

Followup to #194 with generated code #196

Open

alexmac wants to merge 1 commit into lelit:v7 from alexmac:alex/fix-rawstmt-stmt-len-unicode

3 participants

Uh oh!

Conversation

alexmac commented May 25, 2026

Uh oh!

codecov-commenter commented May 26, 2026

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants