
[OSS-ONLY] Reset pg_strtok state after PLtsql node deserialization #4848

Draft
manisha-deshpande wants to merge 1 commit into babelfish-for-postgresql:BABEL_5_X_DEV from amazon-aurora:jira-babel-6037


manisha-deshpande commented Jun 2, 2026

Contributor

Description

This is a follow-up cleanup to the cross-session ANTLR parse tree caching feature added in #4547. That feature introduced an extension-side node serializer/deserializer (pltsql_stringToNode) that drives the engine's tokenizer through the new pg_strtok_init() setter, which was added in the corresponding engine PR #733.

Currently, pltsql_stringToNode() calls pg_strtok_init(str) on entry but does not reset the static pg_strtok_ptr before returning. Because the input string lives in a per-statement memory context that gets reset between SQL commands, the tokenizer pointer is left pointing into freed memory until the next caller of stringToNode() overwrites it. With this change, pltsql_stringToNode() resets pg_strtok_ptr to NULL before returning, matching PG's clean-state convention.

This was raised during code review of the engine-side pg_strtok_init() addition. Behavior is unchanged in practice: the dangling pointer was never dereferenced, because PG's stringToNodeInternal() always overwrites it with its own input before any read (verified via gdb on the pg_get_expr / pg_get_constraintdef paths after a cached procedure EXEC). The change is a defensive cleanup that removes the stale-pointer footgun for any future caller that might read the static directly.
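For reviewers who want the pattern in isolation, here is a minimal standalone sketch of a static tokenizer cursor being cleared before the deserializer returns. It is not the Babelfish or PostgreSQL source: `demo_strtok_ptr`, `demo_strtok_init()`, `demo_strtok()` and `demo_string_to_node()` are hypothetical stand-ins for `pg_strtok_ptr`, `pg_strtok_init()`, the engine tokenizer and `pltsql_stringToNode()`, and the whitespace-only tokenizer is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the engine's static tokenizer cursor. */
static const char *demo_strtok_ptr = NULL;

/* Hypothetical stand-in for pg_strtok_init(): aim the tokenizer at str. */
static void demo_strtok_init(const char *str)
{
    assert(str != NULL);
    demo_strtok_ptr = str;
}

/* Return the next space-delimited token, or NULL at end of input. */
static const char *demo_strtok(size_t *length)
{
    const char *start;

    while (*demo_strtok_ptr == ' ')
        demo_strtok_ptr++;

    if (*demo_strtok_ptr == '\0')
    {
        *length = 0;
        return NULL;
    }

    start = demo_strtok_ptr;
    while (*demo_strtok_ptr != '\0' && *demo_strtok_ptr != ' ')
        demo_strtok_ptr++;

    *length = (size_t) (demo_strtok_ptr - start);
    return start;
}

/*
 * Hypothetical deserializer entry point. It only counts tokens; the point
 * is the last step, which clears the static cursor so it cannot outlive
 * the buffer that str points into.
 */
static int demo_string_to_node(const char *str)
{
    size_t len;
    int    ntokens = 0;

    demo_strtok_init(str);
    while (demo_strtok(&len) != NULL)
        ntokens++;

    demo_strtok_ptr = NULL;     /* the cleanup this PR describes */
    return ntokens;
}
```

After `demo_string_to_node()` returns, the caller may free or reset the memory behind `str` without leaving the static cursor pointing into it.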

Issues Resolved

BABEL-6037

Test Scenarios Covered

  • Use case based -
  • Ran the regression test suite, everything passed as expected, confirming no behavioral change.
  • gdb-traced an EXEC of a persistently cached procedure followed by a pg_get_expr call in the same backend; confirmed that pg_strtok_ptr is reset to 0x0 after pltsql_stringToNode returns, rather than being left pointing into memory from a reset context.
  • Boundary conditions -
    N/A

  • Arbitrary inputs -
    N/A

  • Negative test cases -
    N/A

  • Minor version upgrade tests -
    N/A

  • Major version upgrade tests -
    N/A

  • Performance tests -
    N/A

  • Tooling impact -
    N/A

  • Client tests -
    N/A

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Reset pg_strtok_ptr to NULL after pltsql_nodeRead() returns so the static
tokenizer state isn't left pointing into a memory context that gets reset
between statements.

Task: BABEL-6037
Signed-off-by: Manisha Deshpande <mmdeshp@amazon.com>