Skip to content

Fix: ENR temp table creation fails for long table names#4846

Open
anju15bharti wants to merge 6 commits into
babelfish-for-postgresql:BABEL_6_X_DEVfrom
amazon-aurora:BABEL-6433
Open

Fix: ENR temp table creation fails for long table names#4846
anju15bharti wants to merge 6 commits into
babelfish-for-postgresql:BABEL_6_X_DEVfrom
amazon-aurora:BABEL-6433

Conversation

@anju15bharti

@anju15bharti anju15bharti commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Description

Store original untruncated name for long temp table identifiers

Previously, the tsql_ttname scanner rule did not truncate #temp table
names. The full untruncated name was stored in the ENR name field via
DefineRelation. However, the syscache (pg_class Name field) is limited
to NAMEDATALEN-1 bytes, so it stored a simple-clipped version.
This mismatch between ENR (full name) and syscache (clipped name)
caused temp table creation to fail for names >= NAMEDATALEN, as
subsequent lookups could not consistently resolve the table.

New approach: The scanner truncates #temp table names like all other
identifiers (hash-based truncation via truncate_identifier). The
truncated name is stored consistently in both ENR and syscache.
The original untruncated name is preserved separately in pg_class
reloptions (bbf_original_rel_name) using the existing infrastructure.
This works for both ENR and non-ENR temp tables since ENR synthetic
cattups are accessible via syscache. No PG engine changes are needed.

For indexes on temp tables, the parser stores the index name location
in IndexStmt options. At execution time, the full original name is
extracted from the query string and passed to exec_add_original_index_name.

Display functions (OBJECT_NAME, babelfish_get_enr_list) read the
original name from reloptions via a shared helper get_orig_temp_table_name().

Issues Resolved

BABEL-6433

Engine PR: babelfish-for-postgresql/postgresql_modified_for_babelfish#769

Test Scenarios Covered

  • Use case based -

  • Boundary conditions -

  • Arbitrary inputs -

  • Negative test cases -

  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests -

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: anju15bharti <abanju@amazon.com>
@anju15bharti anju15bharti changed the title fix: ENR temp table creation fails for long table names Fix: ENR temp table creation fails for long table names Jun 2, 2026
@anju15bharti anju15bharti reopened this Jun 2, 2026
@anju15bharti anju15bharti requested a review from ayushdsh June 2, 2026 10:10

@anju15bharti anju15bharti left a comment

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/code-review

@ayushdsh

ayushdsh commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

@anju15bharti , why are we simply not truncating temp table identifiers (names) as well? I understand that the view with babelfish_get_enr_list might break with this, but its way simpler to simply truncate like we do for other identifier rather than supporting full name.

let me know what you think?

@ayushdsh

ayushdsh commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

/code-review

@ayushdsh

ayushdsh commented Jun 3, 2026

Copy link
Copy Markdown
Contributor

@anju15bharti , why are we simply not truncating temp table identifiers (names) as well? I understand that the view with babelfish_get_enr_list might break with this, but its way simpler to simply truncate like we do for other identifier rather than supporting full name.

let me know what you think?

This probably is going to be a 1 line change in our parser (no changes in engine)

<tsql>{tsql_ttname}	{
					char *ident;
					SET_YYLLOC();
					ident = downcase_truncate_identifier(yytext, yyleng, true);
					yylval->str = ident;
					return IDENT;
				}

@kuntalghosh kuntalghosh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Verdict: COMMENT

Based on the provided code changes, here are my feedback and suggestions:

Issue 1 - In the pl_handler.c at line 3082: ENR fallback missing currentQueryEnv NULL guard:

	if (!OidIsValid(relid) && relname[0] == '#')
	{
		EphemeralNamedRelation enr = get_ENR(currentQueryEnv, relname, true);
		if (enr)
			relid = enr->md.reliddesc;
	}

While get_ENR handles NULL queryEnv gracefully (returns NULL), other call sites in this file (line 8335) do explicitly check !currentQueryEnv before accessing it. Adding a guard here would be a defensive improvement for clarity and consistency with the rest of the file, though the behavior is correct without it.

Suggested fix:

	if (!OidIsValid(relid) && relname[0] == '#' && currentQueryEnv)
	{
		EphemeralNamedRelation enr = get_ENR(currentQueryEnv, relname, true);
		if (enr)
			relid = enr->md.reliddesc;
	}

Issue 2 - In the temp_table_long_name_enr.sql at line 37: Test 2 expected output contradicts its own comment:

-- Type 'V' (view) should return NULL for a table
SELECT CASE WHEN OBJECT_ID('#long_temp_for_typed_object_id_test_exceeding_namedatalen_limit_x', 'V') IS NULL THEN 'PASS' ELSE 'FAIL' END AS test2_type_V_null
GO

The comment says type 'V' "should return NULL for a table," but the expected output is FAIL — meaning OBJECT_ID with type filter 'V' still returns a non-NULL OID for temp tables. If this is a known limitation of ENR-based lookup (type filtering not applied), the comment should document this as a known gap rather than state the opposite of what happens. Otherwise, future readers will be confused about whether the test is passing or failing.

Suggested fix:

-- Type 'V' (view) — known limitation: OBJECT_ID type filter is not enforced for
-- ENR-based temp table lookups, so this returns non-NULL even for type 'V'.
SELECT CASE WHEN OBJECT_ID('#long_temp_for_typed_object_id_test_exceeding_namedatalen_limit_x', 'V') IS NULL THEN 'PASS' ELSE 'FAIL' END AS test2_type_V_null
GO

Issue 3 - In the temp_table_long_name_enr-vu-verify.sql at line 47: Duplicate scenario numbering:

-- Scenario 3: Basic CREATE, INSERT, SELECT, DROP with ENR validation
EXEC temp_long_name_enr_basic_proc
GO

-- Scenario 2: ALTER TABLE on long-name temp table
EXEC temp_long_name_enr_alter_proc
GO

The scenario numbers in the procedure-based section restart at "Scenario 2" after "Scenario 3". The numbering should continue sequentially (Scenario 4, 5, 6...) or restart clearly from 1 for the procedure-based section.

Suggested fix:

-- Scenario 3: Basic CREATE, INSERT, SELECT, DROP with ENR validation
EXEC temp_long_name_enr_basic_proc
GO

-- Scenario 4: ALTER TABLE on long-name temp table
EXEC temp_long_name_enr_alter_proc
GO

Issue 4 - In the pltsql_utils.c at line 2978: Comment style — opening /* should be on its own line for multi-line block comments:

	/* Do not truncate temp table names (starting with '#') since ENR stores the
	 * full name. The {tsql_ttname} scanner rule preserves temp table names without
	 * truncation, so the ENR entry uses the original untruncated name. Applying
	 * hash-based truncation here would produce a name that doesn't match the ENR. */

Per project conventions, multi-line comment blocks should have /* and */ on their own lines.

Suggested fix:

	/*
	 * Do not truncate temp table names (starting with '#') since ENR stores the
	 * full name. The {tsql_ttname} scanner rule preserves temp table names without
	 * truncation, so the ENR entry uses the original untruncated name. Applying
	 * hash-based truncation here would produce a name that doesn't match the ENR.
	 */

Issue 5 - PR Description: Test Scenarios section is empty:

The PR description template has all test scenario subsections (use case, boundary conditions, arbitrary inputs, negative test cases, upgrade tests, etc.) left blank. While the actual test files are comprehensive, the PR description should document what is covered so reviewers can assess coverage without reading all test files.

Notes

  • The core approach (skip truncation for #-prefixed names in the bracket-delimiter scanner rule and OBJECT_ID path, plus an ENR fallback in bbf_table_var_lookup) is sound and follows established patterns already used in catalog.c and procedures.c.
  • The change correctly does NOT affect the {tsql_ttname} rule which already preserves the full name without truncation via pstrdup(yytext).
  • Security review of C source changes: 0 findings. No buffer overflows, NULL dereferences, injection risks, or use-after-free conditions introduced.
  • The reviewer comment from @ayushdsh suggests an alternative simpler approach (just truncate temp table names like other identifiers). This is an open design discussion — the current approach preserves full fidelity with T-SQL behavior where temp table names can be up to 116 characters, but the alternative would be simpler at the cost of breaking the babelfish_get_enr_list view and potentially OBJECT_ID lookups.
  • Upgrade test schedules are properly updated across all relevant minor versions.
  • The enrtype check (verifying enr->md.enrtype == ENR_TSQL_TEMP) is not performed in the new fallback. Other sibling call sites do check this. While non-TSQL-TEMP ENRs cannot have #-prefixed names through user-reachable syntax, adding the check would be a defensive consistency improvement.

Note: This AI review is meant to augment, not replace, human review. Use the suggestions with discretion.

Comment thread contrib/babelfishpg_tsql/src/pl_handler.c Outdated
Comment thread contrib/babelfishpg_tsql/src/pltsql_utils.c Outdated
Comment thread contrib/babelfishpg_tsql/src/backend_parser/scan-tsql-rule.l Outdated
Reverts code changes from commits f313cde and 798c7dd.
Test files are kept. Replaced by new approach using reloptions.
@anju15bharti anju15bharti force-pushed the BABEL-6433 branch 2 times, most recently from 5e82847 to a509bd4 Compare June 11, 2026 12:47
@anju15bharti

Copy link
Copy Markdown
Contributor Author

/code-review

@kuntalghosh kuntalghosh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Verdict: COMMENT

Based on the provided code changes, here are my feedback and suggestions:

Issue 1 - In the functions.c at line 2836: get_orig_temp_table_name returns pointer into partially-freed buffer:

static char *
get_orig_temp_table_name(Oid relid)
{
	HeapTuple	tuple = SearchSysCache1(RELOID, ObjectIdGetDatum(relid));
	char	   *orig = NULL;

	if (HeapTupleIsValid(tuple))
	{
		Datum	datum;
		bool	isnull;

		datum = SysCacheGetAttr(RELOID, tuple, Anum_pg_class_reloptions, &isnull);
		if (!isnull)
		{
			ArrayType *reloptions = DatumGetArrayTypeP(datum);
			orig = get_value_by_name_from_array(reloptions, ATTOPTION_BBF_ORIGINAL_TABLE_NAME);
		}
		ReleaseSysCache(tuple);
	}
	return orig;
}

get_value_by_name_from_array returns a pointer into the middle of a TextDatumGetCString-allocated buffer (it splits on = and returns the value portion). For non-TOAST reloptions, DatumGetArrayTypeP returns a pointer directly into the syscache tuple. When ReleaseSysCache is called, the tuple may be freed, invalidating the reloptions datum. However, because TextDatumGetCString inside get_value_by_name_from_array copies the text datum to a palloc'd buffer via array_ref, the returned string should remain valid after ReleaseSysCache.

This is safe but subtle. Consider adding a brief comment noting that orig remains valid after ReleaseSysCache because get_value_by_name_from_array copies the value into CurrentMemoryContext, or alternatively pstrdup the result before releasing the syscache for defensive clarity.

Issue 2 - In the functions.c at line 2904: result_text uninitialized before conditional assignment:

		if (pg_class->relpersistence == RELPERSISTENCE_TEMP &&
			NameStr(pg_class->relname)[0] == '#')
		{
			char *orig = get_orig_temp_table_name(object_id);
			if (orig)
				result_text = cstring_to_text(orig);
		}

		if (!result_text)
			result_text = cstring_to_text(NameStr(pg_class->relname));

The code relies on result_text being NULL at this point. Looking at the broader context of object_name(), result_text is declared earlier as text *result_text = NULL, and the ENR path above returns early via PG_RETURN_VARCHAR_P. So when we reach this code, result_text is indeed NULL. This is correct but the logic flow is non-obvious. Consider adding a comment like /* result_text is still NULL here since ENR path returns early */ for clarity.

Issue 3 - In the pl_handler.c at line 5407: No bounds validation on loc before accessing queryString:

int loc = intVal(opt->arg);
char *full_name = extract_identifier(queryString + loc, NULL);

The loc value comes from @7 in the grammar (the parser's position token for the index name), which is guaranteed to be within the query string bounds since the parser produced it from parsing that exact string. This follows the same pattern as existing name_location usage elsewhere in the file (lines 3900, 4087, 4142). No change needed, but a brief comment noting that loc is a parser-generated position would improve readability.

Issue 4 - In the hooks.c at line 2502: Comment is now stale after condition expansion:

	/*
	 * Only store original_name if there's a difference, and if the difference
	 * is only in capitalization
	 */
	if ((strncmp(relname, original_name, strlen(relname)) != 0 && strncasecmp(relname, original_name, strlen(relname)) == 0) ||
		(relation->relpersistence == RELPERSISTENCE_TEMP && original_name[0] == '#' && strlen(original_name) >= NAMEDATALEN))

The comment above this condition says "Only store original_name if there's a difference, and if the difference is only in capitalization." This no longer describes the full condition — the second clause stores the original name for truncated temp tables regardless of case differences. The comment should be updated to describe both conditions.

Suggested fix:

	/*
	 * Store original_name in reloptions when:
	 * 1. There is a case-only difference between relname and original_name, OR
	 * 2. The identifier is a temp table name that was truncated (>= NAMEDATALEN)
	 */

Issue 5 - In the gram-tsql-rule.y at line 3560: name_location added unconditionally for all indexes:

				if (n->idxname)
					n->options = lappend(n->options,
						makeDefElem("name_location",
							(Node *) makeInteger(@7), @7));

The name_location option is appended for every index statement that has a name, not just temp table indexes. While the consuming code in pl_handler.c only uses it when the table is temp (relpersistence == RELPERSISTENCE_TEMP && relname[0] == '#'), and the option is removed from stmt->options before execution proceeds, the option node is still allocated and added to the list for all non-temp indexes as well. Since the consumer removes it via foreach_delete_current, this is functionally correct but adds minor unnecessary overhead for all regular index creation statements. This is acceptable for simplicity, but worth noting.

Missing Tests

The test coverage is thorough for the core functionality. A few edge cases that could strengthen it:

  • Concurrent sessions: Two sessions creating temp tables with the same long name. This would verify the truncation hash produces distinct internal names. (Could be an isolation test.)
  • SELECT INTO with long temp table name: SELECT * INTO #very_long_name... FROM ... — this goes through a different DDL path than explicit CREATE TABLE.

Notes

  • Prior AI review issues have been addressed: the currentQueryEnv NULL guard issue is no longer relevant (the approach was completely redesigned to use reloptions instead of an ENR lookup fallback), and the misleading test comment about OBJECT_ID type filtering was clarified.
  • The .out file changes show that babelfish_get_enr_list() now returns the original user-facing index names (e.g., #idx1_intermediate) instead of the internal composite names (e.g., #idx1_intermediate#t_intermediac7f5...). This is a user-visible behavior change for the babelfish_get_enr_list() system function that improves usability.
  • The approach is well-designed: truncating at scanner level ensures consistent names in ENR and syscache, while preserving the original name in reloptions uses existing infrastructure without engine changes.
  • Target branch BABEL_6_X_DEV is correct for this feature work.
  • The PR description test scenarios section is empty (only headers, no specific scenarios listed). While the test files themselves are comprehensive, the PR template checklist items for test scenarios should ideally be filled in for reviewer context.

Note: This AI review is meant to augment, not replace, human review. Use the suggestions with discretion.

@anju15bharti anju15bharti force-pushed the BABEL-6433 branch 3 times, most recently from 89cde96 to d339237 Compare June 12, 2026 13:28

@ayushdsh ayushdsh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the approach looks fine to me .. i just have some questions -

  1. why have we explicitly added temp table handling? wouldn't the same be required for other identifiers as well?
  2. w.r.t the function get_enr_list(), we are not returning the original name .. but with this .. we are returning the orig_name of indexes as well however, indexes with same name can exists on multiple tables. do you think this can cause confusion? I am okay with it, but just something to consider.

/* Use original untruncated name from reloptions if available */
if (md->enrtype == ENR_TSQL_TEMP)
{
char *orig = get_orig_temp_table_name(md->reliddesc);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: can we instead make the utility itself handle all of this - get_orig_temp_table_name(Enr) returns fullname or truncated name.

Comment on lines +2909 to +2911
if (pg_class->relpersistence == RELPERSISTENCE_TEMP &&
NameStr(pg_class->relname)[0] == '#')
{

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't we be handling this for all identifiers? are we doing this for now for temp tables only?

stmt->excludeOpNames = NIL;

/*
* For indexes on temp tables, extract the full original

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why just temp tables? why aren't we doing this for all indexes

Comment on lines +8310 to +8313
pfree(original_name);
}
else if (original_name)
pfree(original_name);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: unnecessary. pfree handles NULLs

static List *
transformSelectIntoStmt(CreateTableAsStmt *stmt)
transformSelectIntoStmt
(CreateTableAsStmt *stmt, const char *queryString)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we doing the same for other SELECT INTOs as well? passing the queryString here and will use it to extract the actual name? why can't we do the same we do for CREATE TABLE ... storing the orig_name in reloptions?

Comment on lines +3560 to +3564
if (n->idxname)
n->options = lappend(n->options,
makeDefElem("name_location",
(Node *) makeInteger(@7), @7));

@ayushdsh ayushdsh Jun 15, 2026

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unconditional - we are populating the name_location field no matter what? should we add a NAMEDATALEN check? and btw this will also happen for indexes on permanent tables as well .. this is expected right?

Comment on lines +2909 to +2910
if (pg_class->relpersistence == RELPERSISTENCE_TEMP &&
NameStr(pg_class->relname)[0] == '#')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: lets create a utility for this?

bool IsTsqlTempTable(Relation)...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants