Skip to content

Rewriting objectpropertyex function in C#4864

Open
RuchaSK1 wants to merge 1 commit into
babelfish-for-postgresql:BABEL_4_X_DEVfrom
amazon-aurora:jira-babel-6435-4x
Open

Rewriting objectpropertyex function in C#4864
RuchaSK1 wants to merge 1 commit into
babelfish-for-postgresql:BABEL_4_X_DEVfrom
amazon-aurora:jira-babel-6435-4x

Conversation

@RuchaSK1

Copy link
Copy Markdown
Contributor

Description

The PL/pgSQL implementation of OBJECTPROPERTYEX relies on sys.all_objects, an expensive view that joins across multiple catalogs with multiple UNION branches, causing significant overhead which gets carried forward to any function using OBJECTPROPERTYEX.

Hence we are replacing OBJECTPROPERTYEX with a C implementation that uses direct syscache lookups via shared catalog helpers, avoiding view materialization entirely. For non-BaseType properties, it delegates to the existing objectproperty_internal.

The performance improvement observed by this is as follows on 17.10 instance with total object counts being approx 36K:

Query Time Before Fix Time After Fix Improvement (Before/After ratio)
EXPLAIN ANALYZE SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.tbl_1'), 'BaseType'); 636 ms 2.83 ms 225x
EXPLAIN ANALYZE SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.tbl_1'), 'IsSchemaBound'); 634ms 0.344 ms 1843x
EXPLAIN ANALYZE SELECT OBJECTPROPERTYEX(999999999, 'BaseType'); 648ms 0.894 ms 725x
EXPLAIN ANALYZE SELECT OBJECTPROPERTYEX(999999999, 'IsSchemaBound'); 775ms 0.329 ms 2355x
EXPLAIN ANALYZE SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.vw_1'), 'BaseType'); 644ms 0.659 ms 977x
EXPLAIN ANALYZE SELECT OBJECTPROPERTYEX(OBJECT_ID('dbo.sp_1'), 'BaseType'); 643ms 0.423 ms 1520x
EXPLAIN ANALYZE select count(*) from sys.all_objects; 890 ms 880 ms 1x (no change as expected)
Full SSMS view enumeration query 11 min 51s 11.31 s 63x

Performance for Joins with Predicated OBJECTPROPERTYEX:

oid_test is a plain user table containing all object IDs (isolates function cost from sys view overhead).

Query Time Before Fix Time After Fix Improvement
SELECT TOP 100 t.name, OBJECTPROPERTYEX(t.object_id, 'BaseType') FROM sys.objects t JOIN sys.schemas s ON t.schema_id = s.schema_id; 25.2s 2.1s 12x
SELECT TOP 200 t.name, OBJECTPROPERTYEX(t.object_id, 'BaseType') FROM sys.objects t JOIN sys.schemas s ON t.schema_id = s.schema_id; 49.7s 3.3s 15x
SELECT TOP 20 name, object_id FROM sys.objects WHERE OBJECTPROPERTYEX(object_id, 'BaseType') = CAST('U ' AS sql_variant); >8min (timeout) 1m 45s >4.5x
SELECT object_id, OBJECTPROPERTYEX(object_id, 'BaseType') FROM oid_test; (36K rows) >8min (timeout) 1m 48s >4.5x
SELECT TOP 200 o.object_id, c.relname FROM oid_test o JOIN pg_class c ON c.oid = o.object_id JOIN pg_namespace n ON n.oid = c.relnamespace WHERE OBJECTPROPERTYEX(o.object_id, 'IsTable') = CAST(1 AS sql_variant); 27.5s 2.7s 10x
SELECT TOP 200 o.object_id, OBJECTPROPERTYEX(o.object_id, 'BaseType') AS type, OBJECTPROPERTYEX(o.object_id, 'IsMSShipped') AS shipped FROM oid_test o JOIN pg_class c ON c.oid = o.object_id JOIN pg_namespace n ON n.oid = c.relnamespace WHERE OBJECTPROPERTYEX(o.object_id, 'IsTable') = CAST(1 AS sql_variant) OR OBJECTPROPERTYEX(o.object_id, 'IsView') = CAST(1 AS sql_variant); 1m 44s 7.8s 13x
SELECT TOP 200 object_id, OBJECTPROPERTYEX(object_id, 'BaseType'), OBJECTPROPERTYEX(object_id, 'IsTable'), OBJECTPROPERTYEX(object_id, 'IsMSShipped') FROM oid_test; 1m 43s 7.9s 13x
SELECT TOP 200 o.object_id, c.relname FROM oid_test o JOIN pg_class c ON c.oid = o.object_id WHERE OBJECTPROPERTYEX(o.object_id, 'BaseType') = CAST('U ' AS sql_variant) ORDER BY c.relname; >8min (timeout) 1m 45s >4.5x
SELECT TOP 200 a.object_id AS table_id, b.object_id AS view_id FROM oid_test a JOIN oid_test b ON a.object_id <> b.object_id WHERE OBJECTPROPERTYEX(a.object_id, 'BaseType') = CAST('U ' AS sql_variant) AND OBJECTPROPERTYEX(b.object_id, 'BaseType') = CAST('V ' AS sql_variant); >8min (timeout) 1m 40s >4.5x

Additional Improvements:

  • Trigger support: Triggers are now resolved via pg_trigger lookup. Previously, passing a trigger OID to OBJECTPROPERTY/OBJECTPROPERTYEX returned NULL for all properties. Now correctly returns values matching T-SQL behavior (e.g., IsTrigger → 1, BaseType → "TR"). This also fixes DDL scripting output (SET ANSI_NULLS ON instead of OFF for triggers).
  • Unique constraint support: OBJECTPROPERTYEX(unique_constraint_id, 'BaseType') now returns "UQ".
  • Database scoping fix: Replaced hardcoded schema blacklist with is_shared_schema() + is_schema_from_db(). Correctly hides cross-database objects while keeping shared schemas (sys, information_schema) visible. Also handles pg_dump/restore scenarios.
  • ACL alignment with T-SQL: Metadata visibility now uses all 5 T-SQL table permissions (SELECT, INSERT, UPDATE, DELETE, REFERENCES). Added column-level permission fallback for default constraints.
  • Code refactoring: Extracted shared catalog helpers (get_object_from_pg_class, get_object_from_pg_proc, get_object_from_pg_trigger, get_object_from_pg_attrdef, get_object_from_pg_constraint) called by both objectproperty_internal and objectpropertyex_internal via objectproperty_helper, eliminating code duplication.

Cherry-picked from: #4691

Issues Resolved

Task: BABEL-6435

Authored-by: Rucha Kulkarni ruchask@amazon.com
Signed-off-by: Rucha Kulkarni ruchask@amazon.com

Test Scenarios Covered

There were already existing tests since the function was already present and we have just rewritten it for better performance, but still some additional test cases have been added.

  • Use case based - Yes

  • Boundary conditions - Yes

  • Arbitrary inputs - Yes

  • Negative test cases - Yes

  • Minor version upgrade tests -

  • Major version upgrade tests -

  • Performance tests - Yes (Benchmarked above with 36K objects having 2500 views)

  • Tooling impact -

  • Client tests -

Check List

  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is under the terms of the Apache 2.0 and PostgreSQL licenses, and grant any person obtaining a copy of the contribution permission to relicense all or a portion of my contribution to the PostgreSQL License solely to contribute all or a portion of my contribution to the PostgreSQL open source project.

For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Rucha Kulkarni <ruchask@amazon.com>
@RuchaSK1

Copy link
Copy Markdown
Contributor Author

/code-review

@kuntalghosh kuntalghosh left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI Code Review

Verdict: COMMENT

Based on the provided code changes, here are my feedback and suggestions:

Issue 1 - In contrib/babelfishpg_tsql/runtime/functions.c at line 5095: Missing PG_ARGISNULL defense-in-depth in objectproperty_internal:

Datum
objectproperty_internal(PG_FUNCTION_ARGS)
{
	Oid		object_id;
	char	*raw;
	...
	object_id = (Oid) PG_GETARG_INT32(0);
	raw = text_to_cstring(PG_GETARG_TEXT_P(1));

The prior implementation had explicit PG_ARGISNULL(0) || PG_ARGISNULL(1) guards. The new code relies solely on the SQL-level STRICT modifier to prevent NULL arguments from reaching the C function. While STRICT is correct today, a future upgrade script that replaces the function declaration without STRICT (or an ALTER FUNCTION ... CALLED ON NULL INPUT) would cause text_to_cstring(NULL) to dereference a NULL pointer, crashing the backend. Adding the check costs two branch instructions and provides defense-in-depth.

Suggested fix:

Datum
objectproperty_internal(PG_FUNCTION_ARGS)
{
	Oid		object_id;
	char	*raw;
	char	*property;
	bool	is_null;
	int		result;

	if (PG_ARGISNULL(0) || PG_ARGISNULL(1))
		PG_RETURN_NULL();

	object_id = (Oid) PG_GETARG_INT32(0);
	raw = text_to_cstring(PG_GETARG_TEXT_P(1));

Apply the same pattern to objectpropertyex_internal.

Issue 2 - In contrib/babelfishpg_tsql/src/pltsql_utils.c at line 1782: Broadened ACL in shared helper affects object_schema_name asymmetrically:

if (pg_class_aclcheck(con->conrelid, user_id, ACL_SELECT | ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_REFERENCES) == ACLCHECK_OK)
    namespace_oid = con->connamespace;

tsql_get_constraint_nsp_oid is called from two sites: (1) the new get_object_from_pg_constraint helper and (2) object_schema_name (line ~4031). In object_schema_name, the table path still requires ACL_SELECT, and the trigger path still requires ACL_SELECT. After this PR, the constraint path is more permissive — a user with only INSERT on a table can learn the schema name via OBJECT_SCHEMA_NAME(constraint_oid) but not via OBJECT_SCHEMA_NAME(table_oid). This asymmetry is likely unintentional.

Suggested fix:

Either keep the broadened check only inline in get_object_from_pg_constraint (revert tsql_get_constraint_nsp_oid to ACL_SELECT), or also broaden the table/trigger paths in object_schema_name for consistency. Please confirm which behavior is intended.

Issue 3 - In contrib/babelfishpg_tsql/runtime/functions.c at line 4773-4781: Database scoping uses is_shared_schema() which is wider than prior behavior:

if (!(nspname && is_shared_schema(nspname)) &&
    OidIsValid(get_cur_db_id()) && !is_schema_from_db(schema_id, get_cur_db_id()))

The old code explicitly blocked objects in pg_catalog, pg_toast, and public. The new code permits objects in any "shared schema" (including public, pg_catalog, pg_toast, aws_commons, aws_s3, aws_lambda, pglogical). This means objects in public that were previously hidden by OBJECTPROPERTY/OBJECTPROPERTYEX are now visible to any T-SQL user with appropriate ACL. If this is intentional (and it does seem correct for multi-database scoping), it should be documented in the PR description as a behavior change. If it's unintentional for public, a narrower predicate should be used.

Suggested fix:

Add a brief note to the PR description acknowledging the metadata visibility expansion for objects in public/pg_catalog/pg_toast.

Issue 4 - In contrib/babelfishpg_tsql/runtime/functions.c at line 4415: Sequence detection fix — good catch but verify test coverage:

// Old (broken):
else if (pg_class->relkind == 's')
    type = OBJECT_TYPE_SEQUENCE_OBJECT;

// New (correct):
else if (pg_class->relkind == 'S')
    temp_type = OBJECT_TYPE_SEQUENCE_OBJECT;

The prior code used lowercase 's' which never matches (PostgreSQL uses uppercase 'S' for sequences in relkind). This means OBJECTPROPERTY was silently returning wrong results for sequences. The fix is correct and the new test for OBJECTPROPERTYEX(OBJECT_ID('objectpropertyex_test_seq'), 'BaseType') exercises it. However, there is no corresponding test added for OBJECTPROPERTY(..., 'IsTable') on a sequence to verify it returns 0 (ensuring it doesn't fall through to the table classification path). Consider adding one.

Issue 5 - In contrib/babelfishpg_tsql/runtime/functions.c at line 5158: Minor memory leak in objectpropertyex_internal basetype path:

VarChar *vch = (*common_utility_plugin_ptr->tsql_varchar_input)(type_code, strlen(type_code), -1);
PG_RETURN_BYTEA_P((*common_utility_plugin_ptr->convertVarcharToSQLVariantByteA)(vch, PG_GET_COLLATION()));

vch is allocated by tsql_varchar_input and leaked because PG_RETURN_BYTEA_P returns immediately. In queries like SELECT OBJECTPROPERTYEX(object_id, 'BaseType') FROM sys.all_objects, each row leaks one VarChar allocation. This is bounded (memory context cleanup happens at statement end) and the payload is only ~2 bytes, so it's low severity. Consider adding pfree(vch) if convertVarcharToSQLVariantByteA does not retain the pointer.

Issue 6 - In contrib/babelfishpg_tsql/runtime/functions.c at line 4856: isusertable evaluated before the main else block:

if (pg_strcasecmp(property, "ismsshipped") == 0)
{ ... }
else if (pg_strcasecmp(property, "isusertable") == 0)
{ ... }
else
{
    if (object_name)
        pfree(object_name);
    /* OwnerId */
    if (pg_strcasecmp(property, "ownerid") == 0)
    ...

The isusertable property is handled in a branch that still has object_name available and calls is_ms_shipped(object_name, ...). However, the main else block starts by calling pfree(object_name) before evaluating remaining properties. This works correctly because the else block properties (ownerid, isdefaultcnst, etc.) don't use object_name. Just noting that the structure is correct but the inconsistent placement of isusertable outside the main else block is intentional since it needs object_name — this is fine.

Missing Tests

  • No OBJECTPROPERTY test for sequence objects (e.g., SELECT OBJECTPROPERTY(OBJECT_ID('seq'), 'IsTable') should return 0 — this exercises the 'S' fix from the other direction).
  • The "before-16_5-or-15_9" variant of tests expects <NULL> for PK and FK constraints' BaseType (comments say "older databases"). This is fine for the older-version test file, but consider verifying in the main test variant that these return PK/F on the current version (unless sys.key_constraints/sys.foreign_keys are intentionally empty on BABEL_4_X_DEV).
  • No minor version upgrade test is listed in the PR checklist. Given that objectproperty is being replaced via CREATE OR REPLACE FUNCTION in the 4.10.0--4.11.0 upgrade script, a minor version upgrade test verifying the function works post-upgrade would be valuable.

Notes

  • The PR is cherry-picked from #4691 (presumably targeting a different branch). The performance improvements are substantial (225x–2355x for single-call scenarios) and the refactoring into per-catalog helper functions is clean and well-structured.
  • The IsTrigger fix (previously checking OBJECT_TYPE_ASSEMBLY_DML_TRIGGER which the resolver never produces, now correctly checking OBJECT_TYPE_TSQL_DML_TRIGGER) is a real correctness bug fix — the old code always returned NULL/0 for triggers. The test change from <NULL> to 1 in objectproperty-vu-verify.out confirms this.
  • The trigger lookup via pg_trigger is a new catalog path that didn't exist before — triggers are now properly resolved even when passed directly as OIDs rather than through OBJECT_ID('name', 'TR').
  • The unique constraint support ('UQ' BaseType) is a new addition that aligns with T-SQL behavior.
  • Test coverage is comprehensive: ACL tests with per-permission grants/revokes, cross-database scoping, NULL inputs, trigger resolution, constraint types, IsSchemaBound on views, and permission edge cases are all well-covered.
  • Security review (incorporated from separate analysis): 0 critical/high findings. 1 medium finding (Issue 2 above — ACL asymmetry in object_schema_name). Remaining findings are low/informational defense-in-depth suggestions.

Note: This AI review is meant to augment, not replace, human review. Use the suggestions with discretion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants