Skip to content

Optimize vertex/edge field access with direct array indexing#2302

Merged
MuhammadTahaNaveed merged 1 commit into
apache:masterfrom
jrgemignani:optimize_vertex_edge_field_access
Jan 16, 2026
Merged

Optimize vertex/edge field access with direct array indexing#2302
MuhammadTahaNaveed merged 1 commit into
apache:masterfrom
jrgemignani:optimize_vertex_edge_field_access

Conversation

@jrgemignani

Copy link
Copy Markdown
Contributor

NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:

  • Vertex: id(0), label(1), properties(2)
  • Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:

  • Add field index constants and accessor macros to agtype.h
  • Update age_id(), age_start_id(), age_end_id(), age_label(), age_properties() to use direct field access
  • Add fill_agtype_value_no_copy() for read-only scalar extraction without memory allocation
  • Add compare_agtype_scalar_containers() fast path for scalar comparison
  • Update hash_agtype_value(), equals_agtype_scalar_value(), and compare_agtype_scalar_values() to use direct field access macros
  • Add fast path in get_one_agtype_from_variadic_args() bypassing extract_variadic_args() for single argument case
  • Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and Cypher functions (id, start_id, end_id, label, properties) on vertices and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified: Makefile
new file: regress/expected/direct_field_access.out
new file: regress/sql/direct_field_access.sql
modified: src/backend/utils/adt/agtype.c
modified: src/backend/utils/adt/agtype_util.c
modified: src/include/utils/agtype.h

@github-actions github-actions Bot added master override-stale To keep issues/PRs untouched from stale action labels Jan 10, 2026

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR optimizes vertex and edge field access by leveraging the deterministic key ordering from uniqueify_agtype_object() to use direct array indexing instead of binary search, providing O(1) instead of O(log n) access time.

Key Changes:

  • Introduces direct field access macros and constants for vertex (3 fields) and edge (5 fields) objects based on key length sorting
  • Adds fill_agtype_value_no_copy() for read-only scalar extraction without memory allocation
  • Implements fast path in compare_agtype_scalar_containers() for scalar comparisons
  • Optimizes get_one_agtype_from_variadic_args() to bypass extract_variadic_args() for single agtype arguments

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
src/include/utils/agtype.h Adds field index constants and direct access macros for vertex/edge objects
src/backend/utils/adt/agtype_util.c Implements no-copy value extraction and scalar comparison fast paths; fixes bounds checking in comparison functions
src/backend/utils/adt/agtype.c Updates accessor functions (id, start_id, end_id, label, properties) to use direct field access; adds fast path for variadic arg handling; fixes 4 assert bugs (= to ==)
regress/sql/direct_field_access.sql Comprehensive test suite covering scalar comparisons, ORDER BY, vertex/edge operations, and edge cases
regress/expected/direct_field_access.out Expected test output
Makefile Registers new regression test

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/utils/adt/agtype_util.c Outdated
Comment thread src/backend/utils/adt/agtype.c Outdated
Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/utils/adt/agtype.c
Comment thread src/backend/utils/adt/agtype_util.c
@jrgemignani jrgemignani force-pushed the optimize_vertex_edge_field_access branch from 7537dd0 to 6a3fe88 Compare January 13, 2026 23:54
NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to
access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:
- Vertex: id(0), label(1), properties(2)
- Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:
- Add field index constants and accessor macros to agtype.h
- Update age_id(), age_start_id(), age_end_id(), age_label(),
  age_properties() to use direct field access
- Add fill_agtype_value_no_copy() for read-only scalar extraction
  without memory allocation
- Add compare_agtype_scalar_containers() fast path for scalar comparison
- Update hash_agtype_value(), equals_agtype_scalar_value(), and
  compare_agtype_scalar_values() to use direct field access macros
- Add fast path in get_one_agtype_from_variadic_args() bypassing
  extract_variadic_args() for single argument case
- Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and
Cypher functions (id, start_id, end_id, label, properties) on vertices
and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified:   Makefile
new file:   regress/expected/direct_field_access.out
new file:   regress/sql/direct_field_access.sql
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
@jrgemignani jrgemignani force-pushed the optimize_vertex_edge_field_access branch from 6a3fe88 to d863820 Compare January 13, 2026 23:58
@MuhammadTahaNaveed MuhammadTahaNaveed merged commit b9d0982 into apache:master Jan 16, 2026
7 checks passed
jrgemignani added a commit to jrgemignani/age that referenced this pull request Jan 21, 2026
…2302)

NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to
access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:
- Vertex: id(0), label(1), properties(2)
- Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:
- Add field index constants and accessor macros to agtype.h
- Update age_id(), age_start_id(), age_end_id(), age_label(),
  age_properties() to use direct field access
- Add fill_agtype_value_no_copy() for read-only scalar extraction
  without memory allocation
- Add compare_agtype_scalar_containers() fast path for scalar comparison
- Update hash_agtype_value(), equals_agtype_scalar_value(), and
  compare_agtype_scalar_values() to use direct field access macros
- Add fast path in get_one_agtype_from_variadic_args() bypassing
  extract_variadic_args() for single argument case
- Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and
Cypher functions (id, start_id, end_id, label, properties) on vertices
and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified:   Makefile
new file:   regress/expected/direct_field_access.out
new file:   regress/sql/direct_field_access.sql
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
MuhammadTahaNaveed pushed a commit that referenced this pull request Jan 21, 2026
NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to
access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:
- Vertex: id(0), label(1), properties(2)
- Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:
- Add field index constants and accessor macros to agtype.h
- Update age_id(), age_start_id(), age_end_id(), age_label(),
  age_properties() to use direct field access
- Add fill_agtype_value_no_copy() for read-only scalar extraction
  without memory allocation
- Add compare_agtype_scalar_containers() fast path for scalar comparison
- Update hash_agtype_value(), equals_agtype_scalar_value(), and
  compare_agtype_scalar_values() to use direct field access macros
- Add fast path in get_one_agtype_from_variadic_args() bypassing
  extract_variadic_args() for single argument case
- Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and
Cypher functions (id, start_id, end_id, label, properties) on vertices
and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified:   Makefile
new file:   regress/expected/direct_field_access.out
new file:   regress/sql/direct_field_access.sql
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit to jrgemignani/age that referenced this pull request Jan 30, 2026
…2302)

NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to
access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:
- Vertex: id(0), label(1), properties(2)
- Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:
- Add field index constants and accessor macros to agtype.h
- Update age_id(), age_start_id(), age_end_id(), age_label(),
  age_properties() to use direct field access
- Add fill_agtype_value_no_copy() for read-only scalar extraction
  without memory allocation
- Add compare_agtype_scalar_containers() fast path for scalar comparison
- Update hash_agtype_value(), equals_agtype_scalar_value(), and
  compare_agtype_scalar_values() to use direct field access macros
- Add fast path in get_one_agtype_from_variadic_args() bypassing
  extract_variadic_args() for single argument case
- Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and
Cypher functions (id, start_id, end_id, label, properties) on vertices
and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified:   Makefile
new file:   regress/expected/direct_field_access.out
new file:   regress/sql/direct_field_access.sql
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
MuhammadTahaNaveed pushed a commit that referenced this pull request Feb 3, 2026
NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to
access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:
- Vertex: id(0), label(1), properties(2)
- Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:
- Add field index constants and accessor macros to agtype.h
- Update age_id(), age_start_id(), age_end_id(), age_label(),
  age_properties() to use direct field access
- Add fill_agtype_value_no_copy() for read-only scalar extraction
  without memory allocation
- Add compare_agtype_scalar_containers() fast path for scalar comparison
- Update hash_agtype_value(), equals_agtype_scalar_value(), and
  compare_agtype_scalar_values() to use direct field access macros
- Add fast path in get_one_agtype_from_variadic_args() bypassing
  extract_variadic_args() for single argument case
- Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and
Cypher functions (id, start_id, end_id, label, properties) on vertices
and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified:   Makefile
new file:   regress/expected/direct_field_access.out
new file:   regress/sql/direct_field_access.sql
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit to jrgemignani/age that referenced this pull request Mar 24, 2026
…2302)

NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to
access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:
- Vertex: id(0), label(1), properties(2)
- Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:
- Add field index constants and accessor macros to agtype.h
- Update age_id(), age_start_id(), age_end_id(), age_label(),
  age_properties() to use direct field access
- Add fill_agtype_value_no_copy() for read-only scalar extraction
  without memory allocation
- Add compare_agtype_scalar_containers() fast path for scalar comparison
- Update hash_agtype_value(), equals_agtype_scalar_value(), and
  compare_agtype_scalar_values() to use direct field access macros
- Add fast path in get_one_agtype_from_variadic_args() bypassing
  extract_variadic_args() for single argument case
- Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and
Cypher functions (id, start_id, end_id, label, properties) on vertices
and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified:   Makefile
new file:   regress/expected/direct_field_access.out
new file:   regress/sql/direct_field_access.sql
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
MuhammadTahaNaveed pushed a commit that referenced this pull request Mar 26, 2026
NOTE: This PR was created using AI tools and a human.

Leverage deterministic key ordering from uniqueify_agtype_object() to
access vertex/edge fields in O(1) instead of O(log n) binary search.

Fields are sorted by key length, giving fixed positions:
- Vertex: id(0), label(1), properties(2)
- Edge: id(0), label(1), end_id(2), start_id(3), properties(4)

Changes:
- Add field index constants and accessor macros to agtype.h
- Update age_id(), age_start_id(), age_end_id(), age_label(),
  age_properties() to use direct field access
- Add fill_agtype_value_no_copy() for read-only scalar extraction
  without memory allocation
- Add compare_agtype_scalar_containers() fast path for scalar comparison
- Update hash_agtype_value(), equals_agtype_scalar_value(), and
  compare_agtype_scalar_values() to use direct field access macros
- Add fast path in get_one_agtype_from_variadic_args() bypassing
  extract_variadic_args() for single argument case
- Add comprehensive regression test (30 tests)

Performance impact: Improves ORDER BY, hash joins, aggregations, and
Cypher functions (id, start_id, end_id, label, properties) on vertices
and edges.

All previous regression tests were not impacted.
Additional regression test added to enhance coverage.

modified:   Makefile
new file:   regress/expected/direct_field_access.out
new file:   regress/sql/direct_field_access.sql
modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit to jrgemignani/age that referenced this pull request Jun 1, 2026
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and
other sort/comparator-bound queries. Two complementary optimizations
(A1 arena, A2 fast-id) plus the sanitizer issues uncovered while
verifying the change.

A1 — per-tuple agtype arena (Pattern B applied):
  Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
  agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
  disposable per-call arenas for one-shot tree builds, and a long-lived
  shared arena (file-static, parented under CacheMemoryContext) for hot
  inner loops that would otherwise pay AllocSetCreate cost per call.

  Apply to compare_agtype_scalar_containers: when the comparator
  deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
  route those allocations into a shared arena and reset (O(1)) at the
  end of the comparison instead of recursively pfree'ing the tree (O(N)).
  Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
  this comparator; the arena collapses that walk to one MemoryContextReset.

A2 — binary-direct id extraction:
  When compare_agtype_scalar_containers compares two VERTEX or two EDGE
  values, only the graphid 'id' determines order (compare_agtype_scalar_values
  ignores all other fields). Master IC4 spent 73.48% inclusive in
  ag_deserialize_composite called from this comparator, building the entire
  agtype_value tree (label, properties, nested values) just to read one int64.

  Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
  directly and reads the int64 id at field index 0 (always 'id' due to
  length-sorted key ordering established by uniqueify_agtype_object — the
  same invariant PR apache#2302 leveraged for direct array indexing).

  Wired into compare_agtype_scalar_containers as a fast path before the
  existing arena+slow path. Falls through on cross-type comparisons
  (VERTEX vs EDGE), PATH, and malformed extended-type entries.

ASAN/UBSan fixes (10 issues found while verifying A1+A2 under
-fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing).
The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c)
was independently fixed upstream by PR apache#2423; this branch is rebased on
top of that fix and no longer carries a duplicate change.
  - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE
    included the 4-byte varlena header but copy started past it, reading
    VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged
    4 times in installcheck.
  - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving
    payload 4-byte- but not 8-byte-aligned). UB under strict-alignment
    rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites),
    agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c
    (write_graphid).
  - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at
    WAGT_END_ARRAY where the iterator does not populate *val. Replaced
    with explicit raw_scalar bool stack tracked across BEGIN/END pairs.
    Hash output unchanged.
  - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with
    if (len > 0).

Performance (SF3 LDBC, paired same-session vs master):
  IC sum:   201,930 -> 157,129 ms = -22.19%
  IC4:       50,570 ->   7,615 ms = -84.94%
  IC5:       21,779 ->  20,646 ms =  -5.20%
  IS, IU:   within noise (<= +/-3% on sub-second queries)

Tests:
  34/34 installcheck on production PG18.3
  34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
        zero heap-buffer-overflows; the original audit found 10 unique
        runtime issues plus 1 HBO with 4 instances, all addressed here
        or upstream)

Co-authored-by: Claude <noreply@anthropic.com>

modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_ext.c
modified:   src/backend/utils/adt/agtype_raw.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit to jrgemignani/age that referenced this pull request Jun 1, 2026
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and
other sort/comparator-bound queries. Two complementary optimizations
(A1 arena, A2 fast-id) plus the sanitizer issues uncovered while
verifying the change.

A1 — per-tuple agtype arena (Pattern B applied):
  Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
  agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
  disposable per-call arenas for one-shot tree builds, and a long-lived
  shared arena (file-static, parented under CacheMemoryContext) for hot
  inner loops that would otherwise pay AllocSetCreate cost per call.

  Apply to compare_agtype_scalar_containers: when the comparator
  deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
  route those allocations into a shared arena and reset (O(1)) at the
  end of the comparison instead of recursively pfree'ing the tree (O(N)).
  Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
  this comparator; the arena collapses that walk to one MemoryContextReset.

A2 — binary-direct id extraction:
  When compare_agtype_scalar_containers compares two VERTEX or two EDGE
  values, only the graphid 'id' determines order (compare_agtype_scalar_values
  ignores all other fields). Master IC4 spent 73.48% inclusive in
  ag_deserialize_composite called from this comparator, building the entire
  agtype_value tree (label, properties, nested values) just to read one int64.

  Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
  directly and reads the int64 id at field index 0 (always 'id' due to
  length-sorted key ordering established by uniqueify_agtype_object — the
  same invariant PR apache#2302 leveraged for direct array indexing).

  Wired into compare_agtype_scalar_containers as a fast path before the
  existing arena+slow path. Falls through on cross-type comparisons
  (VERTEX vs EDGE), PATH, and malformed extended-type entries.

ASAN/UBSan fixes (10 issues found while verifying A1+A2 under
-fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing).
The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c)
was independently fixed upstream by PR apache#2423; this branch is rebased on
top of that fix and no longer carries a duplicate change.
  - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE
    included the 4-byte varlena header but copy started past it, reading
    VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged
    4 times in installcheck.
  - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving
    payload 4-byte- but not 8-byte-aligned). UB under strict-alignment
    rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites),
    agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c
    (write_graphid).
  - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at
    WAGT_END_ARRAY where the iterator does not populate *val. Replaced
    with explicit raw_scalar bool stack tracked across BEGIN/END pairs.
    Hash output unchanged.
  - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with
    if (len > 0).

Performance (SF3 LDBC, paired same-session vs master):
  IC sum:   201,930 -> 157,129 ms = -22.19%
  IC4:       50,570 ->   7,615 ms = -84.94%
  IC5:       21,779 ->  20,646 ms =  -5.20%
  IS, IU:   within noise (<= +/-3% on sub-second queries)

Tests:
  34/34 installcheck on production PG18.3
  34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
        zero heap-buffer-overflows; the original audit found 10 unique
        runtime issues plus 1 HBO with 4 instances, all addressed here
        or upstream)

Co-authored-by: Claude <noreply@anthropic.com>

modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_ext.c
modified:   src/backend/utils/adt/agtype_raw.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit to jrgemignani/age that referenced this pull request Jun 1, 2026
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and
other sort/comparator-bound queries. Two complementary optimizations
(A1 arena, A2 fast-id) plus the sanitizer issues uncovered while
verifying the change.

A1 — per-tuple agtype arena (Pattern B applied):
  Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
  agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
  disposable per-call arenas for one-shot tree builds, and a long-lived
  shared arena (file-static, parented under CacheMemoryContext) for hot
  inner loops that would otherwise pay AllocSetCreate cost per call.

  Apply to compare_agtype_scalar_containers: when the comparator
  deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
  route those allocations into a shared arena and reset (O(1)) at the
  end of the comparison instead of recursively pfree'ing the tree (O(N)).
  Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
  this comparator; the arena collapses that walk to one MemoryContextReset.

A2 — binary-direct id extraction:
  When compare_agtype_scalar_containers compares two VERTEX or two EDGE
  values, only the graphid 'id' determines order (compare_agtype_scalar_values
  ignores all other fields). Master IC4 spent 73.48% inclusive in
  ag_deserialize_composite called from this comparator, building the entire
  agtype_value tree (label, properties, nested values) just to read one int64.

  Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
  directly and reads the int64 id at field index 0 (always 'id' due to
  length-sorted key ordering established by uniqueify_agtype_object — the
  same invariant PR apache#2302 leveraged for direct array indexing).

  Wired into compare_agtype_scalar_containers as a fast path before the
  existing arena+slow path. Falls through on cross-type comparisons
  (VERTEX vs EDGE), PATH, and malformed extended-type entries.

ASAN/UBSan fixes (10 issues found while verifying A1+A2 under
-fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing).
The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c)
was independently fixed upstream by PR apache#2423; this branch is rebased on
top of that fix and no longer carries a duplicate change.
  - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE
    included the 4-byte varlena header but copy started past it, reading
    VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged
    4 times in installcheck.
  - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving
    payload 4-byte- but not 8-byte-aligned). UB under strict-alignment
    rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites),
    agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c
    (write_graphid).
  - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at
    WAGT_END_ARRAY where the iterator does not populate *val. Replaced
    with explicit raw_scalar bool stack tracked across BEGIN/END pairs.
    Hash output unchanged.
  - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with
    if (len > 0).

Performance (SF3 LDBC, paired same-session vs master):
  IC sum:   201,930 -> 157,129 ms = -22.19%
  IC4:       50,570 ->   7,615 ms = -84.94%
  IC5:       21,779 ->  20,646 ms =  -5.20%
  IS, IU:   within noise (<= +/-3% on sub-second queries)

Tests:
  34/34 installcheck on production PG18.3
  34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
        zero heap-buffer-overflows; the original audit found 10 unique
        runtime issues plus 1 HBO with 4 instances, all addressed here
        or upstream)

Co-authored-by: Claude noreply@anthropic.com

modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_ext.c
modified:   src/backend/utils/adt/agtype_raw.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
jrgemignani added a commit that referenced this pull request Jun 1, 2026
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and
other sort/comparator-bound queries. Two complementary optimizations
(A1 arena, A2 fast-id) plus the sanitizer issues uncovered while
verifying the change.

A1 — per-tuple agtype arena (Pattern B applied):
  Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset /
  agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns:
  disposable per-call arenas for one-shot tree builds, and a long-lived
  shared arena (file-static, parented under CacheMemoryContext) for hot
  inner loops that would otherwise pay AllocSetCreate cost per call.

  Apply to compare_agtype_scalar_containers: when the comparator
  deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy,
  route those allocations into a shared arena and reset (O(1)) at the
  end of the comparison instead of recursively pfree'ing the tree (O(N)).
  Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from
  this comparator; the arena collapses that walk to one MemoryContextReset.

A2 — binary-direct id extraction:
  When compare_agtype_scalar_containers compares two VERTEX or two EDGE
  values, only the graphid 'id' determines order (compare_agtype_scalar_values
  ignores all other fields). Master IC4 spent 73.48% inclusive in
  ag_deserialize_composite called from this comparator, building the entire
  agtype_value tree (label, properties, nested values) just to read one int64.

  Add extract_composite_id_fast: walks the binary VERTEX/EDGE container
  directly and reads the int64 id at field index 0 (always 'id' due to
  length-sorted key ordering established by uniqueify_agtype_object — the
  same invariant PR #2302 leveraged for direct array indexing).

  Wired into compare_agtype_scalar_containers as a fast path before the
  existing arena+slow path. Falls through on cross-type comparisons
  (VERTEX vs EDGE), PATH, and malformed extended-type entries.

ASAN/UBSan fixes (10 issues found while verifying A1+A2 under
-fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing).
The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c)
was independently fixed upstream by PR #2423; this branch is rebased on
top of that fix and no longer carries a duplicate change.
  - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE
    included the 4-byte varlena header but copy started past it, reading
    VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged
    4 times in installcheck.
  - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving
    payload 4-byte- but not 8-byte-aligned). UB under strict-alignment
    rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites),
    agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c
    (write_graphid).
  - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at
    WAGT_END_ARRAY where the iterator does not populate *val. Replaced
    with explicit raw_scalar bool stack tracked across BEGIN/END pairs.
    Hash output unchanged.
  - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with
    if (len > 0).

Performance (SF3 LDBC, paired same-session vs master):
  IC sum:   201,930 -> 157,129 ms = -22.19%
  IC4:       50,570 ->   7,615 ms = -84.94%
  IC5:       21,779 ->  20,646 ms =  -5.20%
  IS, IU:   within noise (<= +/-3% on sub-second queries)

Tests:
  34/34 installcheck on production PG18.3
  34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors,
        zero heap-buffer-overflows; the original audit found 10 unique
        runtime issues plus 1 HBO with 4 instances, all addressed here
        or upstream)

Co-authored-by: Claude noreply@anthropic.com

modified:   src/backend/utils/adt/agtype.c
modified:   src/backend/utils/adt/agtype_ext.c
modified:   src/backend/utils/adt/agtype_raw.c
modified:   src/backend/utils/adt/agtype_util.c
modified:   src/include/utils/agtype.h
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

master override-stale To keep issues/PRs untouched from stale action

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants