Skip to content

Commit d562292

Browse files
committed
perf(agtype): arena passthrough + binary-direct id extraction
Targets the agtype value-tree allocation cost that dominates LDBC IC4 and other sort/comparator-bound queries. Two complementary optimizations (A1 arena, A2 fast-id) plus the sanitizer issues uncovered while verifying the change. A1 — per-tuple agtype arena (Pattern B applied): Add agt_arena_begin / AGT_ARENA_BEGIN_SHARED / agt_arena_reset / agt_arena_end helpers (agtype_util.c, agtype.h). Two usage patterns: disposable per-call arenas for one-shot tree builds, and a long-lived shared arena (file-static, parented under CacheMemoryContext) for hot inner loops that would otherwise pay AllocSetCreate cost per call. Apply to compare_agtype_scalar_containers: when the comparator deserializes VERTEX/EDGE/PATH composites via fill_agtype_value_no_copy, route those allocations into a shared arena and reset (O(1)) at the end of the comparison instead of recursively pfree'ing the tree (O(N)). Master IC4 spent 17.64% of total CPU in pfree_agtype_value_content from this comparator; the arena collapses that walk to one MemoryContextReset. A2 — binary-direct id extraction: When compare_agtype_scalar_containers compares two VERTEX or two EDGE values, only the graphid 'id' determines order (compare_agtype_scalar_values ignores all other fields). Master IC4 spent 73.48% inclusive in ag_deserialize_composite called from this comparator, building the entire agtype_value tree (label, properties, nested values) just to read one int64. Add extract_composite_id_fast: walks the binary VERTEX/EDGE container directly and reads the int64 id at field index 0 (always 'id' due to length-sorted key ordering established by uniqueify_agtype_object — the same invariant PR #2302 leveraged for direct array indexing). Wired into compare_agtype_scalar_containers as a fast path before the existing arena+slow path. Falls through on cross-type comparisons (VERTEX vs EDGE), PATH, and malformed extended-type entries. ASAN/UBSan fixes (10 issues found while verifying A1+A2 under -fsanitize=address -fsanitize=undefined; 1 introduced by A2, 9 pre-existing). The 11th finding (uninitialized parent_cpstate fields in cypher_analyze.c) was independently fixed upstream by PR #2423; this branch is rebased on top of that fix and no longer carries a duplicate change. - HIGH: heap-buffer-overflow in agtype_raw.c:write_container — VARSIZE included the 4-byte varlena header but copy started past it, reading VARHDRSZ bytes past source allocation. Subtract VARHDRSZ. ASan flagged 4 times in installcheck. - 7 unaligned int64/float8 reads/writes (AGT_HEADER is 4 bytes, leaving payload 4-byte- but not 8-byte-aligned). UB under strict-alignment rules. Replaced typed loads/stores with memcpy in agtype_ext.c (4 sites), agtype_util.c (3 sites incl. extract_composite_id_fast), agtype_raw.c (write_graphid). - agtype.c:agtype_hash_cmp: reading r->val.array.raw_scalar at WAGT_END_ARRAY where the iterator does not populate *val. Replaced with explicit raw_scalar bool stack tracked across BEGIN/END pairs. Hash output unchanged. - agtype_util.c:copy_to_buffer: memcpy(NULL, NULL, 0) is UB. Guard with if (len > 0). Performance (SF3 LDBC, paired same-session vs master): IC sum: 201,930 -> 157,129 ms = -22.19% IC4: 50,570 -> 7,615 ms = -84.94% IC5: 21,779 -> 20,646 ms = -5.20% IS, IU: within noise (<= +/-3% on sub-second queries) Tests: 34/34 installcheck on production PG18.3 34/34 installcheck on ASAN+UBSan PG18.3 (zero unique runtime errors, zero heap-buffer-overflows; the original audit found 10 unique runtime issues plus 1 HBO with 4 instances, all addressed here or upstream) Co-authored-by: Claude <noreply@anthropic.com> modified: src/backend/utils/adt/agtype.c modified: src/backend/utils/adt/agtype_ext.c modified: src/backend/utils/adt/agtype_raw.c modified: src/backend/utils/adt/agtype_util.c modified: src/include/utils/agtype.h
1 parent a1b749a commit d562292

5 files changed

Lines changed: 396 additions & 25 deletions

File tree

src/backend/utils/adt/agtype.c

Lines changed: 48 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -4871,6 +4871,22 @@ Datum agtype_hash_cmp(PG_FUNCTION_ARGS)
48714871
agtype_iterator_token tok;
48724872
agtype_value *r;
48734873
uint64 seed = 0xF0F0F0F0;
4874+
/*
4875+
* Stack of "is the current open array a raw_scalar pseudo-array?" so
4876+
* that the WAGT_END_ARRAY case can match the choice we made at
4877+
* WAGT_BEGIN_ARRAY. The iterator does not populate *val on END tokens
4878+
* (see comment above agtype_iterator_next), so r->val.array.raw_scalar
4879+
* is uninitialized at that point and reading it as a bool is undefined
4880+
* behavior (UBSan flagged values like 127 here).
4881+
*
4882+
* Arrays in agtype are bounded by AGTYPE_CONTAINER_SIZE which fits in
4883+
* AGT_CMASK; a fixed-size stack of 64 levels is far more than any real
4884+
* agtype graph value will ever produce. Bail out with an error if we
4885+
* exceed it rather than silently corrupting the hash.
4886+
*/
4887+
#define HASH_RAW_SCALAR_STACK_DEPTH 64
4888+
bool raw_scalar_stack[HASH_RAW_SCALAR_STACK_DEPTH];
4889+
int raw_scalar_top = -1;
48744890

48754891
/* this function returns INTEGER which is 32 bits */
48764892
if (PG_ARGISNULL(0))
@@ -4887,18 +4903,46 @@ Datum agtype_hash_cmp(PG_FUNCTION_ARGS)
48874903
{
48884904
if (IS_A_AGTYPE_SCALAR(r) && AGTYPE_ITERATOR_TOKEN_IS_HASHABLE(tok))
48894905
agtype_hash_scalar_value_extended(r, &hash, seed);
4890-
else if (tok == WAGT_BEGIN_ARRAY && !r->val.array.raw_scalar)
4891-
seed = LEFT_ROTATE(seed, 4);
4906+
else if (tok == WAGT_BEGIN_ARRAY)
4907+
{
4908+
/* push raw_scalar state for the matching END token */
4909+
raw_scalar_top++;
4910+
if (raw_scalar_top >= HASH_RAW_SCALAR_STACK_DEPTH)
4911+
{
4912+
ereport(ERROR,
4913+
(errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
4914+
errmsg("agtype_hash_cmp: array nesting depth exceeds %d",
4915+
HASH_RAW_SCALAR_STACK_DEPTH)));
4916+
}
4917+
raw_scalar_stack[raw_scalar_top] = r->val.array.raw_scalar;
4918+
if (!r->val.array.raw_scalar)
4919+
{
4920+
seed = LEFT_ROTATE(seed, 4);
4921+
}
4922+
}
48924923
else if (tok == WAGT_BEGIN_OBJECT)
48934924
seed = LEFT_ROTATE(seed, 6);
4894-
else if (tok == WAGT_END_ARRAY && !r->val.array.raw_scalar)
4895-
seed = RIGHT_ROTATE(seed, 4);
4925+
else if (tok == WAGT_END_ARRAY)
4926+
{
4927+
bool was_raw_scalar;
4928+
4929+
/* pop matching BEGIN's raw_scalar state */
4930+
Assert(raw_scalar_top >= 0);
4931+
was_raw_scalar = raw_scalar_stack[raw_scalar_top];
4932+
raw_scalar_top--;
4933+
if (!was_raw_scalar)
4934+
{
4935+
seed = RIGHT_ROTATE(seed, 4);
4936+
}
4937+
}
48964938
else if (tok == WAGT_END_OBJECT)
48974939
seed = RIGHT_ROTATE(seed, 4);
48984940

48994941
seed = LEFT_ROTATE(seed, 1);
49004942
}
49014943

4944+
#undef HASH_RAW_SCALAR_STACK_DEPTH
4945+
49024946
pfree_if_not_null(r);
49034947
PG_FREE_IF_COPY(agt, 0);
49044948

src/backend/utils/adt/agtype_ext.c

Lines changed: 17 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -57,7 +57,14 @@ bool ag_serialize_extended_type(StringInfo buffer, agtentry *agtentry,
5757
/* copy in the int_value data */
5858
numlen = sizeof(int64);
5959
offset = reserve_from_buffer(buffer, numlen);
60-
*((int64 *)(buffer->data + offset)) = scalar_val->val.int_value;
60+
/*
61+
* Use memcpy because the AGT_HEADER (4 bytes) leaves the buffer
62+
* write position 4-byte-aligned but not necessarily 8-byte-aligned.
63+
* Direct *((int64 *) ...) = ... is undefined behavior under strict
64+
* alignment rules (UBSan flags it; works in practice on x86_64).
65+
*/
66+
memcpy(buffer->data + offset, &scalar_val->val.int_value,
67+
sizeof(int64));
6168

6269
*agtentry = AGTENTRY_IS_AGTYPE | (padlen + numlen + AGT_HEADER_SIZE);
6370
break;
@@ -68,7 +75,8 @@ bool ag_serialize_extended_type(StringInfo buffer, agtentry *agtentry,
6875
/* copy in the float_value data */
6976
numlen = sizeof(scalar_val->val.float_value);
7077
offset = reserve_from_buffer(buffer, numlen);
71-
*((float8 *)(buffer->data + offset)) = scalar_val->val.float_value;
78+
memcpy(buffer->data + offset, &scalar_val->val.float_value,
79+
sizeof(float8));
7280

7381
*agtentry = AGTENTRY_IS_AGTYPE | (padlen + numlen + AGT_HEADER_SIZE);
7482
break;
@@ -154,12 +162,17 @@ void ag_deserialize_extended_type(char *base_addr, uint32 offset,
154162
{
155163
case AGT_HEADER_INTEGER:
156164
result->type = AGTV_INTEGER;
157-
result->val.int_value = *((int64 *)(base + AGT_HEADER_SIZE));
165+
/* See comment in ag_serialize_extended_type. The 4-byte AGT_HEADER
166+
* leaves the int64 at a 4-byte-aligned but not 8-byte-aligned
167+
* address, so a direct typed load is undefined behavior. */
168+
memcpy(&result->val.int_value, base + AGT_HEADER_SIZE,
169+
sizeof(int64));
158170
break;
159171

160172
case AGT_HEADER_FLOAT:
161173
result->type = AGTV_FLOAT;
162-
result->val.float_value = *((float8 *)(base + AGT_HEADER_SIZE));
174+
memcpy(&result->val.float_value, base + AGT_HEADER_SIZE,
175+
sizeof(float8));
163176
break;
164177

165178
case AGT_HEADER_VERTEX:

src/backend/utils/adt/agtype_raw.c

Lines changed: 23 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -197,9 +197,20 @@ void write_graphid(agtype_build_state *bstate, graphid graphid)
197197
write_const(AGT_HEADER_INTEGER, AGT_HEADER_TYPE);
198198
length += AGT_HEADER_SIZE;
199199

200-
/* graphid value */
201-
write_const(graphid, int64);
202-
length += sizeof(int64);
200+
/*
201+
* graphid value (int64). The 4-byte AGT_HEADER above leaves the buffer
202+
* write position 4-byte-aligned but not 8-byte-aligned, so the typed
203+
* write done by write_const(graphid, int64) is undefined behavior under
204+
* strict alignment rules. Use memcpy. (UBSan flags the typed write as
205+
* "store to misaligned address ... requires 8 byte alignment".)
206+
*/
207+
{
208+
int numlen = sizeof(int64);
209+
int g_offset = BUFFER_RESERVE(numlen);
210+
211+
memcpy(bstate->buffer->data + g_offset, &graphid, sizeof(int64));
212+
length += numlen;
213+
}
203214

204215
/* agtentry */
205216
write_agt(AGTENTRY_IS_AGTYPE | length);
@@ -214,8 +225,15 @@ void write_container(agtype_build_state *bstate, agtype *agtype)
214225
/* padding */
215226
length += BUFFER_WRITE_PAD();
216227

217-
/* varlen data */
218-
length += write_ptr((char *) &agtype->root, VARSIZE(agtype));
228+
/*
229+
* Copy the inner agtype_container only, NOT the outer varlena header.
230+
* VARSIZE(agtype) reports the total varlena size (including the 4-byte
231+
* vl_len_ header), but we are starting our copy at &agtype->root, which
232+
* is already past that header. Subtracting VARHDRSZ avoids reading
233+
* VARHDRSZ bytes past the source allocation (caught by ASan as
234+
* heap-buffer-overflow in __interceptor_memcpy from write_pointer).
235+
*/
236+
length += write_ptr((char *) &agtype->root, VARSIZE(agtype) - VARHDRSZ);
219237

220238
/* agtentry */
221239
write_agt(AGTENTRY_IS_CONTAINER | length);

0 commit comments

Comments
 (0)