Enhancement: Optimize Pax table insertion performance #1364

Merged: gongxun0928 merged 1 commit into apache:main from gongxun0928:performance/import-tuple-insert-performance-and-eliminate-some-call-overhead on Oct 15, 2025
@gongxun0928 (Contributor)

  1. Mark functions as explicitly inline to eliminate some call overhead.
  2. Check the physical size of the file once every 16 inserted tuples instead of on every tuple insert.
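
The second point can be illustrated with a minimal sketch. Only the interval of 16 comes from the description above; `FakeFile`, `BatchedWriter`, `ReachedLimit`, and `CountSizeQueries` are hypothetical names invented for illustration and are not the PAX classes touched by this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Hypothetical stand-in for a data file whose physical size is costly to query.
class FakeFile {
 public:
  void Append(std::size_t bytes) { size_ += bytes; }

  // Models the expensive call and counts how often it is issued.
  std::size_t PhysicalSize() {
    ++size_queries_;
    return size_;
  }

  std::size_t size_queries() const { return size_queries_; }

 private:
  std::size_t size_ = 0;
  std::size_t size_queries_ = 0;
};

// Small predicate on the insert path, explicitly marked inline.
inline bool ReachedLimit(std::size_t size, std::size_t limit) {
  return size >= limit;
}

class BatchedWriter {
 public:
  // Interval taken from the PR description; everything else is illustrative.
  static constexpr std::uint32_t kCheckInterval = 16;

  BatchedWriter(FakeFile *file, std::size_t size_limit)
      : file_(file), size_limit_(size_limit) {
    assert(file_ != nullptr);
  }

  // Appends one tuple. Returns true when the caller should switch to a new
  // file. The size is only queried on every kCheckInterval-th tuple.
  bool WriteTuple(std::size_t tuple_bytes) {
    file_->Append(tuple_bytes);
    if (++tuples_since_check_ < kCheckInterval) return false;
    tuples_since_check_ = 0;
    return ReachedLimit(file_->PhysicalSize(), size_limit_);
  }

 private:
  FakeFile *file_;
  std::size_t size_limit_;
  std::uint32_t tuples_since_check_ = 0;
};

// Inserts `tuples` fixed-size tuples and reports how many size queries ran.
inline std::size_t CountSizeQueries(std::size_t tuples) {
  FakeFile file;
  BatchedWriter writer(&file, std::numeric_limits<std::size_t>::max());
  for (std::size_t i = 0; i < tuples; ++i) writer.WriteTuple(8);
  return file.size_queries();
}
```

In this sketch, inserting 5,000,000 tuples issues 312,500 size queries instead of 5,000,000. The trade-off of this kind of batching is that a file can exceed its size limit by up to 15 tuples before the next check notices.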

Performance results

-- baseline (before this commit)
create table t1(a int, b int, c int, d int, e text, f text, g text, h text) using pax with(compresstype=zstd, compresslevel=5);

gpadmin=# insert into t1 select i, i+1,i+2,i+3, i::text, i::text, i::text, i::text from generate_series(1,5000000) i;
INSERT 0 5000000
Time: 6124.535 ms (00:06.125)
gpadmin=# insert into t1 select i, i+1,i+2,i+3, i::text, i::text, i::text, i::text from generate_series(1,5000000) i;
INSERT 0 5000000
Time: 5993.682 ms (00:05.994)

-- optimized with this commit
create table t1(a int, b int, c int, d int, e text, f text, g text, h text) using pax with(compresstype=zstd, compresslevel=5);
gpadmin=# insert into t1 select i, i+1,i+2,i+3, i::text, i::text, i::text, i::text from generate_series(1,5000000) i;
INSERT 0 5000000
Time: 5713.184 ms (00:05.713)
gpadmin=# insert into t1 select i, i+1,i+2,i+3, i::text, i::text, i::text, i::text from generate_series(1,5000000) i;
INSERT 0 5000000
Time: 5430.221 ms (00:05.430)
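
As a rough reading of the timings above, the following sketch averages the two runs of each configuration and computes the relative improvement. The helper names (`Mean`, `ImprovementPercent`, `PrImprovementPercent`) are made up for this example; only the four timings come from the PR.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Arithmetic mean of a non-empty list of timings in milliseconds.
inline double Mean(const std::vector<double> &samples) {
  assert(!samples.empty());
  return std::accumulate(samples.begin(), samples.end(), 0.0) /
         static_cast<double>(samples.size());
}

// Relative reduction of the mean run time, in percent.
inline double ImprovementPercent(const std::vector<double> &baseline_ms,
                                 const std::vector<double> &optimized_ms) {
  const double before = Mean(baseline_ms);
  const double after = Mean(optimized_ms);
  return (before - after) / before * 100.0;
}

// Timings copied from the benchmark output in this PR.
inline double PrImprovementPercent() {
  return ImprovementPercent({6124.535, 5993.682}, {5713.184, 5430.221});
}
```

The mean drops from about 6059 ms to about 5572 ms, a reduction of roughly 8%. With only two runs per configuration, this is an indication rather than a statistically robust measurement.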


@gongxun0928 force-pushed the performance/import-tuple-insert-performance-and-eliminate-some-call-overhead branch from 7f0085c to a342620 on September 21, 2025 13:54
@gongxun0928 force-pushed the performance/import-tuple-insert-performance-and-eliminate-some-call-overhead branch from a342620 to 72d88a1 on September 22, 2025 15:58
Comment thread contrib/pax_storage/src/cpp/storage/orc/orc_writer.cc
Comment thread contrib/pax_storage/src/cpp/storage/pax.cc Outdated
@gongxun0928 force-pushed the performance/import-tuple-insert-performance-and-eliminate-some-call-overhead branch 2 times, most recently from 688b7cf to 50ef57c on September 23, 2025 07:25
Comment thread contrib/pax_storage/src/cpp/storage/pax.cc Outdated
Comment thread contrib/pax_storage/src/cpp/storage/vec/pax_porc_adpater.cc Outdated
Comment thread contrib/pax_storage/src/cpp/access/pax_dml_state.cc Outdated
@gongxun0928 force-pushed the performance/import-tuple-insert-performance-and-eliminate-some-call-overhead branch from 50ef57c to 254eb7b on September 26, 2025 03:39
Comment thread contrib/pax_storage/src/cpp/comm/cbdb_wrappers.h
Comment thread contrib/pax_storage/src/cpp/storage/vec/pax_porc_adpater.cc Outdated
Comment thread contrib/pax_storage/src/cpp/comm/cbdb_wrappers.h Outdated
@gongxun0928 force-pushed the performance/import-tuple-insert-performance-and-eliminate-some-call-overhead branch 2 times, most recently from bc3ca05 to d479902 on October 9, 2025 07:07
@jiaqizho (Contributor) left a comment

LGTM with single comment

Comment thread contrib/pax_storage/src/cpp/storage/pax.cc Outdated
@gongxun0928 force-pushed the performance/import-tuple-insert-performance-and-eliminate-some-call-overhead branch 2 times, most recently from eee39f5 to 4cc222b on October 10, 2025 12:34
@my-ship-it force-pushed the performance/import-tuple-insert-performance-and-eliminate-some-call-overhead branch from 4cc222b to 9b7abe4 on October 14, 2025 09:30
@gongxun0928 merged commit 5ad83d3 into apache:main on Oct 15, 2025
28 checks passed

5 participants