Cherry-pick VACUUM stats & Vacuum progress related commits. #1

Open

reshke wants to merge 8 commits into main from hudhwuew

Conversation

reshke (Owner) commented Aug 30, 2024

fix #ISSUE_Number


Change logs

Describe your change clearly, including what problem is being solved or what feature is being added.

If it breaks backward or forward compatibility, please clarify.

Why are the changes needed?

Describe why the changes are necessary.

Does this PR introduce any user-facing change?

If yes, please clarify the previous behavior and the change this PR proposes.

How was this patch tested?

Please detail how the changes were tested, including manual tests and any relevant unit or integration tests.

Contributor's Checklist

Here are some reminders and checklists to review before/when submitting your pull request:

  • Make sure your Pull Request has a clear title and commit message. You can take the git-commit template as a reference.
  • Sign the Contributor License Agreement as prompted for your first-time contribution (one-time setup).
  • Learn the coding contribution guide, including our code conventions, workflow and more.
  • List your communication in the GitHub Issues or Discussions (if any, or as needed).
  • Document changes.
  • Add tests for the change.
  • Pass make installcheck
  • Pass make -C src/test installcheck-cbdb-parallel
  • Feel free to request the cloudberrydb/dev team for review and approval when your PR is ready🥳

Haolin Wang and others added 2 commits August 30, 2024 11:41
As GPDB vacuums auxiliary tables together with the base
relation in one distributed transaction, rather than
PostgreSQL where one transaction is opened and closed
for each auxiliary relation, vacuuming auxiliary TOAST
should not be dispatched.
* IndexVacuumInfo.analyze_only was not initialized, leading to possible
no-ops in index_vacuum_cleanup() if the garbage value for it was
non-zero.

* IndexVacuumInfo.num_heap_tuples should be set to the AO table's old
reltuples value (value prior to kicking off vacuum). Please see
IndexVacuumInfo and lazy_vacuum_index() for details. As explained in
the code comments, num_heap_tuples will always be an estimate during
ambulkdelete(). Since dedd4075c9f, we have been using the index rel's
reltuples instead. This is wrong (eg in scenarios where the index has
been created post data manipulation). So, correctly use the table's
reltuples value to provide the estimate.

github-actions bot left a comment

Hiiii, @reshke welcome!🎊 Thanks for taking the effort to make our project better! 🙌 Keep making such awesome contributions!

Since we already check the visibility while scanning tuples during
aocs compaction, the separate visibility check code in aocs compaction
is unnecessary and can be removed.

Co-authored-by: linxu.hlx <linxu.hlx@alibaba-inc.com>
reshke force-pushed the hudhwuew branch 2 times, most recently from b2e88f8 to 03ffe36 on August 30, 2024 14:20
When this option is passed, instead of vacuuming the main tables, look up
the auxiliary tables of all indicated AO tables and vacuum those.

If this option is given without a tablename, vacuum ONLY all AO auxiliary tables.

Also add regress test coverage of this behavior.
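
A minimal usage sketch. The commit message above does not spell out the
option name; AO_AUX_ONLY (the spelling used by GPDB 7) is assumed here,
and my_ao_table is a placeholder:

```
-- Vacuum only the auxiliary tables (segment descriptor, block directory,
-- visimap) of one AO table, instead of the AO table itself:
VACUUM (AO_AUX_ONLY) my_ao_table;

-- With no table name, vacuum ONLY the auxiliary tables of all AO tables:
VACUUM (AO_AUX_ONLY);
```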
huansong and others added 4 commits August 30, 2024 14:42
We used to have the pg_appendonly row for the AO/CO table stored
in its relcache entry, but that somehow got lost during the PG12
merge (it is still in 6X). Now we bring it back and utilize it
wherever possible.

This commit is basically a partial cherry-pick of ef92bcf151f99c9152a10c8750eee0d4dde39bc1.
Due to a lot of changes in the code base, and because we don't need the
other changes in that commit, we are not cherry-picking it in full.
Enable system view "pg_stat_progress_vacuum" to report VACUUM progress
for ao_row and ao_column tables.

Here are details of how each field in this view is interpreted/updated
for append-optimized tables:

-----
phase
-----

Since the mechanism of VACUUM for append-optimized tables is significantly
different from VACUUM for heap tables, this commit adds three additional
vacuum progress phases dedicated to append-optimized tables.

Possible VACUUM phases for append-optimized tables include the following:

- PROGRESS_VACUUM_PHASE_AO_PRE_CLEANUP (append-optimized pre-cleanup)
- PROGRESS_VACUUM_PHASE_AO_COMPACT (append-optimized compact)
- PROGRESS_VACUUM_PHASE_AO_POST_CLEANUP (append-optimized post-cleanup)
- PROGRESS_VACUUM_PHASE_VACUUM_INDEX (vacuuming indexes)

VACUUM of an append-optimized table progresses through the phases in the following order:

- append-optimized pre-cleanup
- append-optimized compact
- append-optimized post-cleanup

"vacuuming indexes" is a sub-phase that can happen as part of the
pre-cleanup phase and/or the post-cleanup phase. It happens if:

- the relation has indexes, AND
- there are awaiting-drop segment files that are invisible to all.

While a segment can only be marked as awaiting-drop during the VACUUM
compact phase, it can be recycled during either the post-cleanup phase or
the pre-cleanup phase of a later vacuum run.
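
A minimal monitoring sketch: run this from a second session while VACUUM
is in progress (these are the stock pg_stat_progress_vacuum columns; the
phase strings are the ones listed above):

```
SELECT p.pid, c.relname, p.phase
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;
-- For an AO table the phase is expected to move through
-- 'append-optimized pre-cleanup', 'append-optimized compact' and
-- 'append-optimized post-cleanup', with 'vacuuming indexes' possibly
-- showing up inside either cleanup phase.
```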

-----------
heap_blks_*
-----------
We divide byte size by the heap block size to get the heap_blks_* numbers
for AO tables. Since append-optimized tables have variable-length block
sizes, reporting heap-equivalent blocks instead of the number of varblocks
makes it easier for users to compare and calibrate.

---------------
heap_blks_total
---------------
- Collected at the beginning of the pre-cleanup phase by adding up the
  on-disk file sizes of all segment files for the append-optimized
relation and converting that size into a number of heap-equivalent blocks.
- The value should not change while VACUUM progresses.
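
Illustrative arithmetic only: roughly the same bytes-to-heap-blocks
conversion, done by hand (my_ao_table is a placeholder; the view's
per-segment-file accounting may differ in detail):

```
SELECT pg_relation_size('my_ao_table')
       / current_setting('block_size')::bigint
       AS approx_heap_equivalent_blocks;
```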

-----------------
heap_blks_scanned
-----------------
- Collected during the compact phase.
- ao_row: updated every time we finish scanning a segment file.
- ao_column: updated every time the heap-equivalent block count advances.
- SPECIAL NOTE for AO: heap_blks_scanned can be smaller than or equal to
  heap_blks_total at the end of VACUUM, because for AO tables we do not
need to scan blocks after the logical EOF of a segment file.

------------------
heap_blks_vacuumed
------------------
- Collected whenever we truncate a segment file, which could happen
  during the pre-cleanup phase, the post-cleanup phase, both, or neither.
- SPECIAL NOTE for AO: heap_blks_vacuumed could be either smaller or
  larger than heap_blks_scanned. For AO tables the bottom line is that at
the end of VACUUM the sum of heap_blks_vacuumed and heap_blks_scanned
must be smaller than or equal to heap_blks_total.

------------------
index_vacuum_count
------------------
- Collected whenever we recycle a dead segment file, which could happen
  during the pre-cleanup phase, the post-cleanup phase, both, or neither.

---------------
max_dead_tuples
---------------
- Collected at the beginning of the pre-cleanup phase. This is the total
  number of tuples before the logical EOF of all segment files.
- The value should not change while VACUUM progresses.

---------------
num_dead_tuples
---------------
- Collected during the compact phase.
- ao_row: updated every time we discard a dead tuple.
- ao_column: updated every time we move a tuple, if the number of dead
  tuples advances.
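
Putting the counters together, a monitoring sketch over the stock view
columns (NULLIF guards against division by zero while the view is idle):

```
SELECT c.relname,
       p.phase,
       round(100.0 * p.heap_blks_scanned / NULLIF(p.heap_blks_total, 0), 1)
         AS pct_scanned,
       round(100.0 * p.num_dead_tuples  / NULLIF(p.max_dead_tuples, 0), 1)
         AS pct_dead_tuples,
       p.heap_blks_vacuumed,
       p.index_vacuum_count
FROM pg_stat_progress_vacuum p
JOIN pg_class c ON c.oid = p.relid;
```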

Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
This commit fixes the following compiler warnings:

vacuum_ao.c:738:1: warning: ‘init_vacrelstats’ was used with no prototype before its definition [-Wmissing-prototypes]
  738 | init_vacrelstats()
      | ^~~~~~~~~~~~~~~~
CPartPruneStepsBuilder.cpp: In member function ‘PartitionedRelPruneInfo* gpdxl::CPartPruneStepsBuilder::CreatePartPruneInfoForOneLevel(gpdxl::CDXLNode*)’:
CPartPruneStepsBuilder.cpp:83:29: warning: comparison of integer expressions of different signedness: ‘gpos::ULONG’ {aka ‘unsigned int’} and ‘int’ [-Wsign-compare]
   83 |         for (ULONG i = 0; i < pinfo->nparts; ++i)
      |                           ~~^~~~~~~~~~~~~~~
The following fields of the pg_stat_all_tables view are now
available for append-optimized tables:

n_live_tup
n_dead_tup
last_vacuum
vacuum_count

Note that n_dead_tup only accounts for dead tuples in
non-awaiting-drop segment files.
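
A minimal sketch reading the newly populated fields ('my_ao_table' is a
placeholder name):

```
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, vacuum_count
FROM pg_stat_all_tables
WHERE relname = 'my_ao_table';
```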

Reviewed-by: Huansong Fu <fuhuansong@gmail.com>
reshke pushed a commit that referenced this pull request Jan 7, 2025
We used to rename index-backed constraints in the EXCHANGE PARTITION
command in 6X. Now we don't. We've decided to keep that behavior
in 7X after looking into the opposing arguments:

Argument #1. It might cause problems during upgrade.
- Firstly, we won't be using legacy syntax in the dump scripts so we
just need to worry about the existing tables produced by EXCHANGE
PARTITION. I.e. whether or not they can be restored correctly.
- For upgrading from 6X->7X, since those tables already have matched
constraint and index names with the table names, we should be OK.
- For upgrading 7X->onward, since we implement EXCHANGE PARTITION
simply as a combination of upstream-syntax commands (see
AtExecGPExchangePartition()), pg_upgrade should be able to handle
them. We've verified that manually and the automated test should
cover that too. In fact, this is another reason we shouldn't
do our own hacky things as part of EXCHANGE PARTITION, which might
confuse pg_upgrade.

Argument #2. It might surprise the users and their existing workloads.
- The indexed constraint names are all implicitly generated and
shouldn't be directly used in most cases.
- They are not the only things whose names will appear changed. E.g. the
normal indexes (B-tree ones, for example) will be too. So the decision to
change one should be made together with changing all of them.

For more details, see: https://docs.google.com/document/d/1enJdKYxkk5WFRV1WoqIgLgRxCGxOqI2nglJVE_Wglec
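
For reference, a hedged sketch of the legacy syntax under discussion
(table and partition names are illustrative, following the form in the
GPDB documentation):

```
-- Swap a staging table in for the January partition. After the exchange,
-- the index-backed constraint names on the swapped-in table keep their
-- original (implicitly generated) names; they are not renamed.
ALTER TABLE sales EXCHANGE PARTITION FOR (DATE '2024-01-01')
      WITH TABLE sales_staging;
```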
reshke pushed a commit that referenced this pull request Jan 21, 2025
In commit a0684da we made a change to use the fixed TOAST_TUPLE_TARGET
instead of the custom toast_tuple_target that is calculated based on the
blocksize of the AO table, for toasting AO table tuples.

This caused an issue where some tuples well beyond the toast_tuple_target
were not being toasted (because they were smaller than TOAST_TUPLE_TARGET).
This is OK when the tuples can still fit into the AO table's blocksize,
but if not, an error occurs. E.g.:

  postgres=# create table t (a int, b int[]) WITH (appendonly=true, blocksize=8192);
  CREATE TABLE
  postgres=# insert into t select 1, array_agg(x) from generate_series(1, 2030) x;
  ERROR:  item too long (check #1): length 8160, maxBufferLen 8168
  CONTEXT:  Append-Only table 't'

Therefore, we should revert those changes, including:

1. Still use the toast_tuple_target for toasting 'x' and 'e' columns for AO tables.
There's also a GPDB_12_MERGE_FIXME in the original comment suggesting to use
RelationGetToastTupleTarget for AO tables (in order to unify the logic that calculates
maxDataLen for heap and AO tuples, I suppose). But that's not possible currently,
because the calculated toast_tuple_target is stored in AppendOnlyInsertDesc and
we cannot get it from the Relation struct, and it seems we won't have an easy
way to do that. Therefore, keep that FIXME comment removed.

2. Still use the toast_tuple_target for toasting 'm' columns for AO tables. Also restore
the FIXME comment suggesting that we should use a bigger target size for 'm' columns.
This is something worth investigating further.

Commit a0684da also includes a change to use the custom toast_tuple_target for heap
tables, which should be a valid change. I think it fixed an oversight when merging
the upstream commit c251336.
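
With the blocksize-derived toast_tuple_target restored, the repro above is
expected to pass, since the wide value is toasted before it can overflow
the 8 KB AO block:

```
CREATE TABLE t (a int, b int[]) WITH (appendonly=true, blocksize=8192);
INSERT INTO t SELECT 1, array_agg(x) FROM generate_series(1, 2030) x;
-- INSERT 0 1  (no "item too long" error)
```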
reshke pushed a commit that referenced this pull request Jan 24, 2025
## Problem
An error occurs in the Python library when a plpython function is executed.
After our analysis: in the user's cluster, a plpython UDF
was running over an unstable network and got a timeout error:
`failed to acquire resources on one or more segments`.
Then another plpython UDF was run in the same session, and that UDF
failed with a GC error.

Here is the core dump:
```
2023-11-24 10:15:18.945507 CST,,,p2705198,th2081832064,,,,0,,,seg-1,,,,,"LOG","00000","3rd party error log:
    #0 0x7f7c68b6d55b in frame_dealloc /home/cc/repo/cpython/Objects/frameobject.c:509:5
    #1 0x7f7c68b5109d in gen_send_ex /home/cc/repo/cpython/Objects/genobject.c:108:9
    #2 0x7f7c68af9ddd in PyIter_Next /home/cc/repo/cpython/Objects/abstract.c:3118:14
    #3 0x7f7c78caa5c0 in PLy_exec_function /home/cc/repo/gpdb6/src/pl/plpython/plpy_exec.c:134:11
    #4 0x7f7c78cb5ffb in plpython_call_handler /home/cc/repo/gpdb6/src/pl/plpython/plpy_main.c:387:13
    #5 0x562f5e008bb5 in ExecMakeTableFunctionResult /home/cc/repo/gpdb6/src/backend/executor/execQual.c:2395:13
    #6 0x562f5e0dddec in FunctionNext_guts /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:142:5
    #7 0x562f5e0da094 in FunctionNext /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:350:11
    #8 0x562f5e03d4b0 in ExecScanFetch /home/cc/repo/gpdb6/src/backend/executor/execScan.c:84:9
    #9 0x562f5e03cd8f in ExecScan /home/cc/repo/gpdb6/src/backend/executor/execScan.c:154:10
    #10 0x562f5e0da072 in ExecFunctionScan /home/cc/repo/gpdb6/src/backend/executor/nodeFunctionscan.c:380:9
    #11 0x562f5e001a1c in ExecProcNode /home/cc/repo/gpdb6/src/backend/executor/execProcnode.c:1071:13
    #12 0x562f5dfe6377 in ExecutePlan /home/cc/repo/gpdb6/src/backend/executor/execMain.c:3202:10
    #13 0x562f5dfe5bf4 in standard_ExecutorRun /home/cc/repo/gpdb6/src/backend/executor/execMain.c:1171:5
    #14 0x562f5dfe4877 in ExecutorRun /home/cc/repo/gpdb6/src/backend/executor/execMain.c:992:4
    #15 0x562f5e857e69 in PortalRunSelect /home/cc/repo/gpdb6/src/backend/tcop/pquery.c:1164:4
    #16 0x562f5e856d3f in PortalRun /home/cc/repo/gpdb6/src/backend/tcop/pquery.c:1005:18
    #17 0x562f5e84607a in exec_simple_query /home/cc/repo/gpdb6/src/backend/tcop/postgres.c:1848:10
```

## Reproduce
We can use a simple procedure to reproduce the above problem:
- set timeout GUC: `gpconfig -c gp_segment_connect_timeout -v 5` and `gpstop -ari`
- prepare function:
```
CREATE EXTENSION plpythonu;
CREATE OR REPLACE FUNCTION test_func() RETURNS SETOF int AS
$$
plpy.execute("select pg_backend_pid()")

for i in range(0, 5):
    yield (i)

$$ LANGUAGE plpythonu;
```
- exit from the current psql session.
- stop the postmaster of segment: `gdb -p "the pid of segment postmaster"`
- enter a psql session.
- call `SELECT test_func();` and get an error
```
gpadmin=# select test_func();
ERROR:  function "test_func" error fetching next item from iterator (plpy_elog.c:121)
DETAIL:  Exception: failed to acquire resources on one or more segments
CONTEXT:  Traceback (most recent call last):
PL/Python function "test_func"
```
- quit gdb and let the postmaster run again.
- call `SELECT test_func();` again and get a panic
```
gpadmin=# SELECT test_func();
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!> 
```

## Analysis
- There is an SPI call in test_func(): `plpy.execute()`. 
- Then the coordinator will start a subtransaction via PLy_spi_subtransaction_begin().
- Meanwhile, if the segment cannot receive the instruction from the coordinator,
  the procedure that begins the subtransaction returns a failure.
- BUT! The Python processor does not know that an error happened and
  does not clean up its environment.
- Then the next plpython UDF in the same session will fail due to the broken
  Python environment.

## Solution
- Use try-catch to catch the exception raised by PLy_spi_subtransaction_begin().
- Set the Python error indicator via PLy_spi_exception_set().


Co-authored-by: Chen Mulong <chenmulong@gmail.com>
reshke pushed a commit that referenced this pull request Jan 27, 2025
reshke pushed a commit that referenced this pull request Jan 27, 2025
reshke pushed a commit that referenced this pull request Jan 28, 2025
reshke pushed a commit that referenced this pull request Jan 29, 2025
reshke pushed a commit that referenced this pull request Mar 2, 2026
1. Update hook function points, and add an smgrdounlinkall hook point.
2. Refactor code for the pg master branch.
reshke pushed a commit that referenced this pull request Mar 11, 2026
reshke pushed a commit that referenced this pull request Mar 12, 2026
reshke pushed a commit that referenced this pull request Mar 12, 2026