Apache Cloudberry and cbcopy version
cbcopy: v1.1.5 (main, current HEAD 4fa9725)
What happened
--connection-mode pull is documented as "destination connects to source",
but the CopyOnMaster strategy ignores --connection-mode entirely — it
always uses src master → dest master for the data connection regardless of
the flag. CopyOnMaster is forced for two classes of tables:
- any table with rows <= --on-segment-threshold (default 1,000,000), and
- any DISTRIBUTED REPLICATED table (hardcoded, no flag to opt out).
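The two rules above can be sketched as a small selection function. This is a simplified model built only from the description in this report, not cbcopy's actual code: the `Table` type, the `chooseStrategy` name, and the reduction to two strategies are all assumptions made for illustration.

```go
package main

import "fmt"

// Strategy names follow this report; everything else here is hypothetical.
type Strategy string

const (
	CopyOnMaster  Strategy = "CopyOnMaster"
	CopyOnSegment Strategy = "CopyOnSegment"
)

// Table carries only the two properties that, per the report, drive the choice.
type Table struct {
	Name       string
	Rows       int64
	Replicated bool // DISTRIBUTED REPLICATED
}

// chooseStrategy models the described behavior: replicated tables always go
// through CopyOnMaster, and so do tables with rows <= the threshold.
func chooseStrategy(t Table, onSegmentThreshold int64) Strategy {
	if t.Replicated || t.Rows <= onSegmentThreshold {
		return CopyOnMaster
	}
	return CopyOnSegment
}

func main() {
	const threshold = 0 // models --on-segment-threshold 0
	tables := []Table{
		{Name: "big_fact", Rows: 5_000_000},
		{Name: "lookup_replicated", Rows: 5_000_000, Replicated: true},
	}
	for _, t := range tables {
		fmt.Printf("%s -> %s\n", t.Name, chooseStrategy(t, threshold))
	}
}
```

In this model, lowering the threshold only affects the row-count rule; the replicated rule still selects CopyOnMaster, which is why --on-segment-threshold 0 alone does not avoid the failure.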
As a consequence, in any topology where the destination cluster cannot be
actively dialed by the source cluster (e.g. destination GP cluster running
inside Kubernetes, source GP cluster outside, only dest → src is
reachable), cbcopy fails as soon as it encounters such a table — even when
the user sets --connection-mode pull and --on-segment-threshold 0,
because replicated tables still take the CopyOnMaster path.
On top of that, when both --connection-mode pull and CopyOnMaster are
in play, the helper-port temp table is created on src but queried from
dest, producing:
ERROR: relation "public.cbcopy_ports_temp_onmaster_<ts>" does not exist (SQLSTATE 42P01)
What you think should happen instead
--connection-mode pull should apply uniformly to every copy strategy,
including CopyOnMaster. Specifically, under pull:
- src master runs cbcopy_helper --listen ... --direction send
- dest master runs cbcopy_helper --host <src> --port <p> --direction receive
so that the only network requirement is dest → src, matching the documented
intent of pull mode.
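A minimal sketch of the expected role assignment, assuming the helper is described only by whether it listens, which host it dials, and its direction. The `HelperRole` type and `planMasterCopy` function are hypothetical names for illustration; the push branch reflects the hardcoded behavior described in this report, and the send/receive directions under push are an assumption.

```go
package main

import "fmt"

// HelperRole is a hypothetical description of one cbcopy_helper invocation.
type HelperRole struct {
	Side      string // "src master" or "dest master"
	Listens   bool   // true: started with --listen
	DialsHost string // non-empty: started with --host <DialsHost>
	Direction string // "send" or "receive"
}

// planMasterCopy returns the roles for a master-to-master copy.
// Under pull, src listens and dest dials src, so only dest -> src must be
// reachable. Under push, dest listens and src dials dest.
func planMasterCopy(mode, srcHost, destHost string) (src, dest HelperRole) {
	src = HelperRole{Side: "src master", Direction: "send"}
	dest = HelperRole{Side: "dest master", Direction: "receive"}
	if mode == "pull" {
		src.Listens = true
		dest.DialsHost = srcHost
	} else {
		dest.Listens = true
		src.DialsHost = destHost
	}
	return src, dest
}

// describe renders a role as a short human-readable string.
func describe(r HelperRole) string {
	if r.Listens {
		return fmt.Sprintf("%s listens (%s)", r.Side, r.Direction)
	}
	return fmt.Sprintf("%s dials %s (%s)", r.Side, r.DialsHost, r.Direction)
}

func main() {
	for _, mode := range []string{"push", "pull"} {
		src, dest := planMasterCopy(mode, "src.example", "dest.example")
		fmt.Printf("%s: %s; %s\n", mode, describe(src), describe(dest))
	}
}
```

The data always flows from src to dest; only the side that opens the TCP connection changes with the mode.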
The helper-port temp table inconsistency goes away on its own once the
direction is correct (both _onmaster_ and _onall_ external tables end up
on the same side, src, under pull).
How to reproduce
Minimal logical repro (a real two-cluster setup just makes the failure mode
obvious; same-host environments mask the bug because src and dest masters
share /tmp):
- Set up source and destination clusters where the source side cannot open
connections to the destination side; only dest → src works.
- Create a DISTRIBUTED REPLICATED table in the source schema, or any
small table (rows ≤ 1,000,000).
- Run:
cbcopy \
--source-host <src> --source-port 5432 --source-user gpadmin \
--dest-host <dest> --dest-port <port> --dest-user gpadmin \
--schema gpadmin.<schema> --dest-schema <destdb>.<schema> \
--connection-mode pull \
--on-segment-threshold 0 \
--data-port-range 50000-60000
- cbcopy attempts a src master → dest master connection for the
replicated/small table; in a topology where only dest → src is reachable,
this fails with either a TCP connection error or the
relation "public.cbcopy_ports_temp_onmaster_*" does not exist error
(SQLSTATE 42P01) shown above.
Root cause references (against current main):
- copy/copy_command.go:126-163: CopyOnMaster.CopyTo / CopyFrom hardcode
src master --host/--port and dest master --listen, with no
if cc.ConnectionMode == option.ConnectionModePull branch (compare with
CopyOnSegment.CopyTo/CopyFrom at lines 179-225, which do have the branch).
- copy/copy.go:271-280: the port temp table is built on srcManageConn when
connectionMode == pull.
- copy/copy_operation.go:135-147: the port query, however, uses
destManageConn when connectionMode == push || op.command.IsMasterCopy(),
so pull + CopyOnMaster queries from dest while the table lives on src.
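The mismatch between the last two references can be reproduced with a small model of the two conditions. This is a sketch, not the real code: the function names are hypothetical, the conditions are taken from the descriptions above, and the assumption that the table is created on dest in push mode is not stated in this report.

```go
package main

import "fmt"

// portTableCreatedOn models copy/copy.go as described: the port temp table is
// built on src when the mode is pull (dest otherwise is an assumption).
func portTableCreatedOn(mode string) string {
	if mode == "pull" {
		return "src"
	}
	return "dest"
}

// portTableQueriedOn models copy/copy_operation.go as described: dest is used
// when mode == push || isMasterCopy, src otherwise.
func portTableQueriedOn(mode string, isMasterCopy bool) string {
	if mode == "push" || isMasterCopy {
		return "dest"
	}
	return "src"
}

func main() {
	for _, mode := range []string{"push", "pull"} {
		for _, masterCopy := range []bool{false, true} {
			created := portTableCreatedOn(mode)
			queried := portTableQueriedOn(mode, masterCopy)
			fmt.Printf("mode=%s masterCopy=%t created=%s queried=%s mismatch=%t\n",
				mode, masterCopy, created, queried, created != queried)
		}
	}
}
```

In this model the only combination where the creating and querying sides differ is pull with a master copy, which matches the 42P01 error reported above.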
Note: the existing e2e test at end_to_end/basic_test.go:427
({"CopyOnMaster", "pull", ...}) does not catch this because in that test
src and dest are two databases on the same GP cluster — same master process,
same /tmp — so the bug is masked.
Operating System
Linux (issue is environment-independent; observed on RHEL 9.5 / CentOS 7
class hosts and Kubernetes Pod-based dest clusters).
Anything else
Workaround (only covers part of the problem): if the schema has no
DISTRIBUTED REPLICATED tables (or they can be skipped with --exclude-table),
the user can set --on-segment-threshold 0, keeping CopyOnMaster from being
hit. This is verified to work but does not cover schemas whose replicated
tables must be copied, since those are forced into CopyOnMaster
unconditionally.
I have a fix design ready (minimal: adds the pull branch to
CopyOnMaster.CopyTo/CopyFrom symmetric to CopyOnSegment, and drops the
|| op.command.IsMasterCopy() special case in copy_operation.go). Happy
to submit it as a PR linked to this issue.
Are you willing to submit PR?
Code of Conduct