Fix RDS S3 sub-pipeline folder depth calculation when partition_prefix is absent. #6859

Merged: divbok merged 1 commit into opensearch-project:main from divbok:rdsdepthfix on May 19, 2026

Conversation

@divbok (Collaborator) commented May 13, 2026

Description

Fix RDS S3 sub-pipeline folder depth calculation when partition_prefix is absent.

calculateDepthForRdsSource uses a hardcoded base depth of 3, assuming the S3 buffer path always contains <partition_prefix>/buffer/<hash>. When SOURCE_COORDINATION_PIPELINE_IDENTIFIER is not set, the path only has buffer/<hash> (2 segments), causing the depth filter in S3ScanPartitionCreationSupplier.getPrefixWithDepth() to reject all objects — resulting in silent data loss.

This fix checks the env var and uses base depth 2 when absent, consistent with the logic already used in getIncludePrefixForRdsSource.
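
The depth arithmetic described above can be sketched as follows. This is a minimal illustration, not the project's code: the class name `RdsDepthSketch` is hypothetical, the segment-counting rule inside `getDepth` is an assumption about how the prefix contributes to the depth, and the pipeline identifier is passed in as a parameter (instead of being read with `System.getenv`) so the two cases are easy to exercise.

```java
// Sketch only: the class name and the segment-counting rule are assumptions.
final class RdsDepthSketch {

    // Assumed behavior: count the non-empty "/"-separated segments of the
    // optional s3_prefix and add the segments contributed by the buffer path.
    static int getDepth(String s3Prefix, int baseDepth) {
        if (s3Prefix == null || s3Prefix.isEmpty()) {
            return baseDepth;
        }
        int segments = 0;
        for (String part : s3Prefix.split("/")) {
            if (!part.isEmpty()) {
                segments++;
            }
        }
        return segments + baseDepth;
    }

    // Mirrors the logic in this PR: base depth 3 for
    // <partition_prefix>/buffer/<hash>, base depth 2 for buffer/<hash>.
    // pipelineIdentifier stands in for the environment variable value
    // and is null when the variable is not set.
    static String calculateDepthForRdsSource(String s3Prefix, String pipelineIdentifier) {
        int baseDepth = pipelineIdentifier != null ? 3 : 2;
        return Integer.toString(getDepth(s3Prefix, baseDepth));
    }
}
```

Under these assumptions, an `s3_prefix` of `my-prefix` with no identifier yields a depth of 3 (`my-prefix/buffer/<hash>`), while the same prefix with an identifier yields 4.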

Issues Resolved

Resolves #6754

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@oeyh (Collaborator) left a comment

Thanks for the PR! I'm good with the fix. Can we add a test?

- return Integer.toString(getDepth(s3Prefix, 3));
+ String envSourceCoordinationIdentifier = System.getenv(SOURCE_COORDINATION_IDENTIFIER_ENVIRONMENT_VARIABLE);
+ int baseDepth = envSourceCoordinationIdentifier != null ? 3 : 2;
+ return Integer.toString(getDepth(s3Prefix, baseDepth));

Member

+1 on adding a test here; otherwise looks good.

@oeyh (Collaborator) left a comment

Thanks for adding tests! Some minor things:

Signed-off-by: Divyansh Bokadia <dbokadia@amazon.com>
@divbok divbok merged commit 9736e10 into opensearch-project:main May 19, 2026
72 checks passed

Development

Successfully merging this pull request may close these issues.

[BUG] rds s3 sub-pipeline buffer folder scan doesn't calculate depth correctly when s3_prefix is set but partition_prefix is not

3 participants