Skip to content

feat(Spanner): implement Standardize shard config in live migration.#3934

Open
pratickchokhani wants to merge 1 commit into
GoogleCloudPlatform:mainfrom
pratickchokhani:shard-config-live
Open

feat(Spanner): implement Standardize shard config in live migration.#3934
pratickchokhani wants to merge 1 commit into
GoogleCloudPlatform:mainfrom
pratickchokhani:shard-config-live

Conversation

@pratickchokhani

Copy link
Copy Markdown
Contributor

No description provided.

@codecov

codecov Bot commented Jun 22, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 12.19512% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 62.02%. Comparing base (3d09e7b) to head (f855b09).

Files with missing lines Patch % Lines
...oud/teleport/v2/templates/DataStreamToSpanner.java 5.00% 18 Missing and 1 partial ⚠️
...ud/teleport/v2/spanner/migrations/shard/Shard.java 11.11% 15 Missing and 1 partial ⚠️
...r/migrations/source/config/SourceConfigParser.java 66.66% 0 Missing and 1 partial ⚠️

❌ Your patch check has failed because the patch coverage (12.19%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@              Coverage Diff              @@
##               main    #3934       +/-   ##
=============================================
+ Coverage     38.10%   62.02%   +23.92%     
- Complexity      512     3102     +2590     
=============================================
  Files           195      543      +348     
  Lines         11745    32772    +21027     
  Branches       1173     3619     +2446     
=============================================
+ Hits           4475    20327    +15852     
- Misses         6908    11377     +4469     
- Partials        362     1068      +706     
Components Coverage Δ
spanner-templates 88.02% <12.19%> (∅)
spanner-import-export ∅ <ø> (∅)
spanner-live-forward-migration 89.63% <12.19%> (∅)
spanner-live-reverse-replication 83.11% <19.04%> (∅)
spanner-bulk-migration 92.40% <19.04%> (∅)
gcs-spanner-dv 90.05% <19.04%> (∅)
Files with missing lines Coverage Δ
...r/migrations/source/config/SourceConfigParser.java 85.36% <66.66%> (ø)
...ud/teleport/v2/spanner/migrations/shard/Shard.java 70.58% <11.11%> (ø)
...oud/teleport/v2/templates/DataStreamToSpanner.java 88.21% <5.00%> (ø)

... and 400 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@pratickchokhani pratickchokhani added the addition New feature or request label Jun 24, 2026
@pratickchokhani pratickchokhani force-pushed the shard-config-live branch 3 times, most recently from 5a96db2 to 6ca19c2 Compare June 26, 2026 05:14
@pratickchokhani pratickchokhani marked this pull request as ready for review June 26, 2026 05:28
@pratickchokhani pratickchokhani requested a review from a team as a code owner June 26, 2026 05:28
@gemini-code-assist

Copy link
Copy Markdown
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request standardizes the shard configuration mechanism for live migrations to Spanner. By transitioning from a legacy JSON-based sharding context file to a more flexible configuration format and renaming the associated pipeline parameter, the changes improve maintainability and align the configuration structure with broader project standards. The update includes necessary refactoring of the pipeline ingestion logic, Terraform infrastructure definitions, and comprehensive updates to existing integration tests.

Highlights

  • Configuration Parameter Rename: Renamed the configuration parameter 'shardingContextFilePath' to 'sourceConfigURL' across the codebase, documentation, and Terraform templates to better reflect its purpose.
  • Shard Configuration Schema Update: Updated the shard configuration format from a nested JSON map to a flat list of shard objects, improving readability and extensibility.
  • Pipeline Logic Refactoring: Updated the DataStreamToSpanner pipeline to parse the new 'sourceConfigURL' format directly, removing the dependency on the deprecated 'ShardingContextReader'.
  • Testing Improvements: Updated all integration tests to use the new configuration format and file naming conventions, ensuring consistency with the production changes.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request replaces the "shardingContextFilePath" parameter with "sourceConfigURL" across the "datastream-to-spanner" template, migrating the sharding configuration from a JSON map to a HOCON/JSON list format ("shardConfigs"). The feedback recommends adding defensive null checks: first, to ensure "shardConfigs" is not null in "SourceConfigParser" before sorting to prevent a NullPointerException, and second, to verify that "datastreamSourceType" is specified when "sourceConfigURL" is provided to avoid startup failures in retry modes.

Comment on lines 99 to 105
JdbcShardConfig jdbcShardConfig = mapper.convertValue(jdbcConfigMap, JdbcShardConfig.class);
// Returns ordered list of shards
jdbcShardConfig.getShardConfigs().sort(Comparator.comparing(Shard::getLogicalShardId));
resolveShardSecret(jdbcShardConfig, sourceConfigFilePath);
if (resolveSecrets) {
resolveShardSecret(jdbcShardConfig, sourceConfigFilePath);
}
return jdbcShardConfig;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If the configuration file is missing the shardConfigs field, jdbcShardConfig.getShardConfigs() will return null. Calling .sort(...) on it directly will throw a NullPointerException before the resolveShardSecret validation check is ever reached. We should check if getShardConfigs() is null, log the error, and throw a descriptive IllegalArgumentException first. Validating this upstream prevents the need for redundant null checks in downstream transforms.

        JdbcShardConfig jdbcShardConfig = mapper.convertValue(jdbcConfigMap, JdbcShardConfig.class);
        if (jdbcShardConfig.getShardConfigs() == null) {
          LOG.error("The configuration file is missing the 'shardConfigs' field.");
          throw new IllegalArgumentException(
              "The configuration file is missing the 'shardConfigs' field.");
        }
        // Returns ordered list of shards
        jdbcShardConfig.getShardConfigs().sort(Comparator.comparing(Shard::getLogicalShardId));
        if (resolveSecrets) {
          resolveShardSecret(jdbcShardConfig, sourceConfigFilePath);
        }
        return jdbcShardConfig;
References
  1. When throwing exceptions in environments where the caller or global exception handler might not log them (such as certain Dataflow templates), log the error message before throwing the exception to ensure the failure is recorded.
  2. Ensure that mandatory parameters are validated for nullability at the entry point or in upstream components to prevent NullPointerExceptions (especially during auto-unboxing) in downstream transforms, avoiding the need for redundant null checks.

Comment on lines +823 to +830
if (options.getSourceConfigURL() != null && !options.getSourceConfigURL().isEmpty()) {
try {
SourceConfigParser parser = new SourceConfigParser(new SecretManagerAccessorImpl());
SourceConnectionConfig sourceConfig =
parser.parseConfiguration(
options.getDatastreamSourceType(),
options.getSourceConfigURL(),
/* resolveSecrets= */ false);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

In retry modes (such as retryDLQ), validateSourceType returns early without setting datastreamSourceType, which means options.getDatastreamSourceType() can be null. Passing a null source type to parser.parseConfiguration will cause a NullPointerException or IllegalArgumentException during startup. We should add a defensive check to ensure datastreamSourceType is provided when sourceConfigURL is used, and log the error message before throwing the exception to ensure it is recorded in the Dataflow environment.

    if (options.getSourceConfigURL() != null && !options.getSourceConfigURL().isEmpty()) {
      try {
        String sourceType = options.getDatastreamSourceType();
        if (sourceType == null || sourceType.isEmpty()) {
          LOG.error("datastreamSourceType must be specified when sourceConfigURL is provided.");
          throw new IllegalArgumentException(
              "datastreamSourceType must be specified when sourceConfigURL is provided.");
        }
        SourceConfigParser parser = new SourceConfigParser(new SecretManagerAccessorImpl());
        SourceConnectionConfig sourceConfig =
            parser.parseConfiguration(
                sourceType,
                options.getSourceConfigURL(),
                /* resolveSecrets= */ false);
References
  1. When throwing exceptions in environments where the caller or global exception handler might not log them (such as certain Dataflow templates), log the error message before throwing the exception to ensure the failure is recorded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

addition New feature or request size/L

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant