feat(Spanner): implement Standardize shard config in live migration.#3934
feat(Spanner): implement Standardize shard config in live migration.#3934pratickchokhani wants to merge 1 commit into
Conversation
Codecov Report❌ Patch coverage is ❌ Your patch check has failed because the patch coverage (12.19%) is below the target coverage (80.00%). You can increase the patch coverage or adjust the target coverage. Additional details and impacted files@@ Coverage Diff @@
## main #3934 +/- ##
=============================================
+ Coverage 38.10% 62.02% +23.92%
- Complexity 512 3102 +2590
=============================================
Files 195 543 +348
Lines 11745 32772 +21027
Branches 1173 3619 +2446
=============================================
+ Hits 4475 20327 +15852
- Misses 6908 11377 +4469
- Partials 362 1068 +706
🚀 New features to boost your workflow:
|
5a96db2 to
6ca19c2
Compare
6ca19c2 to
f855b09
Compare
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request standardizes the shard configuration mechanism for live migrations to Spanner. By transitioning from a legacy JSON-based sharding context file to a more flexible configuration format and renaming the associated pipeline parameter, the changes improve maintainability and align the configuration structure with broader project standards. The update includes necessary refactoring of the pipeline ingestion logic, Terraform infrastructure definitions, and comprehensive updates to existing integration tests. Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request replaces the "shardingContextFilePath" parameter with "sourceConfigURL" across the "datastream-to-spanner" template, migrating the sharding configuration from a JSON map to a HOCON/JSON list format ("shardConfigs"). The feedback recommends adding defensive null checks: first, to ensure "shardConfigs" is not null in "SourceConfigParser" before sorting to prevent a NullPointerException, and second, to verify that "datastreamSourceType" is specified when "sourceConfigURL" is provided to avoid startup failures in retry modes.
| JdbcShardConfig jdbcShardConfig = mapper.convertValue(jdbcConfigMap, JdbcShardConfig.class); | ||
| // Returns ordered list of shards | ||
| jdbcShardConfig.getShardConfigs().sort(Comparator.comparing(Shard::getLogicalShardId)); | ||
| resolveShardSecret(jdbcShardConfig, sourceConfigFilePath); | ||
| if (resolveSecrets) { | ||
| resolveShardSecret(jdbcShardConfig, sourceConfigFilePath); | ||
| } | ||
| return jdbcShardConfig; |
There was a problem hiding this comment.
If the configuration file is missing the shardConfigs field, jdbcShardConfig.getShardConfigs() will return null. Calling .sort(...) on it directly will throw a NullPointerException before the resolveShardSecret validation check is ever reached. We should check if getShardConfigs() is null, log the error, and throw a descriptive IllegalArgumentException first. Validating this upstream prevents the need for redundant null checks in downstream transforms.
JdbcShardConfig jdbcShardConfig = mapper.convertValue(jdbcConfigMap, JdbcShardConfig.class);
if (jdbcShardConfig.getShardConfigs() == null) {
LOG.error("The configuration file is missing the 'shardConfigs' field.");
throw new IllegalArgumentException(
"The configuration file is missing the 'shardConfigs' field.");
}
// Returns ordered list of shards
jdbcShardConfig.getShardConfigs().sort(Comparator.comparing(Shard::getLogicalShardId));
if (resolveSecrets) {
resolveShardSecret(jdbcShardConfig, sourceConfigFilePath);
}
return jdbcShardConfig;References
- When throwing exceptions in environments where the caller or global exception handler might not log them (such as certain Dataflow templates), log the error message before throwing the exception to ensure the failure is recorded.
- Ensure that mandatory parameters are validated for nullability at the entry point or in upstream components to prevent NullPointerExceptions (especially during auto-unboxing) in downstream transforms, avoiding the need for redundant null checks.
| if (options.getSourceConfigURL() != null && !options.getSourceConfigURL().isEmpty()) { | ||
| try { | ||
| SourceConfigParser parser = new SourceConfigParser(new SecretManagerAccessorImpl()); | ||
| SourceConnectionConfig sourceConfig = | ||
| parser.parseConfiguration( | ||
| options.getDatastreamSourceType(), | ||
| options.getSourceConfigURL(), | ||
| /* resolveSecrets= */ false); |
There was a problem hiding this comment.
In retry modes (such as retryDLQ), validateSourceType returns early without setting datastreamSourceType, which means options.getDatastreamSourceType() can be null. Passing a null source type to parser.parseConfiguration will cause a NullPointerException or IllegalArgumentException during startup. We should add a defensive check to ensure datastreamSourceType is provided when sourceConfigURL is used, and log the error message before throwing the exception to ensure it is recorded in the Dataflow environment.
if (options.getSourceConfigURL() != null && !options.getSourceConfigURL().isEmpty()) {
try {
String sourceType = options.getDatastreamSourceType();
if (sourceType == null || sourceType.isEmpty()) {
LOG.error("datastreamSourceType must be specified when sourceConfigURL is provided.");
throw new IllegalArgumentException(
"datastreamSourceType must be specified when sourceConfigURL is provided.");
}
SourceConfigParser parser = new SourceConfigParser(new SecretManagerAccessorImpl());
SourceConnectionConfig sourceConfig =
parser.parseConfiguration(
sourceType,
options.getSourceConfigURL(),
/* resolveSecrets= */ false);References
- When throwing exceptions in environments where the caller or global exception handler might not log them (such as certain Dataflow templates), log the error message before throwing the exception to ensure the failure is recorded.
No description provided.