Implement multi-row INSERT batching for PreparedStatement by josecsotomorales · Pull Request #944 · databricks/databricks-jdbc

josecsotomorales · 2025-08-16T16:17:17Z

Linked issue: #867

This PR implements multi-row INSERT batching optimization for prepared statements to improve performance when executing large batches of INSERT operations. The implementation combines multiple single-row INSERT statements into fewer multi-row INSERT statements while respecting Databricks' 256 parameter limit.

Adds a new InsertStatementParser utility for parsing INSERT statements and generating multi-row equivalents
Optimizes executeBatch() and executeLargeBatch() to use multi-row INSERT when possible
Includes parameter limit-aware chunking to handle large batches that exceed the 256 parameter maximum

Impact illustration (10k rows, 5 columns, 50 ms RTT):
• Before (single-row inserts): 10,000 statements → ~500 s of RTT plus server planning.
• After (batched): 197 statements (10,000 ÷ 51 rounded up, where 51 = 256 ÷ 5 rounded down is the row count per statement) → ~9.9 s of RTT.
• That is about a 50× reduction in round-trip latency, before counting server CPU savings.
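
The arithmetic behind this illustration can be checked with a small sketch. The class and method names are hypothetical, and the 50 ms round trip is the assumption stated above rather than a measurement.

```java
final class BatchLatencyEstimate {
  /** Statements needed when each statement carries at most rowsPerStatement rows. */
  static long statementsNeeded(long totalRows, long rowsPerStatement) {
    if (totalRows < 0 || rowsPerStatement < 1) {
      throw new IllegalArgumentException("totalRows must be >= 0 and rowsPerStatement >= 1");
    }
    // Ceiling division: a partially filled last statement still costs one round trip.
    return (totalRows + rowsPerStatement - 1) / rowsPerStatement;
  }

  public static void main(String[] args) {
    long rows = 10_000;
    int columns = 5;
    int maxParameters = 256;   // limit quoted in the PR description
    double rttSeconds = 0.050; // assumed 50 ms round trip

    long rowsPerStatement = maxParameters / columns; // integer division: 51 rows
    long batched = statementsNeeded(rows, rowsPerStatement);

    System.out.println("single-row statements: " + rows + ", RTT seconds: " + rows * rttSeconds);
    System.out.println("batched statements: " + batched + ", RTT seconds: " + batched * rttSeconds);
  }
}
```

This only models network round trips; server-side planning and execution time are left out, as in the illustration.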

Signed-off-by: josecsotomorales <josecsmorales@gmail.com>, Jayant Singh <jayant.singh@databricks.com>

* Add INSERT statement detection with new INSERT_PATTERN regex
* Create InsertStatementParser utility for parsing INSERT statements
* Enhance DatabricksPreparedStatement.executeLargeBatch() to:
  - Detect compatible INSERT operations in batch
  - Combine multiple single-row INSERTs into multi-row INSERT
  - Generate optimized SQL like: INSERT INTO table VALUES (?), (?), (?)
  - Fall back to individual execution for non-INSERT statements
* Add comprehensive unit tests for all new functionality
* Maintain backward compatibility and proper JDBC error semantics

This addresses performance issues with Spark JDBC writes by reducing the number of database round-trips from N individual INSERTs to 1 multi-row INSERT statement.
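
A minimal sketch of the SQL generation step this commit describes, assuming the table name and column list have already been extracted from the single-row statement. The class and method names are hypothetical and are not the driver's InsertStatementParser API.

```java
import java.util.Collections;
import java.util.List;

final class MultiRowInsertSketch {
  /** Builds e.g. "INSERT INTO t (a, b) VALUES (?, ?), (?, ?)" for rowCount rows. */
  static String buildMultiRowInsert(String table, List<String> columns, int rowCount) {
    if (columns.isEmpty() || rowCount < 1) {
      throw new IllegalArgumentException("need at least one column and one row");
    }
    // One "(?, ?, ...)" group per row, with one placeholder per column.
    String rowPlaceholders =
        "(" + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
    String allRows = String.join(", ", Collections.nCopies(rowCount, rowPlaceholders));
    return "INSERT INTO " + table + " (" + String.join(", ", columns) + ") VALUES " + allRows;
  }

  public static void main(String[] args) {
    System.out.println(buildMultiRowInsert("users", List.of("id", "name"), 3));
  }
}
```

With this layout, parameters would be bound in row-major order: all values of the first batched row, then all values of the second, and so on.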

…ERT batching

Resolves issue where large batches exceeded Databricks' 256 parameter limit by implementing intelligent parameter chunking:
- Add MAX_QUERY_PARAMETERS constant (256) to DatabricksJdbcConstants
- Implement smart chunking logic: maxRowsPerChunk = 256 / columnCount
- Automatically split large batches into optimally-sized chunks
- Maintain multi-row INSERT performance benefits within parameter limits
- Add comprehensive tests covering chunking scenarios and edge cases
- Ensure minimum 1 row per chunk for very wide tables (>256 columns)

Example: 60 rows × 5 columns = 300 parameters (exceeds limit)
→ Automatically chunked into: 51 rows + 9 rows (255 + 45 parameters)
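
The chunking rule in this commit message can be sketched as follows. The constant name and value come from the message; the class and method names are hypothetical, and this is an illustration rather than the driver's actual code.

```java
import java.util.ArrayList;
import java.util.List;

final class ParameterChunkingSketch {
  static final int MAX_QUERY_PARAMETERS = 256; // name and value taken from the commit message

  /** Returns the number of rows placed in each chunk, in execution order. */
  static List<Integer> chunkSizes(int rowCount, int columnCount) {
    if (rowCount < 0 || columnCount < 1) {
      throw new IllegalArgumentException("rowCount must be >= 0 and columnCount >= 1");
    }
    // Integer division; tables wider than the limit still get one row per chunk.
    int maxRowsPerChunk = Math.max(1, MAX_QUERY_PARAMETERS / columnCount);
    List<Integer> sizes = new ArrayList<>();
    for (int remaining = rowCount; remaining > 0; remaining -= maxRowsPerChunk) {
      sizes.add(Math.min(remaining, maxRowsPerChunk));
    }
    return sizes;
  }

  public static void main(String[] args) {
    System.out.println(chunkSizes(60, 5)); // prints [51, 9]
  }
}
```

For the 60 × 5 example, 256 / 5 gives 51 rows per chunk, so the batch splits into 51 rows (255 parameters) and 9 rows (45 parameters).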

Copilot

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/main/java/com/databricks/jdbc/common/util/InsertStatementParser.java | New utility class for parsing INSERT statements and generating multi-row batched versions |
| src/main/java/com/databricks/jdbc/common/DatabricksJdbcConstants.java | Adds INSERT pattern constant and maximum query parameters limit |
| src/main/java/com/databricks/jdbc/api/impl/DatabricksStatement.java | Adds `isInsertQuery()` method to detect INSERT statements |
| src/main/java/com/databricks/jdbc/api/impl/DatabricksPreparedStatement.java | Implements multi-row INSERT batching logic with parameter chunking |
| src/test/java/com/databricks/jdbc/common/util/InsertStatementParserTest.java | Comprehensive tests for INSERT statement parsing and multi-row generation |
| src/test/java/com/databricks/jdbc/api/impl/DatabricksStatementTest.java | Tests for INSERT statement detection |
| src/test/java/com/databricks/jdbc/api/impl/DatabricksPreparedStatementTest.java | Updated tests to verify multi-row batching behavior and parameter chunking |

jayantsing-db

I have added some comments/suggestions.

… rollout

- Add EnableBatchedInserts connection property for controlled rollout
- Enhance Javadoc documentation with detailed INSERT compatibility examples
- Replace null returns with specific DatabricksParsingException for better debugging
- Eliminate redundant INSERT pattern validation for improved performance
- Consolidate parsing logic to reduce code duplication
- Add comprehensive input validation with clear error messages

josecsotomorales · 2025-08-29T01:48:17Z

@jayantsing-db I've committed new changes to address feedback. Can you please review again?

jayantsing-db

Thanks for the changes. I request that the default value for the feature remains set to 0 for now to avoid any accidental disruptions. Apart from that, just a few minor comments. Please feel free to merge once those are addressed.

jayantsing-db · 2025-09-04T07:39:27Z

+    if (!INSERT_PATTERN.matcher(trimmedSql).find()) {
+      throw new DatabricksParsingException(
+          "SQL statement is not an INSERT operation: " + trimmedSql,
+          DatabricksDriverErrorCode.INPUT_VALIDATION_ERROR);
+    }
+
+    // Then extract detailed information using our specific pattern
+    Matcher matcher = INSERT_DETAILS_PATTERN.matcher(trimmedSql);


Opinion: Consider reusing the matcher object.
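
One possible reading of this suggestion, sketched with the standard `java.util.regex` API: `Matcher.usePattern` swaps the pattern on an existing matcher instead of allocating a second one. Both patterns below are hypothetical stand-ins, since the driver's real INSERT_PATTERN and INSERT_DETAILS_PATTERN are not shown in this thread, and IllegalArgumentException stands in for DatabricksParsingException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MatcherReuseSketch {
  // Hypothetical stand-ins for the driver's patterns.
  static final Pattern INSERT_PATTERN =
      Pattern.compile("^\\s*INSERT\\s+INTO\\b", Pattern.CASE_INSENSITIVE);
  static final Pattern INSERT_DETAILS_PATTERN =
      Pattern.compile(
          "^\\s*INSERT\\s+INTO\\s+([\\w.`]+)\\s*\\(([^)]*)\\)\\s*VALUES\\s*\\(([^)]*)\\)\\s*;?\\s*$",
          Pattern.CASE_INSENSITIVE);

  /** Returns the table name, or throws if the SQL is not a supported single-row INSERT. */
  static String tableName(String sql) {
    String trimmedSql = sql.trim();
    Matcher matcher = INSERT_PATTERN.matcher(trimmedSql);
    if (!matcher.find()) {
      throw new IllegalArgumentException(
          "SQL statement is not an INSERT operation: " + trimmedSql);
    }
    // Reuse the same Matcher: swap in the detailed pattern and rewind to the start of the input.
    matcher.usePattern(INSERT_DETAILS_PATTERN).reset();
    if (!matcher.matches()) {
      throw new IllegalArgumentException("Unsupported INSERT shape: " + trimmedSql);
    }
    return matcher.group(1);
  }

  public static void main(String[] args) {
    System.out.println(tableName("INSERT INTO users (id, name) VALUES (?, ?)"));
  }
}
```

Whether this is clearer than two separate matchers is a judgment call; the saving is one small allocation per parsed statement.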

jayantsing-db · 2025-09-04T09:34:03Z

Gentle reminder (you may already be aware): please sign off the final commit to main. For more info, please take a look at https://github.com/databricks/databricks-jdbc/blob/main/CONTRIBUTING.md

- Changed ENABLE_BATCHED_INSERTS default value from "1" to "0" in DatabricksJdbcUrlParams
- Updated batch statement tests to explicitly enable EnableBatchedInserts=1 for proper testing
- Added lenient mocking to prevent unnecessary stubbing exceptions in test cases
- This ensures batched inserts are disabled by default while maintaining test coverage

Signed-off-by: josecsotomorales <josecsmorales@gmail.com>
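
Since the feature is off by default after this commit, a caller has to opt in. The sketch below assumes the flag can be appended to the JDBC URL as a semicolon-separated `EnableBatchedInserts=1` entry; the property name comes from the commit messages, while the helper names, the `users` table, and the exact URL form are assumptions for illustration.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

final class BatchedInsertOptIn {
  /** Appends the opt-in flag to a semicolon-separated JDBC URL (assumed URL form). */
  static String withBatchedInserts(String jdbcUrl) {
    String separator = jdbcUrl.endsWith(";") ? "" : ";";
    return jdbcUrl + separator + "EnableBatchedInserts=1";
  }

  /** Plain JDBC batching; with the flag on, the driver may combine these rows. */
  static int[] insertUsers(Connection connection, String[] names) throws SQLException {
    String sql = "INSERT INTO users (id, name) VALUES (?, ?)"; // hypothetical table
    try (PreparedStatement statement = connection.prepareStatement(sql)) {
      for (int i = 0; i < names.length; i++) {
        statement.setInt(1, i);
        statement.setString(2, names[i]);
        statement.addBatch(); // queue one single-row INSERT
      }
      return statement.executeBatch();
    }
  }
}
```

The application code is ordinary `addBatch()` / `executeBatch()` usage either way; only the connection setting changes.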

josecsotomorales · 2025-09-04T16:06:04Z

> Gentle reminder (you may already be aware): please sign off the final commit to main. For more info, please take a look at https://github.com/databricks/databricks-jdbc/blob/main/CONTRIBUTING.md

@jayantsing-db, I've addressed all the requested changes and signed off on my commit. Even though the PR is approved, I'm unable to merge it due to the lack of permissions. Could you please merge it?

jayantsing-db · 2026-01-08T13:06:00Z

Hey @josecsotomorales, I just came across this post: https://qualytics.ai/blog/qualytics-databricks-partnership/. Curious whether the integration is using the OSS JDBC driver?

josecsotomorales · 2026-01-08T16:55:12Z

> Hey @josecsotomorales, I just came across this post: https://qualytics.ai/blog/qualytics-databricks-partnership/. Curious whether the integration is using the OSS JDBC driver?

Hi @jayantsing-db, yep! We support two modes today.

Standard Connector: uses the Databricks JDBC driver for broad compatibility across environments. Thanks again for accepting our contributions — that helped a ton on our side! 🚀

Unity Catalog Mode: more Spark-native. We do direct Spark reads against Unity Catalog–managed tables, which avoids JDBC, integrates cleanly with UC permissions, and performs better at scale.

jayantsing-db · 2026-01-08T17:06:22Z

Great, thanks and congratulations on the launch!

josecsotomorales added 2 commits August 14, 2025 21:59

josecsotomorales mentioned this pull request Aug 16, 2025

[FEATURE] Implement multi-row INSERT batching for PreparedStatement #867

Closed

josecsotomorales added 3 commits August 18, 2025 13:43

Merge branch 'main' into feature/multi-row-insert-batching

3ec6d3d

Merge branch 'main' into feature/multi-row-insert-batching

37eff2d

Merge branch 'main' into feature/multi-row-insert-batching

b29bb6e

jayantsing-db self-requested a review August 21, 2025 18:44

jayantsing-db self-assigned this Aug 21, 2025

jayantsing-db requested a review from Copilot August 21, 2025 18:44

Copilot AI reviewed Aug 21, 2025

Comment thread src/main/java/com/databricks/jdbc/common/util/InsertStatementParser.java

jayantsing-db reviewed Aug 25, 2025

josecsotomorales added 2 commits August 27, 2025 12:26

Merge branch 'main' into feature/multi-row-insert-batching

989040c

Merge branch 'main' into feature/multi-row-insert-batching

758fc9e

josecsotomorales requested a review from jayantsing-db September 2, 2025 14:15

jayantsing-db approved these changes Sep 4, 2025

josecsotomorales added 2 commits September 4, 2025 10:04

Merge branch 'main' into feature/multi-row-insert-batching

b5d0903

josecsotomorales requested a review from jayantsing-db September 4, 2025 16:08

jayantsing-db added 2 commits September 4, 2025 22:14

Make java 11 compatible

3ac155f

Update next changelog

891478d

jayantsing-db enabled auto-merge (squash) September 4, 2025 16:50

jayantsing-db assigned josecsotomorales Sep 4, 2025

jayantsing-db approved these changes Sep 4, 2025

jayantsing-db merged commit df447ec into databricks:main Sep 4, 2025
12 of 13 checks passed

josecsotomorales commented Aug 16, 2025 • edited by jayantsing-db

3 participants
