Commit 5dc1e5a

[PECOBLR-1121] Arrow patch to circumvent Arrow issues with JDK 16+ (databricks#1243)
The Databricks server returns query results in Arrow format for cross-language interoperability. On JDK 16 and later, the JDBC driver runs into compatibility issues when processing Arrow results: newer Java versions enforce stricter encapsulation of internal APIs, which breaks the way the Apache Arrow library reads the Arrow result format. The driver is embedded in partner solutions that do not control the runtime environment, so the usual workaround of setting JVM arguments is not feasible.

This PR patches parts of the Arrow code to provide alternative JVM heap based byte allocators that do not rely on native `MemoryUtil` based direct reads from off-heap memory. The implementation uses the native Arrow code path when feasible and otherwise falls back to the patched code.

The change has been tested for read compatibility with all Arrow types, latency has been benchmarked, and automated tests have been added. In the course of this change it also became necessary to convert the project into a multi-module Maven project.
1 parent 00ceea5 commit 5dc1e5a
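
The JVM-argument workaround that the commit message rules out for partner environments can be sketched in shell. This is only an illustration: it assumes the relevant option is the `--add-opens=java.base/java.nio=ALL-UNNAMED` flag that the logging workflow in this commit passes to `java`, and it takes the cutoff of 16 from the commit message. The helper name `arrow_jvm_flags` and the `app.jar` name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the JVM flag that opens java.nio to reflection
# for JDK 16 and later, and print nothing for older JDKs.
# (Java 8 does not recognize --add-opens, so the flag must be left out there.)
arrow_jvm_flags() {
  local major=$1
  if [ "$major" -ge 16 ]; then
    echo "--add-opens=java.base/java.nio=ALL-UNNAMED"
  fi
}

# Show the launch command a caller would have to use for several JDK versions.
for v in 8 11 17 21; do
  echo "JDK $v: java $(arrow_jvm_flags "$v") -jar app.jar"
done
```

An application that is launched by someone else has no place to put such a flag, which is the situation the patched heap-based allocators are meant to cover.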

62 files changed

Lines changed: 11340 additions & 920 deletions

.github/workflows/bugCatcher.yml

Lines changed: 1 addition & 1 deletion
@@ -30,7 +30,7 @@ jobs:
           restore-keys: ${{ runner.os }}-m2

       - name: Run Integration Tests
-        run: mvn -B test -Dtest=*e2e/OAuthTests*
+        run: mvn -pl jdbc-core -B test -Dtest='*e2e/OAuthTests*'
         env:
           DATABRICKS_HOST: ${{ secrets.JDBC_PAT_TEST_HOST_NAME }}
           DATABRICKS_HTTP_PATH: ${{ secrets.JDBC_PAT_TEST_HTTP_PATH }}

.github/workflows/concurrencyExecutionTests.yml

Lines changed: 1 addition & 1 deletion
@@ -48,7 +48,7 @@ jobs:
           restore-keys: ${{ runner.os }}-m2

       - name: Run Concurrency Execution Tests
-        run: mvn -B test -Dtest=com.databricks.jdbc.integration.e2e.ConcurrentExecutionTests -DargLine="-ea"
+        run: mvn -pl jdbc-core -B test -Dtest=com.databricks.jdbc.integration.e2e.ConcurrentExecutionTests -DargLine="-ea"
         env:
           DATABRICKS_TOKEN: ${{ secrets.JDBC_PAT_TEST_TOKEN }}
           DATABRICKS_USER: ${{ secrets.DATABRICKS_USER }}

.github/workflows/coverageReport.yml

Lines changed: 2 additions & 2 deletions
@@ -34,7 +34,7 @@ jobs:
             ${{ runner.os }}-m2-

       - name: Run tests with coverage
-        run: mvn clean test jacoco:report
+        run: mvn -pl jdbc-core clean test -Dgroups='!Jvm17PlusAndArrowToNioReflectionDisabled' jacoco:report

       - name: Check for coverage override
         id: override
@@ -53,7 +53,7 @@ jobs:
       - name: Check coverage percentage
         if: steps.override.outputs.override == 'false'
         run: |
-          COVERAGE_FILE="target/site/jacoco/jacoco.xml"
+          COVERAGE_FILE="jdbc-core/target/site/jacoco/jacoco.xml"
           if [ ! -f "$COVERAGE_FILE" ]; then
             echo "ERROR: Coverage file not found at $COVERAGE_FILE"
             exit 1

.github/workflows/loggingTesting.yml

Lines changed: 10 additions & 10 deletions
@@ -45,13 +45,13 @@ jobs:
       - name: Find JAR file
         shell: bash
         run: |
-          # Find the main JAR file dynamically (fat JAR, not thin, not tests)
-          MAIN_JAR=$(find target -maxdepth 1 -name "databricks-jdbc-*.jar" \
+          # Find the main JAR file dynamically (uber JAR from assembly-uber module)
+          MAIN_JAR=$(find assembly-uber/target -maxdepth 1 -name "databricks-jdbc-*.jar" \
             -not -name "*-thin.jar" \
             -not -name "*-tests.jar" | head -1)
           if [ -z "$MAIN_JAR" ]; then
-            echo "ERROR: Could not find main JAR file in target directory"
-            ls -la target/
+            echo "ERROR: Could not find main JAR file in assembly-uber/target directory"
+            ls -la assembly-uber/target/
             exit 1
           fi
           echo "Using JAR file: $MAIN_JAR"
@@ -88,18 +88,18 @@ jobs:
       - name: Clean & Compile LoggingTest
         shell: bash
         run: |
-          rm -rf target/test-classes
-          mkdir -p target/test-classes
+          rm -rf jdbc-core/target/test-classes
+          mkdir -p jdbc-core/target/test-classes

           echo "Using JAR file: $MAIN_JAR"

           javac \
             -cp "$MAIN_JAR" \
-            -d target/test-classes \
-            src/test/java/com/databricks/client/jdbc/LoggingTest.java
+            -d jdbc-core/target/test-classes \
+            jdbc-core/src/test/java/com/databricks/client/jdbc/LoggingTest.java

           echo "==== Checking compiled classes ===="
-          find target/test-classes -type f
+          find jdbc-core/target/test-classes -type f

       - name: Run LoggingTest
         shell: bash
@@ -110,7 +110,7 @@ jobs:
           echo "Using classpath separator: '$SEP'"
           echo "Using JAR file: $MAIN_JAR"

-          CP="target/test-classes${SEP}$MAIN_JAR"
+          CP="jdbc-core/target/test-classes${SEP}$MAIN_JAR"

           java \
             --add-opens=java.base/java.nio=ALL-UNNAMED \
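
The JAR lookup and classpath assembly in this workflow can be tried locally with a sketch like the one below. The `find` filters mirror the "Find JAR file" step. The lines that set `SEP` are not part of the visible diff, so `classpath_sep` is a guess at how such a separator might be chosen; both helper names and the `1.0.0` version are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first databricks-jdbc JAR in a directory
# that is neither the thin JAR nor the tests JAR.
find_main_jar() {
  find "$1" -maxdepth 1 -name "databricks-jdbc-*.jar" \
    -not -name "*-thin.jar" \
    -not -name "*-tests.jar" | head -1
}

# Hypothetical helper: classpath separator for a given `uname -s` value
# (";" on Windows-style shells, ":" everywhere else).
classpath_sep() {
  case "$1" in
    MINGW*|MSYS*|CYGWIN*) echo ";" ;;
    *) echo ":" ;;
  esac
}

# Build a throwaway directory that looks like the assembly-uber output.
demo=$(mktemp -d)
mkdir -p "$demo/assembly-uber/target"
touch "$demo/assembly-uber/target/databricks-jdbc-1.0.0.jar" \
      "$demo/assembly-uber/target/databricks-jdbc-1.0.0-tests.jar"

MAIN_JAR=$(find_main_jar "$demo/assembly-uber/target")
SEP=$(classpath_sep "$(uname -s)")
echo "CP=jdbc-core/target/test-classes${SEP}${MAIN_JAR}"
```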

.github/workflows/prCheck.yml

Lines changed: 112 additions & 6 deletions
@@ -76,21 +76,51 @@ jobs:
           key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
           restore-keys: ${{ runner.os }}-m2

+      - name: Set up Maven Toolchains
+        shell: bash
+        run: |
+          mkdir -p ~/.m2
+          cat > ~/.m2/toolchains.xml <<EOF
+          <?xml version="1.0" encoding="UTF-8"?>
+          <toolchains>
+            <toolchain>
+              <type>jdk</type>
+              <provides>
+                <version>${{ matrix.java-version }}</version>
+              </provides>
+              <configuration>
+                <jdkHome>$JAVA_HOME</jdkHome>
+              </configuration>
+            </toolchain>
+          </toolchains>
+          EOF
+
+      - name: Check Arrow Patch Tests
+        shell: bash
+        if: matrix.java-version >= 17
+        run: mvn -Pjdk${{ matrix.java-version }}-NioNotOpen -pl jdbc-core test -Dgroups='Jvm17PlusAndArrowToNioReflectionDisabled'
+
+      - name: Check Arrow Allocator Manager Tests
+        shell: bash
+        if: matrix.java-version >= 17
+        run: mvn -Pjdk${{ matrix.java-version }}-NioNotOpen -pl jdbc-core test -Dgroups='Jvm17PlusAndArrowToNioReflectionDisabled' -Dtest="ArrowBufferAllocatorNettyManagerTest,ArrowBufferAllocatorUnsafeManagerTest,ArrowBufferAllocatorUnknownManagerTest" -DforkCount=1 -DreuseForks=false
+
+      - name: Check Arrow Memory Tests
+        shell: bash
+        run: mvn -Plow-memory -pl jdbc-core test -Dtest='DatabricksArrowPatchMemoryUsageTest'
+
       - name: Check Unit Tests
         shell: bash
-        run: mvn test -Dtest='!**/integration/**,!**/DatabricksDriverExamples.java,!**/ProxyTest.java,!**/LoggingTest.java,!**/SSLTest.java'
+        run: mvn -pl jdbc-core clean test -Dgroups='!Jvm17PlusAndArrowToNioReflectionDisabled' jacoco:report

       - name: Install xmllint
         if: runner.os == 'Linux'
         run: sudo apt-get update && sudo apt-get install -y libxml2-utils

-      - name: JaCoCo report
-        run: mvn --batch-mode --errors jacoco:report --file pom.xml
-
       - name: Extract codeCov percentage
         shell: bash
         run: |
-          COVERAGE_FILE="target/site/jacoco/jacoco.xml"
+          COVERAGE_FILE="jdbc-core/target/site/jacoco/jacoco.xml"
           COVERED=$(xmllint --xpath "string(//report/counter[@type='INSTRUCTION']/@covered)" "$COVERAGE_FILE")
           MISSED=$(xmllint --xpath "string(//report/counter[@type='INSTRUCTION']/@missed)" "$COVERAGE_FILE")
           TOTAL=$((COVERED + MISSED))
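
The rest of the coverage step is hidden between the two hunks, so the sketch below only illustrates how a percentage could be derived from the `COVERED` and `MISSED` counters and compared with the 85% threshold named in the step's final message. The helper names `coverage_pct` and `meets_threshold` and the sample counter values are assumptions, not the workflow's actual code.

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of covered instructions, two decimals.
coverage_pct() {
  local covered=$1 missed=$2
  local total=$((covered + missed))
  if [ "$total" -eq 0 ]; then
    echo "0.00"
    return
  fi
  awk -v c="$covered" -v t="$total" 'BEGIN { printf "%.2f\n", (c / t) * 100 }'
}

# Hypothetical helper: exit status 0 when the percentage is at or above the
# minimum. awk is used because shell arithmetic is integer-only.
meets_threshold() {
  awk -v p="$1" -v min="$2" 'BEGIN { exit !(p >= min) }'
}

# Sample counters standing in for the values xmllint would extract.
pct=$(coverage_pct 8700 1300)
if meets_threshold "$pct" 85; then
  echo "Coverage ${pct}% is equal to or greater than 85%"
else
  echo "Coverage ${pct}% is below 85%"
fi
```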
@@ -114,4 +144,80 @@ jobs:
             exit 1
           else
             echo "Coverage is equal to or greater than 85%"
-          fi
+          fi
+
+  packaging-tests:
+    strategy:
+      fail-fast: false
+      matrix:
+        java-version: [ 17 ]
+        github-runner: [ linux-ubuntu-latest, windows-server-latest ]
+
+    runs-on:
+      group: databricks-protected-runner-group
+      labels: ${{ matrix.github-runner }}
+
+    steps:
+      - name: Set up JDK ${{ matrix.java-version }}
+        uses: actions/setup-java@v4
+        with:
+          java-version: ${{ matrix.java-version }}
+          distribution: 'adopt'
+
+      - name: Enable long paths
+        if: runner.os == 'Windows'
+        run: git config --system core.longpaths true
+
+      - name: Checkout
+        uses: actions/checkout@v4
+        with:
+          ref: ${{ github.event.pull_request.head.ref || github.ref_name }}
+          repository: ${{ github.event.pull_request.head.repo.full_name || github.repository }}
+
+      - name: Cache Maven packages
+        uses: actions/cache@v4
+        with:
+          path: ~/.m2
+          key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }}
+          restore-keys: ${{ runner.os }}-m2
+
+      - name: Set up Maven Toolchains
+        shell: bash
+        run: |
+          mkdir -p ~/.m2
+          cat > ~/.m2/toolchains.xml <<EOF
+          <?xml version="1.0" encoding="UTF-8"?>
+          <toolchains>
+            <toolchain>
+              <type>jdk</type>
+              <provides>
+                <version>${{ matrix.java-version }}</version>
+              </provides>
+              <configuration>
+                <jdkHome>$JAVA_HOME</jdkHome>
+              </configuration>
+            </toolchain>
+          </toolchains>
+          EOF
+
+      - name: Install JDBC artifacts into maven local
+        shell: bash
+        run: mvn -B -pl jdbc-core,assembly-uber,assembly-thin install -DskipTests -Dmaven.javadoc.skip=true -Dmaven.source.skip=true -Ddependency-check.skip=true
+
+      - name: Check Uber Jar Packaging
+        shell: bash
+        run: mvn -pl test-assembly-uber test
+        env:
+          DATABRICKS_HOST: ${{ secrets.JDBC_PAT_TEST_HOST_NAME }}
+          DATABRICKS_HTTP_PATH: ${{ secrets.JDBC_PAT_TEST_HTTP_PATH }}
+          DATABRICKS_USER: ${{ secrets.DATABRICKS_USER }}
+          DATABRICKS_TOKEN: ${{ secrets.JDBC_PAT_TEST_TOKEN }}
+
+      - name: Check Thin Jar Packaging
+        shell: bash
+        run: mvn -pl test-assembly-thin test
+        env:
+          DATABRICKS_HOST: ${{ secrets.JDBC_PAT_TEST_HOST_NAME }}
+          DATABRICKS_HTTP_PATH: ${{ secrets.JDBC_PAT_TEST_HTTP_PATH }}
+          DATABRICKS_USER: ${{ secrets.DATABRICKS_USER }}
+          DATABRICKS_TOKEN: ${{ secrets.JDBC_PAT_TEST_TOKEN }}

.github/workflows/prCheckJDK8.yml

Lines changed: 5 additions & 1 deletion
@@ -48,4 +48,8 @@ jobs:
           restore-keys: ${{ runner.os }}-jdk8-m2

       - name: Run Unit Tests
-        run: mvn clean test
+        run: mvn -pl jdbc-core clean test -Dgroups='!Jvm17PlusAndArrowToNioReflectionDisabled'
+
+      - name: Check Arrow Memory Tests
+        shell: bash
+        run: mvn -Plow-memory -pl jdbc-core test -Dtest='DatabricksArrowPatchMemoryUsageTest'

.github/workflows/prIntegrationTests.yml

Lines changed: 2 additions & 2 deletions
@@ -16,10 +16,10 @@ jobs:
         include:
           # SQL_EXEC mode: Tests SEA client behavior
           # Note: CircuitBreakerIntegrationTests requires THRIFT_SERVER mode (tested in second matrix entry)
-          - test-command: mvn -B compile test -Dtest=*IntegrationTests,!M2MPrivateKeyCredentialsIntegrationTests,!M2MAuthIntegrationTests,!CircuitBreakerIntegrationTests,!ThriftCloudFetchFakeIntegrationTests
+          - test-command: mvn -pl jdbc-core -B compile test -Dtest=*IntegrationTests,!M2MPrivateKeyCredentialsIntegrationTests,!M2MAuthIntegrationTests,!CircuitBreakerIntegrationTests,!ThriftCloudFetchFakeIntegrationTests
             fake-service-type: 'SQL_EXEC'
           # THRIFT_SERVER mode: Tests Thrift client behavior and circuit breaker fallback
-          - test-command: mvn -B compile test -Dtest=*IntegrationTests,!M2MPrivateKeyCredentialsIntegrationTests,!SqlExecApiHybridResultsIntegrationTests,!DBFSVolumeIntegrationTests,!M2MAuthIntegrationTests,!UCVolumeIntegrationTests,!SqlExecApiIntegrationTests
+          - test-command: mvn -pl jdbc-core -B compile test -Dtest=*IntegrationTests,!M2MPrivateKeyCredentialsIntegrationTests,!SqlExecApiHybridResultsIntegrationTests,!DBFSVolumeIntegrationTests,!M2MAuthIntegrationTests,!UCVolumeIntegrationTests,!SqlExecApiIntegrationTests
             fake-service-type: 'THRIFT_SERVER'
     steps:
       - name: Checkout PR
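
Both matrix entries pass Maven Surefire a `-Dtest` value made of one include pattern followed by comma-separated exclusions prefixed with `!`. A small sketch of assembling such a value is shown below; the helper name `build_test_filter` is hypothetical, and the two excluded class names are simply borrowed from the first matrix entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: join one include pattern and any number of excluded
# test names into a -Dtest value of the form "Include,!ExcludedA,!ExcludedB".
build_test_filter() {
  local filter=$1
  shift
  local excluded
  for excluded in "$@"; do
    # The "!" is single-quoted so it is always taken literally.
    filter="${filter},"'!'"${excluded}"
  done
  echo "$filter"
}

filter=$(build_test_filter '*IntegrationTests' \
  M2MAuthIntegrationTests CircuitBreakerIntegrationTests)

# Print the resulting command; the value is quoted so the shell leaves "*" alone.
echo "mvn -pl jdbc-core -B compile test -Dtest='${filter}'"
```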

.github/workflows/proxyTesting.yml

Lines changed: 1 addition & 1 deletion
@@ -157,7 +157,7 @@ jobs:
       ################################################################
       - name: Run ProxyTest
         run: |
-          mvn test -Dtest=**/ProxyTest.java
+          mvn -pl jdbc-core test -Dtest=**/ProxyTest.java

       ################################################################
       # 14) Cleanup

.github/workflows/release-thin.yml

Lines changed: 21 additions & 96 deletions
@@ -31,111 +31,37 @@ jobs:

       - name: Set up Java for publishing to Maven Central Repository
         uses: actions/setup-java@v4
+        env:
+          GPG_PASSPHRASE: ${{ secrets.GPG_PASSPHRASE }}
         with:
           java-version: 11
-          distribution: "adopt"
           server-id: central
+          distribution: "adopt"
           server-username: MAVEN_CENTRAL_USERNAME
           server-password: MAVEN_CENTRAL_PASSWORD
           gpg-private-key: ${{ secrets.GPG_PRIVATE_KEY }}
           gpg-passphrase: GPG_PASSPHRASE

-      - name: Configure GPG
+      # Step 1: Build and install dependencies to local Maven repository
+      # This builds jdbc-core (and parent) without publishing them.
+      # The -am flag builds all dependencies needed by assembly-thin.
+      # We use -Prelease here to generate sources/javadoc JARs for jdbc-core,
+      # which assembly-thin needs for its own sources/javadoc artifacts.
+      # GPG signing is skipped since we're only installing locally, not publishing.
+      - name: Build dependencies
         run: |
-          echo "allow-loopback-pinentry" >> ~/.gnupg/gpg-agent.conf
-          echo "pinentry-mode loopback" >> ~/.gnupg/gpg.conf
-          gpg-connect-agent reloadagent /bye
+          mvn -Prelease clean install --batch-mode -pl jdbc-core -am -Dgpg.skip=true

-      - name: Build thin JAR with sources and javadocs
+      # Step 2: Deploy only the thin JAR module to Maven Central
+      # We don't use -am here to avoid the central-publishing-maven-plugin
+      # from collecting parent/jdbc-core artifacts into the deployment bundle.
+      # The jdbc-core dependency is already available from Step 1.
+      - name: Publish thin JAR to Maven Central
         run: |
-          # Build main artifacts including sources and javadocs
-          mvn -B -DskipTests package source:jar javadoc:jar
-
-      - name: Sign all thin JAR artifacts
-        run: |
-          VERSION=$(grep -m1 '<version>' pom.xml | sed 's/.*<version>\(.*\)<\/version>.*/\1/')
-
-          # Sign thin JAR
-          echo "$GPG_PASSPHRASE" | gpg --batch --yes --passphrase-fd 0 --pinentry-mode loopback \
-            --armor --detach-sign "target/databricks-jdbc-${VERSION}-thin.jar"
-
-          # Sign sources JAR
-          echo "$GPG_PASSPHRASE" | gpg --batch --yes --passphrase-fd 0 --pinentry-mode loopback \
-            --armor --detach-sign "target/databricks-jdbc-${VERSION}-sources.jar"
-
-          # Sign javadoc JAR
-          echo "$GPG_PASSPHRASE" | gpg --batch --yes --passphrase-fd 0 --pinentry-mode loopback \
-            --armor --detach-sign "target/databricks-jdbc-${VERSION}-javadoc.jar"
-        env:
-          GPG_PASSPHRASE: ${{ secrets.GPG_PASSPHRASE }}
-
-      - name: Verify all required artifacts exist
-        run: |
-          VERSION=$(grep -m1 '<version>' pom.xml | sed 's/.*<version>\(.*\)<\/version>.*/\1/')
-          test -f "target/databricks-jdbc-${VERSION}-thin.jar"
-          test -f "target/databricks-jdbc-${VERSION}-thin.jar.asc"
-          test -f "target/databricks-jdbc-${VERSION}-sources.jar"
-          test -f "target/databricks-jdbc-${VERSION}-sources.jar.asc"
-          test -f "target/databricks-jdbc-${VERSION}-javadoc.jar"
-          test -f "target/databricks-jdbc-${VERSION}-javadoc.jar.asc"
-
-      - name: Publish Thin JAR as Separate Artifact to Maven Central
-        run: |
-          VERSION=$(grep -m1 '<version>' pom.xml | sed 's/.*<version>\(.*\)<\/version>.*/\1/')
-
-          echo "Creating deployment bundle for thin JAR..."
-
-          # Create staging directory
-          mkdir -p target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}
-
-          # Copy thin JAR and its signature
-          cp "target/databricks-jdbc-${VERSION}-thin.jar" \
-            target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}/databricks-jdbc-thin-${VERSION}.jar
-          cp "target/databricks-jdbc-${VERSION}-thin.jar.asc" \
-            target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}/databricks-jdbc-thin-${VERSION}.jar.asc
-
-          # Copy sources JAR and its signature
-          cp "target/databricks-jdbc-${VERSION}-sources.jar" \
-            target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}/databricks-jdbc-thin-${VERSION}-sources.jar
-          cp "target/databricks-jdbc-${VERSION}-sources.jar.asc" \
-            target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}/databricks-jdbc-thin-${VERSION}-sources.jar.asc
-
-          # Copy javadoc JAR and its signature
-          cp "target/databricks-jdbc-${VERSION}-javadoc.jar" \
-            target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}/databricks-jdbc-thin-${VERSION}-javadoc.jar
-          cp "target/databricks-jdbc-${VERSION}-javadoc.jar.asc" \
-            target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}/databricks-jdbc-thin-${VERSION}-javadoc.jar.asc
-
-          # Copy POM and sign it
-          cp thin_public_pom.xml target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}/databricks-jdbc-thin-${VERSION}.pom
-          echo "$GPG_PASSPHRASE" | gpg --batch --yes --passphrase-fd 0 --pinentry-mode loopback \
-            --armor --detach-sign \
-            target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}/databricks-jdbc-thin-${VERSION}.pom
-
-          # Generate checksums for all files
-          cd target/thin-staging/com/databricks/databricks-jdbc-thin/${VERSION}
-          for file in databricks-jdbc-thin-*; do
-            md5sum "$file" | awk '{print $1}' > "${file}.md5"
-            sha1sum "$file" | awk '{print $1}' > "${file}.sha1"
-          done
-          cd $GITHUB_WORKSPACE
-
-          # Create bundle ZIP
-          cd target/thin-staging
-          zip -r ../central-thin-bundle.zip com/
-          cd $GITHUB_WORKSPACE
-
-          echo "Uploading bundle to Maven Central Portal..."
-
-          # Upload to new Maven Central Portal
-          curl -X POST \
-            -u "$MAVEN_CENTRAL_USERNAME:$MAVEN_CENTRAL_PASSWORD" \
-            -F "bundle=@target/central-thin-bundle.zip" \
-            -F "publishingType=AUTOMATIC" \
-            -w "\nHTTP_STATUS:%{http_code}\n" \
-            https://central.sonatype.com/api/v1/publisher/upload
-
-          echo "Thin JAR published successfully!"
+          mvn -Prelease deploy --batch-mode -pl assembly-thin \
+            -Dnvd.api.key=${{ secrets.NVD_API_KEY }} \
+            -Dossindex.username=${{ secrets.OSSINDEX_USERNAME }} \
+            -Dossindex.password=${{ secrets.OSSINDEX_PASSWORD }}
         env:
           GPG_PASSPHRASE: ${{ secrets.GPG_PASSPHRASE }}
           MAVEN_CENTRAL_USERNAME: ${{ secrets.MAVEN_CENTRAL_USERNAME }}
@@ -174,5 +100,4 @@ jobs:
         with:
           tag_name: ${{ steps.get_tag.outputs.tag }}
           files: |
-            target/*-thin.jar
-
+            assembly-thin/target/databricks-jdbc-thin-*.jar
