Commit 3b9883f

Dandandan and claude committed
Rewrite ParquetOpener to use push-based ParquetPushDecoder
Replace the async pull-based ParquetRecordBatchStreamBuilder with arrow-rs's SansIO ParquetPushDecoder for reading Parquet files. The caller now controls IO explicitly via DecodeResult::NeedsData, pushing byte ranges to the decoder and receiving decoded batches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 5a86142 commit 3b9883f
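
The control flow described in the commit message can be illustrated with a minimal, self-contained sketch. Every name below (`ToyDecodeResult`, `ToyPushDecoder`, `push_range`, `try_decode`, `read_all`) is hypothetical and only loosely modelled on the `DecodeResult::NeedsData` variant the message mentions. It is not the arrow-rs API and not the code in this commit. The toy "decodes" a chunk by handing back its raw bytes, and a byte slice stands in for the Parquet file.

```rust
use std::collections::VecDeque;
use std::ops::Range;

/// Outcome of one decode attempt (hypothetical stand-in, not the arrow-rs type).
#[derive(Debug, PartialEq)]
enum ToyDecodeResult {
    /// The decoder cannot proceed until the caller pushes these byte ranges.
    NeedsData(Vec<Range<u64>>),
    /// One decoded "batch" (here simply the raw bytes of one chunk).
    Data(Vec<u8>),
    /// Every chunk has been decoded.
    Finished,
}

/// A toy sans-IO decoder: it knows which ranges it wants but never performs IO.
struct ToyPushDecoder {
    /// Chunks still to decode, in order.
    pending: VecDeque<Range<u64>>,
    /// Bytes the caller has pushed so far, keyed by the range they cover.
    buffered: Vec<(Range<u64>, Vec<u8>)>,
}

impl ToyPushDecoder {
    fn new(chunks: Vec<Range<u64>>) -> Self {
        Self { pending: chunks.into(), buffered: Vec::new() }
    }

    /// The caller hands over the bytes it fetched for `range`.
    fn push_range(&mut self, range: Range<u64>, bytes: Vec<u8>) {
        self.buffered.push((range, bytes));
    }

    /// Decode the next chunk if its bytes are buffered, otherwise ask for them.
    fn try_decode(&mut self) -> ToyDecodeResult {
        let Some(next) = self.pending.front().cloned() else {
            return ToyDecodeResult::Finished;
        };
        let pos = self.buffered.iter().position(|(r, _)| *r == next);
        match pos {
            Some(i) => {
                self.pending.pop_front();
                ToyDecodeResult::Data(self.buffered.swap_remove(i).1)
            }
            None => ToyDecodeResult::NeedsData(vec![next]),
        }
    }
}

/// Driver loop: the caller owns all IO and reacts to what the decoder asks for.
fn read_all(file: &[u8], chunks: Vec<Range<u64>>) -> Vec<Vec<u8>> {
    let mut decoder = ToyPushDecoder::new(chunks);
    let mut batches = Vec::new();
    loop {
        match decoder.try_decode() {
            ToyDecodeResult::NeedsData(ranges) => {
                for r in ranges {
                    // A real caller would issue an object-store or file read here.
                    let bytes = file[r.start as usize..r.end as usize].to_vec();
                    decoder.push_range(r, bytes);
                }
            }
            ToyDecodeResult::Data(batch) => batches.push(batch),
            ToyDecodeResult::Finished => return batches,
        }
    }
}
```

Calling `read_all(&bytes, vec![0..4, 4..10])` on a ten-byte buffer yields two batches, one per requested range. The point of the pattern is that the decoder never awaits or blocks. The caller decides when and how each requested range is fetched, which is what makes the IO explicit.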

2,168 files changed: +342630 -87405 lines changed

.asf.yaml

Lines changed: 17 additions & 0 deletions
@@ -41,6 +41,7 @@ github:
     - sql
   enabled_merge_buttons:
     squash: true
+    squash_commit_message: PR_TITLE_AND_DESC
     merge: false
     rebase: false
   features:
@@ -50,11 +51,27 @@ github:
     main:
       required_pull_request_reviews:
         required_approving_review_count: 1
+    # needs to be updated as part of the release process
+    # .asf.yaml doesn't support wildcard branch protection rules, only exact branch names
+    # https://github.com/apache/infrastructure-asfyaml?tab=readme-ov-file#branch-protection
+    # these branches protection blocks autogenerated during release process which is described in
+    # https://github.com/apache/datafusion/tree/main/dev/release#2-add-a-protection-to-release-candidate-branch
+    branch-50:
+      required_pull_request_reviews:
+        required_approving_review_count: 1
+    branch-51:
+      required_pull_request_reviews:
+        required_approving_review_count: 1
+    branch-52:
+      required_pull_request_reviews:
+        required_approving_review_count: 1
   pull_requests:
     # enable updating head branches of pull requests
     allow_update_branch: true
+    allow_auto_merge: true
 
 # publishes the content of the `asf-site` branch to
 # https://datafusion.apache.org/
 publish:
   whoami: asf-site
+

.devcontainer/Dockerfile

Lines changed: 8 additions & 6 deletions
@@ -4,10 +4,12 @@ RUN apt-get update && export DEBIAN_FRONTEND=noninteractive \
     # Remove imagemagick due to https://security-tracker.debian.org/tracker/CVE-2019-10131
     && apt-get purge -y imagemagick imagemagick-6-common
 
-# Add protoc
-# https://datafusion.apache.org/contributor-guide/getting_started.html#protoc-installation
-RUN curl -LO https://github.com/protocolbuffers/protobuf/releases/download/v25.1/protoc-25.1-linux-x86_64.zip \
-    && unzip protoc-25.1-linux-x86_64.zip -d $HOME/.local \
-    && rm protoc-25.1-linux-x86_64.zip
+# setup the containers WORKDIR so npm install works
+# https://stackoverflow.com/questions/57534295/npm-err-tracker-idealtree-already-exists-while-creating-the-docker-image-for
+WORKDIR /root
 
-ENV PATH="$PATH:$HOME/.local/bin"
+# Add protoc, npm, prettier
+# https://datafusion.apache.org/contributor-guide/development_environment.html#protoc-installation
+RUN apt-get update \
+    && apt-get install -y --no-install-recommends protobuf-compiler libprotobuf-dev npm nodejs\
+    && rm -rf /var/lib/apt/lists/*

.github/ISSUE_TEMPLATE/bug_report.yml

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 name: Bug report
 description: Create a report to help us improve
+type: Bug
 labels: bug
 body:
   - type: textarea

.github/ISSUE_TEMPLATE/feature_request.yml

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 name: Feature request
 description: Suggest an idea for this project
+type: Feature
 labels: enhancement
 body:
   - type: textarea

.github/actions/setup-builder/action.yaml

Lines changed: 14 additions & 0 deletions
@@ -46,3 +46,17 @@ runs:
       # https://github.com/actions/checkout/issues/766
       shell: bash
       run: git config --global --add safe.directory "$GITHUB_WORKSPACE"
+    - name: Remove unnecessary preinstalled software
+      shell: bash
+      run: |
+        echo "Disk space before cleanup:"
+        df -h
+        apt-get clean
+        # remove tool cache: about 8.5GB (github has host /opt/hostedtoolcache mounted as /__t)
+        rm -rf /__t/* || true
+        # remove Haskell runtime: about 6.3GB (host /usr/local/.ghcup)
+        rm -rf /host/usr/local/.ghcup || true
+        # remove Android library: about 7.8GB (host /usr/local/lib/android)
+        rm -rf /host/usr/local/lib/android || true
+        echo "Disk space after cleanup:"
+        df -h

.github/actions/setup-macos-aarch64-builder/action.yaml

Lines changed: 3 additions & 1 deletion
@@ -44,6 +44,8 @@ runs:
         rustup default stable
         rustup component add rustfmt
     - name: Setup rust cache
-      uses: Swatinem/rust-cache@v2
+      uses: Swatinem/rust-cache@f13886b937689c021905a6b90929199931d60db1 # v2.8.1
+      with:
+        save-if: ${{ github.ref_name == 'main' }}
     - name: Configure rust runtime env
       uses: ./.github/actions/setup-rust-runtime

.github/actions/setup-macos-builder/action.yaml

Lines changed: 0 additions & 47 deletions
This file was deleted.

.github/actions/setup-rust-runtime/action.yaml

Lines changed: 0 additions & 9 deletions
@@ -20,10 +20,6 @@ description: 'Setup Rust Runtime Environment'
 runs:
   using: "composite"
   steps:
-    # https://github.com/apache/datafusion/issues/15535
-    # disabled because neither version nor git hash works with apache github policy
-    #- name: Run sccache-cache
-    # uses: mozilla-actions/sccache-action@65101d47ea8028ed0c98a1cdea8dd9182e9b5133 # v0.0.8
     - name: Configure runtime env
       shell: bash
       # do not produce debug symbols to keep memory usage down
@@ -32,11 +28,6 @@ runs:
       #
       # Set debuginfo=line-tables-only as debuginfo=0 causes immensely slow build
       # See for more details: https://github.com/rust-lang/rust/issues/119560
-      #
-      # readd the following to the run below once sccache-cache is re-enabled
-      # echo "RUSTC_WRAPPER=sccache" >> $GITHUB_ENV
-      # echo "SCCACHE_GHA_ENABLED=true" >> $GITHUB_ENV
       run: |
         echo "RUST_BACKTRACE=1" >> $GITHUB_ENV
         echo "RUSTFLAGS=-C debuginfo=line-tables-only -C incremental=false" >> $GITHUB_ENV
-

.github/dependabot.yml

Lines changed: 25 additions & 2 deletions
@@ -20,9 +20,10 @@ updates:
   - package-ecosystem: cargo
     directory: "/"
     schedule:
-      interval: daily
+      interval: weekly
     target-branch: main
     labels: [auto-dependencies]
+    open-pull-requests-limit: 15
     ignore:
       # major version bumps of arrow* and parquet are handled manually
       - dependency-name: "arrow*"
@@ -44,9 +45,31 @@ updates:
         patterns:
           - "prost*"
           - "pbjson*"
+
+      # Catch-all: group only minor/patch into a single PR,
+      # excluding deps we want always separate (and excluding arrow/parquet which have their own group)
+      all-other-cargo-deps:
+        applies-to: version-updates
+        patterns:
+          - "*"
+        exclude-patterns:
+          - "arrow*"
+          - "parquet"
+          - "object_store"
+          - "sqlparser"
+          - "prost*"
+          - "pbjson*"
+        update-types:
+          - "minor"
+          - "patch"
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:
-      interval: "daily"
+      interval: "weekly"
     open-pull-requests-limit: 10
     labels: [auto-dependencies]
+  - package-ecosystem: "pip"
+    directory: "/docs"
+    schedule:
+      interval: "weekly"
+    labels: [auto-dependencies]
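
The filter that the new `all-other-cargo-deps` group expresses can be modelled roughly as follows. This is a simplified sketch, not Dependabot's implementation: `glob_match`, `UpdateType`, `Group`, and `all_other_cargo_deps` are hypothetical names, only the `*` wildcard is handled, and the interaction with the other groups and the `ignore` rules in the file is left out.

```rust
/// Match `name` against a pattern in which `*` stands for any run of characters.
fn glob_match(pattern: &str, name: &str) -> bool {
    match pattern.split_once('*') {
        // No wildcard left: the remainder must match exactly.
        None => pattern == name,
        // Text before the first `*` must be a prefix; then try every split point.
        Some((prefix, rest)) => name.strip_prefix(prefix).is_some_and(|tail| {
            (0..=tail.len())
                .filter(|&i| tail.is_char_boundary(i))
                .any(|i| glob_match(rest, &tail[i..]))
        }),
    }
}

/// Kind of version bump being proposed.
#[derive(Clone, Copy, PartialEq)]
enum UpdateType {
    Major,
    Minor,
    Patch,
}

/// Simplified model of one group entry (hypothetical type).
struct Group<'a> {
    patterns: &'a [&'a str],
    exclude_patterns: &'a [&'a str],
    update_types: &'a [UpdateType],
}

impl Group<'_> {
    /// An update is grouped if the dependency matches a pattern, matches no
    /// exclude pattern, and the bump is one of the listed update types.
    fn accepts(&self, dep: &str, update: UpdateType) -> bool {
        self.patterns.iter().any(|p| glob_match(p, dep))
            && !self.exclude_patterns.iter().any(|p| glob_match(p, dep))
            && self.update_types.contains(&update)
    }
}

/// The values from the diff above, transcribed into the toy model.
fn all_other_cargo_deps() -> Group<'static> {
    Group {
        patterns: &["*"],
        exclude_patterns: &["arrow*", "parquet", "object_store", "sqlparser", "prost*", "pbjson*"],
        update_types: &[UpdateType::Minor, UpdateType::Patch],
    }
}
```

Under this model a minor bump of an unrelated crate such as `tokio` falls into the catch-all group, while a major bump of the same crate, or any bump of `arrow-schema` or `prost-build`, does not.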

.github/workflows/audit.yml

Lines changed: 11 additions & 7 deletions
@@ -23,25 +23,29 @@ concurrency:
 
 on:
   push:
+    branches:
+      - main
     paths:
       - "**/Cargo.toml"
       - "**/Cargo.lock"
-    branches:
-      - main
 
   pull_request:
     paths:
       - "**/Cargo.toml"
       - "**/Cargo.lock"
+
+  merge_group:
 
 jobs:
   security_audit:
     runs-on: ubuntu-latest
     steps:
-      - uses: actions/checkout@v4
+      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
       - name: Install cargo-audit
-        run: cargo install cargo-audit
+        uses: taiki-e/install-action@d6e286fa45544157a02d45a43742857ebbc25d12 # v2.68.16
+        with:
+          tool: cargo-audit
       - name: Run audit check
-        # Ignored until https://github.com/apache/datafusion/issues/15571
-        # ignored py03 warning until arrow 55 upgrade
-        run: cargo audit --ignore RUSTSEC-2024-0370 --ignore RUSTSEC-2025-0020
+        # Note: you can ignore specific RUSTSEC issues using the `--ignore` flag ,for example:
+        # run: cargo audit --ignore RUSTSEC-2026-0001
+        run: cargo audit
