Commit d62e173

Merge branch 'master' into fd-remove-reflection
2 parents d4040e2 + e251102 commit d62e173

227 files changed

Lines changed: 17782 additions & 2023 deletions

.github/ISSUE_TEMPLATE/feature_request.yaml

Lines changed: 0 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -40,7 +40,6 @@ body:
4040
- Build
4141
- Arrow
4242
- Avro
43-
- Pig
4443
- Protobuf
4544
- Thrift
4645
- CLI

.github/dependabot.yml

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -29,3 +29,4 @@ updates:
2929
schedule:
3030
interval: "weekly"
3131
day: "sunday"
32+
open-pull-requests-limit: 50

.github/workflows/ci-hadoop3.yml

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -26,14 +26,14 @@ jobs:
2626
strategy:
2727
fail-fast: false
2828
matrix:
29-
java: [ { setup: '8', maven: '1.8' }, { setup: '11', maven: '11' }, { setup: '17', maven: '17' } ]
29+
java: [ { setup: '11', maven: '11' }, { setup: '17', maven: '17' } ]
3030
codes: [ 'uncompressed,brotli', 'gzip,snappy' ]
3131
name: Build Parquet with JDK ${{ matrix.java.setup }} and ${{ matrix.codes }}
3232

3333
steps:
3434
- uses: actions/checkout@master
3535
- name: Set up JDK ${{ matrix.java.setup }}
36-
uses: actions/setup-java@v4
36+
uses: actions/setup-java@v5
3737
with:
3838
distribution: temurin
3939
java-version: ${{ matrix.java.setup }}

.github/workflows/stale-prs.yml

Lines changed: 55 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,55 @@
1+
# Licensed to the Apache Software Foundation (ASF) under one
2+
# or more contributor license agreements. See the NOTICE file
3+
# distributed with this work for additional information
4+
# regarding copyright ownership. The ASF licenses this file
5+
# to you under the Apache License, Version 2.0 (the
6+
# "License"); you may not use this file except in compliance
7+
# with the License. You may obtain a copy of the License at
8+
#
9+
# http://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# Unless required by applicable law or agreed to in writing,
12+
# software distributed under the License is distributed on an
13+
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
14+
# KIND, either express or implied. See the License for the
15+
# specific language governing permissions and limitations
16+
# under the License.
17+
18+
name: Stale pull requests
19+
20+
on:
21+
schedule:
22+
- cron: '0 0 * * *'
23+
workflow_dispatch:
24+
25+
permissions:
26+
pull-requests: write
27+
issues: write
28+
29+
jobs:
30+
stale:
31+
runs-on: ubuntu-slim
32+
steps:
33+
- name: Mark and close stale pull requests
34+
uses: actions/stale@v9
35+
with:
36+
repo-token: ${{ secrets.GITHUB_TOKEN }}
37+
# Don't touch issues.
38+
days-before-issue-stale: -1
39+
days-before-issue-close: -1
40+
# PRs stale after 2 months of inactivity and closed 1 month later.
41+
days-before-pr-stale: 60
42+
days-before-pr-close: 30
43+
stale-pr-label: stale
44+
stale-pr-message: >
45+
This pull request has been automatically marked as stale because it has
46+
had no activity for at least 2 months. If you are still working on this
47+
change or plan to move it forward, please leave a comment or push a new
48+
commit so we know to keep it open. Otherwise, this PR will be closed
49+
automatically in about one month. Thank you for your contribution to
50+
Apache Parquet!
51+
close-pr-message: >
52+
Closing this pull request due to at least 3 months of inactivity. If you
53+
would like to continue the work, please feel free to reopen this pull
54+
request or open a new one. Thank you for your contribution to
55+
Apache Parquet!
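
A rough way to sanity-check the timeline described in the two messages above is to add the two thresholds to a PR's last-activity date. The sketch below assumes GNU `date` (for `-d`); `pr_timeline` and the two variable names are hypothetical helpers for illustration and are not part of `actions/stale`. The real dates also depend on when the daily cron run happens to pick the PR up, so treat the result as an estimate.

```shell
#!/usr/bin/env bash
# Sketch only: estimate when a PR would be labeled stale and then closed,
# using the thresholds from the workflow above. Assumes GNU date (-d).
# pr_timeline is a hypothetical helper, not part of actions/stale.
DAYS_BEFORE_PR_STALE=60   # mirrors days-before-pr-stale
DAYS_BEFORE_PR_CLOSE=30   # mirrors days-before-pr-close

pr_timeline() {
  local last_activity="$1"   # YYYY-MM-DD of the last commit or comment
  local stale_on close_on
  # The close countdown is counted from the day the stale label is applied.
  stale_on=$(date -u -d "${last_activity} +${DAYS_BEFORE_PR_STALE} days" +%F)
  close_on=$(date -u -d "${stale_on} +${DAYS_BEFORE_PR_CLOSE} days" +%F)
  echo "stale=${stale_on} close=${close_on}"
}
```

Calling `pr_timeline 2025-01-01` prints the estimated stale and close dates. The two thresholds sum to 90 days, which is consistent with the "at least 3 months" wording in the close message.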

.github/workflows/vector-plugins.yml

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -33,7 +33,7 @@ jobs:
3333
steps:
3434
- uses: actions/checkout@master
3535
- name: Set up JDK ${{ matrix.java }}
36-
uses: actions/setup-java@v4
36+
uses: actions/setup-java@v5
3737
with:
3838
distribution: temurin
3939
java-version: ${{ matrix.java }}
@@ -46,7 +46,7 @@ jobs:
4646
run: |
4747
EXTRA_JAVA_TEST_ARGS=$(./mvnw help:evaluate -Dexpression=extraJavaTestArgs -q -DforceStdout)
4848
export MAVEN_OPTS="$MAVEN_OPTS $EXTRA_JAVA_TEST_ARGS"
49-
./mvnw install --batch-mode -Pvector-plugins -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true -Djava.version=${{ matrix.java }} -pl parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks -am
49+
./mvnw install --batch-mode -Pvector-plugins -DskipTests=true -Dmaven.javadoc.skip=true -Dsource.skip=true -Dmaven.buildNumber.skip=true -Djava.version=${{ matrix.java }} -pl parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks -am
5050
- name: verify
5151
env:
5252
TEST_CODECS: ${{ matrix.codes }}

README.md

Lines changed: 30 additions & 24 deletions
Original file line number | Diff line number | Diff line change
@@ -43,19 +43,19 @@ Parquet-Java uses Maven to build and depends on the thrift compiler (protoc is n
4343
To build and install the thrift compiler, run:
4444

4545
```
46-
wget -nv https://archive.apache.org/dist/thrift/0.21.0/thrift-0.21.0.tar.gz
47-
tar xzf thrift-0.21.0.tar.gz
48-
cd thrift-0.21.0
46+
wget -nv https://archive.apache.org/dist/thrift/0.22.0/thrift-0.22.0.tar.gz
47+
tar xzf thrift-0.22.0.tar.gz
48+
cd thrift-0.22.0
4949
chmod +x ./configure
5050
./configure --disable-libs
5151
sudo make install -j
5252
```
5353

54-
If you're on OSX and use homebrew, you can instead install Thrift 0.21.0 with `brew` and ensure that it comes first in your `PATH`.
54+
If you're on OSX and use homebrew, you can instead install Thrift 0.22.0 with `brew` and ensure that it comes first in your `PATH`.
5555

5656
```
5757
brew install thrift
58-
export PATH="/usr/local/opt/thrift@0.21.0/bin:$PATH"
58+
export PATH="/usr/local/opt/thrift@0.22.0/bin:$PATH"
5959
```
6060

6161
### Build Parquet with Maven
@@ -68,8 +68,7 @@ LC_ALL=C ./mvnw clean install
6868

6969
## Features
7070

71-
Parquet is a very active project, and new features are being added quickly. Here are a few features:
72-
71+
Parquet is an active project, and new features are being added quickly. Here are a few features:
7372

7473
* Type-specific encoding
7574
* Hive integration (deprecated)
@@ -96,7 +95,9 @@ Parquet is a very active project, and new features are being added quickly. Here
9695

9796
## Java Vector API support
9897
`The feature is experimental and is currently not part of the parquet distribution`.
98+
9999
Parquet-Java has supported Java Vector API to speed up reading, to enable this feature:
100+
100101
* Java 17+, 64-bit
101102
* Requiring the CPU to support instruction sets:
102103
* avx512vbmi
@@ -116,26 +117,29 @@ Note that to use an Input or Output format, you need to implement a WriteSupport
116117
We've implemented this for 2 popular data formats to provide a clean migration path as well:
117118

118119
### Thrift
120+
119121
Thrift integration is provided by the [parquet-thrift](https://github.com/apache/parquet-java/tree/master/parquet-thrift) sub-project.
120122

121123
### Avro
124+
122125
Avro conversion is implemented via the [parquet-avro](https://github.com/apache/parquet-java/tree/master/parquet-avro) sub-project.
123126

124127
### Protobuf
128+
125129
Protobuf conversion is implemented via the [parquet-protobuf](https://github.com/apache/parquet-java/tree/master/parquet-protobuf) sub-project.
126130

127131
### Create your own objects
132+
128133
* The ParquetOutputFormat can be provided a WriteSupport to write your own objects to an event based RecordConsumer.
129134
* The ParquetInputFormat can be provided a ReadSupport to materialize your own objects by implementing a RecordMaterializer
130135

131136
See the APIs:
137+
132138
* [Record conversion API](https://github.com/apache/parquet-java/tree/master/parquet-column/src/main/java/org/apache/parquet/io/api)
133139
* [Hadoop API](https://github.com/apache/parquet-java/tree/master/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/api)
134140

135141
## Hive integration
136142

137-
Hive integration is provided via the [parquet-hive](https://github.com/apache/parquet-java/tree/master/parquet-hive) sub-project.
138-
139143
Hive integration is now deprecated within the Parquet project. It is now maintained by Apache Hive.
140144

141145
## Build
@@ -149,51 +153,51 @@ The build runs in [GitHub Actions](https://github.com/apache/parquet-java/action
149153

150154
## Add Parquet as a dependency in Maven
151155

152-
The current release is version `1.15.1`.
156+
The current release is version `1.17.0`.
153157

154158
```xml
155159
<dependencies>
156160
<dependency>
157161
<groupId>org.apache.parquet</groupId>
158162
<artifactId>parquet-common</artifactId>
159-
<version>1.15.1</version>
163+
<version>1.17.0</version>
160164
</dependency>
161165
<dependency>
162166
<groupId>org.apache.parquet</groupId>
163167
<artifactId>parquet-encoding</artifactId>
164-
<version>1.15.1</version>
168+
<version>1.17.0</version>
165169
</dependency>
166170
<dependency>
167171
<groupId>org.apache.parquet</groupId>
168172
<artifactId>parquet-column</artifactId>
169-
<version>1.15.1</version>
173+
<version>1.17.0</version>
170174
</dependency>
171175
<dependency>
172176
<groupId>org.apache.parquet</groupId>
173177
<artifactId>parquet-hadoop</artifactId>
174-
<version>1.15.1</version>
178+
<version>1.17.0</version>
175179
</dependency>
176180
</dependencies>
177181
```
178182

179183
### How To Contribute
180184

181-
We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [parquet-java](https://github.com/apache/parquet-java) Git repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to https://github.com/apache/parquet-java.git
185+
We prefer to receive contributions in the form of GitHub pull requests. Please send pull requests against the [parquet-java](https://github.com/apache/parquet-java) Git repository. If you've previously forked Parquet from its old location, you will need to add a remote or update your origin remote to `https://github.com/apache/parquet-java.git`.
182186

183-
If you are looking for some ideas on what to contribute, check out jira issues for this project labeled ["pick-me-up"](https://issues.apache.org/jira/browse/PARQUET-5?jql=project%20%3D%20PARQUET%20and%20labels%20%3D%20pick-me-up%20and%20status%20%3D%20open).
184-
Comment on the issue and/or contact [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/) with your questions and ideas.
187+
If you are looking for some ideas on what to contribute, check out [GitHub issues](https://github.com/apache/parquet-java/issues) for labeled [Good first issue](https://github.com/apache/parquet-java/issues?q=state%3Aopen%20label%3A%22Good%20first%20issue%22). Comment on the issue and/or contact [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org) with your questions and ideas.
185188

186-
If you’d like to report a bug but don’t have time to fix it, you can still post it to our [issue tracker](https://issues.apache.org/jira/browse/PARQUET), or email the mailing list [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
189+
If you’d like to report a bug but don’t have time to fix it, you can still raise an [issue on GitHub](https://github.com/apache/parquet-java/issues/new/choose), or email the mailing list [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org).
187190

188191
To contribute a patch:
189192

190193
1. Break your work into small, single-purpose patches if possible. It’s much harder to merge in a large change with a lot of disjoint features.
191-
2. Create a JIRA for your patch on the [Parquet Project JIRA](https://issues.apache.org/jira/browse/PARQUET).
192-
3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the JIRA name (ex: https://github.com/apache/parquet-java/pull/240).
194+
2. Create an issue for your patch on the [GitHub issues](https://github.com/apache/parquet-java/issues).
195+
3. Submit the patch as a GitHub pull request against the master branch. For a tutorial, see the GitHub guides on forking a repo and sending a pull request. Prefix your pull request name with the issue (ex: https://github.com/apache/parquet-java/pull/3260).
193196
4. Make sure that your code passes the unit tests. You can run the tests with `./mvnw test` in the root directory.
194197
5. Add new unit tests for your code.
195198

196199
We tend to do fairly close readings of pull requests, and you may get a lot of comments. Some common issues that are not code structure related, but still important:
200+
197201
* Use 2 spaces for whitespace. Not tabs, not 4 spaces. The number of the spacing shall be 2.
198202
* Give your operators some room. Not `a+b` but `a + b` and not `foo(int a,int b)` but `foo(int a, int b)`.
199203
* Generally speaking, stick to the [Sun Java Code Conventions](http://www.oracle.com/technetwork/java/javase/documentation/codeconvtoc-136057.html)
@@ -204,18 +208,20 @@ Thank you for getting involved!
204208
## Authors and contributors
205209

206210
* [Contributors](https://github.com/apache/parquet-java/graphs/contributors)
207-
* [Committers](dev/COMMITTERS.md)
211+
* [Committers](https://projects.apache.org/committee.html?parquet)
208212

209213
## Code of Conduct
210214

211215
We hold ourselves and the Parquet developer community to two codes of conduct:
216+
212217
1. [The Apache Software Foundation Code of Conduct](https://www.apache.org/foundation/policies/conduct.html)
213218
2. [The Twitter OSS Code of Conduct](https://github.com/twitter/code-of-conduct/blob/master/code-of-conduct.md)
214219

215220
## Discussions
216-
* Mailing list: [dev@parquet.apache.org](http://mail-archives.apache.org/mod_mbox/parquet-dev/)
217-
* Bug tracker: [jira](https://issues.apache.org/jira/browse/PARQUET)
218-
* Discussions also take place in github pull requests
221+
222+
* Mailing list: [dev@parquet.apache.org](https://lists.apache.org/list.html?dev@parquet.apache.org)
223+
* GitHub issues: [Issues](https://github.com/apache/parquet-java/issues)
224+
* Discussions also take place in GitHub pull requests
219225

220226
## License
221227

dev/COMMITTERS.md

Lines changed: 0 additions & 66 deletions
This file was deleted.

dev/ci-before_install.sh

Lines changed: 13 additions & 3 deletions
Original file line number | Diff line number | Diff line change
@@ -20,17 +20,27 @@
2020
# This script gets invoked by the CI system in a "before install" step
2121
################################################################################
2222

23-
export THRIFT_VERSION=0.21.0
23+
export THRIFT_VERSION=0.22.0
2424

2525
set -e
26+
set -o pipefail
2627
date
2728
sudo apt-get update -qq
2829
sudo apt-get install -qq --no-install-recommends build-essential pv autoconf automake libtool curl make \
29-
g++ unzip libboost-dev libboost-test-dev libboost-program-options-dev \
30+
g++ unzip libboost-dev libboost-test-dev libboost-program-options-dev wget \
3031
libevent-dev automake libtool flex bison pkg-config g++ libssl-dev xmlstarlet
3132
date
3233
pwd
33-
wget -qO- https://archive.apache.org/dist/thrift/$THRIFT_VERSION/thrift-$THRIFT_VERSION.tar.gz | tar zxf -
34+
for attempt in 1 2 3; do
35+
if wget -nv -O- https://archive.apache.org/dist/thrift/$THRIFT_VERSION/thrift-$THRIFT_VERSION.tar.gz | tar zxf -; then
36+
break
37+
fi
38+
if [[ "$attempt" -eq 3 ]]; then
39+
echo "Failed to download thrift after ${attempt} attempts." >&2
40+
exit 1
41+
fi
42+
sleep $((attempt * 5))
43+
done
3444
cd thrift-${THRIFT_VERSION}
3545
chmod +x ./configure
3646
./configure --disable-libs
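
The retry loop added in this hunk (three attempts, sleeping `attempt * 5` seconds between them) could also be factored into a reusable function. The following is a minimal sketch of that pattern, not what the commit does: `retry`, `RETRY_BASE_DELAY` and `download_thrift` are hypothetical names introduced here for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times with linear backoff.
# RETRY_BASE_DELAY (seconds) is an assumed knob; it defaults to 5 to match
# the "sleep $((attempt * 5))" used in the script above.
retry() {
  local max_attempts="$1"
  shift
  local attempt
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    if "$@"; then
      return 0
    fi
    if (( attempt == max_attempts )); then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    sleep $(( attempt * ${RETRY_BASE_DELAY:-5} ))
  done
  # Reached only when max_attempts is less than 1.
  return 1
}

# Possible usage (not executed here). The pipeline is wrapped in a function
# so that, with pipefail set, a failed wget makes the whole step fail:
#   set -o pipefail
#   download_thrift() {
#     wget -nv -O- "https://archive.apache.org/dist/thrift/$THRIFT_VERSION/thrift-$THRIFT_VERSION.tar.gz" | tar zxf -
#   }
#   retry 3 download_thrift
```

Wrapping the download in a function matters because `retry` executes its arguments as a single command, so a bare pipeline cannot be passed to it directly.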

parquet-arrow/pom.xml

Lines changed: 2 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -21,7 +21,7 @@
2121
<groupId>org.apache.parquet</groupId>
2222
<artifactId>parquet</artifactId>
2323
<relativePath>../pom.xml</relativePath>
24-
<version>1.16.0-SNAPSHOT</version>
24+
<version>1.18.0-SNAPSHOT</version>
2525
</parent>
2626

2727
<modelVersion>4.0.0</modelVersion>
@@ -33,7 +33,7 @@
3333
<url>https://parquet.apache.org</url>
3434

3535
<properties>
36-
<arrow.version>17.0.0</arrow.version>
36+
<arrow.version>19.0.0</arrow.version>
3737
</properties>
3838

3939
<dependencies>
