Commit e32e330

Merge remote-tracking branch 'github/master' into format_table
* github/master: (41 commits)
  [Python] Support data writer for PyPaimon (apache#5997)
  [Python] Support scan and plan for PyPaimon (apache#5996)
  [flink-cdc] Provide option to disable use of source primary keys if primary keys in action command are not specified for CDC ingestion. (apache#5793)
  Revert "[core] Add compaction.force-wait to support force waiting compaction finish when preparing commit (apache#5994)" (apache#5995)
  [core] Add total compaction count metric (apache#5963)
  [hotfix] Rename to SchemaManager.applyRenameColumnsToOptions
  [core] fix column rename when columns referenced by table options. (apache#5964)
  [core] Log a warning for invalid partition values instead of throwing an exception when enable partition mark done. (apache#5978)
  [core] Add required Field IDs to support ID-based column pruning (apache#5981)
  [core] Row-tracking row should keep their row_id and sequence_number in compaction (apache#5991)
  [core] Add compaction.force-wait to support force waiting compaction finish when preparing commit (apache#5994)
  [format] Introduce 'write.batch-memory' to control memory in arrow (apache#5988)
  [flink] Change filesystem.job-level-settings.enabled default value to true (apache#5971)
  [clone] support including some tables when clone all tables in a catalog or database. (apache#5993)
  [iceberg] Support TINYINT and SMALLINT in Iceberg Compatibility (apache#5984)
  [Python] Support snapshot and manifest for PyPaimon (apache#5987)
  [python] Change Schema to TableSchema in Class GetTableResponse. (apache#5990)
  [core] Introduce 'compaction.total-size-threshold' to do full compaction (apache#5973)
  [Python] Support filesystem catalog for PyPaimon (apache#5986)
  [core] Add lance table type for rest catalog (apache#5977)
  ...
2 parents 27b1a52 + 40bc087 commit e32e330

273 files changed

Lines changed: 13397 additions & 1541 deletions

.github/workflows/e2e-tests-flink-1.x.yml

Lines changed: 2 additions & 1 deletion
@@ -24,6 +24,7 @@ on:
  paths-ignore:
  - 'docs/**'
  - '**/*.md'
+ - 'paimon-python/**'

  env:
  JDK_VERSION: 8
@@ -52,7 +53,7 @@ jobs:
  distribution: 'temurin'

  - name: Build Flink
- run: mvn -T 2C -B clean install -DskipTests -Pflink1,spark3 -pl paimon-e2e-tests -am -Pflink-${{ matrix.flink_version }}
+ run: mvn -T 2C -B clean install -DskipTests -Pflink1,spark3 -pl paimon-e2e-tests -am -Pflink-${{ matrix.flink_version }}

  - name: Test Flink
  run: |

.github/workflows/e2e-tests-flink-2.x-jdk11.yml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ on:
  paths-ignore:
  - 'docs/**'
  - '**/*.md'
+ - 'paimon-python/**'

  env:
  JDK_VERSION: 11

.github/workflows/paimon-python-checks.yml

Lines changed: 4 additions & 3 deletions
@@ -21,8 +21,9 @@ name: Check Code Style and Test
  on:
  push:
  pull_request:
- paths-ignore:
- - '**/*.md'
+ paths:
+ - 'paimon-python/**'
+ - '!**/*.md'

  env:
  PYTHON_VERSION: "3.10"
@@ -44,7 +45,7 @@ jobs:
  python-version: ${{ env.PYTHON_VERSION }}
  - name: Install dependencies
  run: |
- python -m pip install -q flake8==4.0.1 pytest~=7.0 requests 2>&1 >/dev/null
+ python -m pip install -q readerwriterlock==1.0.9 fsspec==2024.3.1 cachetools==5.3.3 ossfs==2023.12.0 ray==2.48.0 pyarrow==15.0.2 numpy==1.24.3 pandas==2.0.3 flake8==4.0.1 pytest~=7.0 requests 2>&1 >/dev/null
  - name: Run lint-python.sh
  run: |
  chmod +x paimon-python/dev/lint-python.sh

.github/workflows/utitcase-flink-1.x.yml

Lines changed: 1 addition & 0 deletions
@@ -23,6 +23,7 @@ on:
  paths-ignore:
  - 'docs/**'
  - '**/*.md'
+ - 'paimon-python/**'

  env:
  JDK_VERSION: 8

.github/workflows/utitcase-flink-2.x-jdk11.yml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ on:
  paths-ignore:
  - 'docs/**'
  - '**/*.md'
+ - 'paimon-python/**'

  env:
  JDK_VERSION: 11

.github/workflows/utitcase-jdk11.yml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ on:
  paths-ignore:
  - 'docs/**'
  - '**/*.md'
+ - 'paimon-python/**'

  env:
  JDK_VERSION: 11

.github/workflows/utitcase-spark-3.x.yml

Lines changed: 2 additions & 1 deletion
@@ -24,6 +24,7 @@ on:
  paths-ignore:
  - 'docs/**'
  - '**/*.md'
+ - 'paimon-python/**'

  env:
  JDK_VERSION: 8
@@ -58,6 +59,6 @@ jobs:
  test_modules+="org.apache.paimon:paimon-spark-${suffix},"
  done
  test_modules="${test_modules%,}"
- mvn -T 2C -B test -pl "${test_modules}" -Duser.timezone=$jvm_timezone
+ mvn -T 2C -B verify -pl "${test_modules}" -Duser.timezone=$jvm_timezone
  env:
  MAVEN_OPTS: -Xmx4096m

.github/workflows/utitcase-spark-4.x.yml

Lines changed: 2 additions & 1 deletion
@@ -24,6 +24,7 @@ on:
  paths-ignore:
  - 'docs/**'
  - '**/*.md'
+ - 'paimon-python/**'

  env:
  JDK_VERSION: 17
@@ -58,6 +59,6 @@ jobs:
  test_modules+="org.apache.paimon:paimon-spark-${suffix},"
  done
  test_modules="${test_modules%,}"
- mvn -T 2C -B test -pl "${test_modules}" -Duser.timezone=$jvm_timezone -Pspark4,flink1
+ mvn -T 2C -B verify -pl "${test_modules}" -Duser.timezone=$jvm_timezone -Pspark4,flink1
  env:
  MAVEN_OPTS: -Xmx4096m

.github/workflows/utitcase.yml

Lines changed: 1 addition & 0 deletions
@@ -24,6 +24,7 @@ on:
  paths-ignore:
  - 'docs/**'
  - '**/*.md'
+ - 'paimon-python/**'

  env:
  JDK_VERSION: 8

docs/content/concepts/spec/fileindex.md

Lines changed: 93 additions & 0 deletions
@@ -217,8 +217,101 @@ Integers are all BIG_ENDIAN.
  Bitmap only supports the following data types: TinyIntType, SmallIntType, IntType, BigIntType, DateType, TimeType,
  LocalZonedTimestampType, TimestampType, CharType, VarCharType, StringType, BooleanType.

+ ## Index: Range Bitmap
+
+ Advantages:
+ 1. Smaller than the bitmap index.
+ 2. Suitable for point queries and range queries on high-cardinality columns.
+ 3. Can be used in conjunction with the bitmap index.
+
+ Shortcoming:
+ 1. Point query evaluation may be slower than with the bitmap index.
+
+ Options:
+ * `file-index.range-bitmap.columns`: specifies the columns that need a range-bitmap index.
+ * `file-index.range-bitmap.<column_name>.chunk-size`: configures the chunk size; the default value is 16kb.
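To illustrate how the two option keys listed above fit together, the following Python sketch builds a table-options dictionary for a set of indexed columns. Only the two option keys and the 16kb default come from the documentation in this diff. The helper `range_bitmap_options`, the column names, and the `32kb` value are hypothetical, and joining multiple columns with a comma is an assumption.

```python
from typing import Dict, Mapping, Optional, Sequence


def range_bitmap_options(
    columns: Sequence[str],
    chunk_sizes: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build table options for the range-bitmap file index (hypothetical helper)."""
    # Assumption: multiple column names are joined with commas.
    options = {"file-index.range-bitmap.columns": ",".join(columns)}
    for column, size in (chunk_sizes or {}).items():
        if column not in columns:
            raise ValueError(f"chunk size given for unindexed column: {column}")
        # Per-column chunk size; columns without an entry keep the default (16kb).
        options[f"file-index.range-bitmap.{column}.chunk-size"] = size
    return options


if __name__ == "__main__":
    # Hypothetical columns "price" and "ts"; only "price" overrides the chunk size.
    for key, value in range_bitmap_options(["price", "ts"], {"price": "32kb"}).items():
        print(f"'{key}' = '{value}'")
```

The resulting key/value pairs would then be supplied as table options through whichever engine or client creates the table.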
+
+ <pre>
+ Range Bitmap file index format (V1)
+ +-------------------------------------------------+-----------------
+ | header length (4 bytes int) |
+ +-------------------------------------------------+
+ | version (1 byte) |
+ +-------------------------------------------------+
+ | row number (4 bytes int) |
+ +-------------------------------------------------+
+ | cardinality (4 bytes int) | HEAD
+ +-------------------------------------------------+
+ | min value |
+ +-------------------------------------------------+
+ | max value |
+ +-------------------------------------------------+
+ | dictionary length (4 bytes int) |
+ +-------------------------------------------------+-----------------
+ | dictionary serialize in bytes |
+ +-------------------------------------------------+ BODY
+ | bit-slice index bitmap serialize in bytes |
+ +-------------------------------------------------+-----------------
+ </pre>
+
+ <pre>
+ Dictionary format (V1)
+ +-------------------------------------------------+-----------------
+ | header length (4 bytes int) |
+ +-------------------------------------------------+
+ | version (1 byte) |
+ +-------------------------------------------------+
+ | the chunks size (4 bytes int) | HEAD
+ +-------------------------------------------------+
+ | the offsets length (4 bytes int) |
+ +-------------------------------------------------+
+ | the chunks length (4 bytes int) |
+ +-------------------------------------------------+-----------------
+ | offsets serialize in bytes |
+ +-------------------------------------------------+
+ | chunks serialize in bytes | BODY
+ +-------------------------------------------------+
+ | keys serialize in bytes |
+ +-------------------------------------------------+-----------------
+ </pre>
+
+ <pre>
+ Bit-slice index bitmap format (V1)
+ +-------------------------------------------------+-----------------
+ | header length (4 bytes int) |
+ +-------------------------------------------------+
+ | version (1 byte) |
+ +-------------------------------------------------+
+ | slices size (4 bytes int) | HEAD
+ +-------------------------------------------------+
+ | existence bitmap length (4 bytes int) |
+ +-------------------------------------------------+
+ | indexes length (4 bytes int) |
+ +-------------------------------------------------+
+ | indexes serialize in bytes |
+ +-------------------------------------------------+-----------------
+ | existence bitmap serialize in bytes |
+ +-------------------------------------------------+
+ | the bit 0 bitmap serialize in bytes |
+ +-------------------------------------------------+
+ | the bit 1 bitmap serialize in bytes | BODY
+ +-------------------------------------------------+
+ | the bit 2 bitmap serialize in bytes |
+ +-------------------------------------------------+
+ | ... |
+ +-------------------------------------------------+-----------------
+ </pre>
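The layout above stores an existence bitmap plus one bitmap per bit position. As a minimal sketch of how such a bit-sliced structure can answer range predicates, here is the textbook bit-sliced comparison in Python. Plain Python integers stand in for the serialized bitmaps (bit `r` represents row `r`), and all function names are illustrative. Nothing here is taken from Paimon's implementation, and it does not cover how column values are mapped to integers.

```python
from typing import List, Optional, Sequence, Tuple


def build_bsi(values: Sequence[Optional[int]]) -> Tuple[int, List[int]]:
    """Return (existence bitmap, [bit 0 bitmap, bit 1 bitmap, ...])."""
    present = [v for v in values if v is not None]
    if any(v < 0 for v in present):
        raise ValueError("this sketch only handles non-negative integers")
    width = max((v.bit_length() for v in present), default=0) or 1
    existence = 0
    slices = [0] * width
    for row, value in enumerate(values):
        if value is None:
            continue  # NULL rows are absent from the existence bitmap
        existence |= 1 << row
        for bit in range(width):
            if (value >> bit) & 1:
                slices[bit] |= 1 << row
    return existence, slices


def less_or_equal(existence: int, slices: List[int], bound: int) -> int:
    """Bitmap of rows whose value is <= bound."""
    if bound < 0:
        return 0
    if bound >= 1 << len(slices):
        return existence  # bound exceeds every representable value
    less, equal = 0, existence
    for bit in reversed(range(len(slices))):  # most significant slice first
        if (bound >> bit) & 1:
            less |= equal & ~slices[bit]  # rows with 0 here are now strictly smaller
            equal &= slices[bit]
        else:
            equal &= ~slices[bit]  # rows with 1 here are now strictly larger
    return less | equal


def between(existence: int, slices: List[int], low: int, high: int) -> int:
    """Bitmap of rows with low <= value <= high."""
    upper = less_or_equal(existence, slices, high)
    return upper & ~less_or_equal(existence, slices, low - 1)


def rows(bitmap: int) -> List[int]:
    """List the row numbers whose bit is set."""
    return [r for r in range(bitmap.bit_length()) if (bitmap >> r) & 1]


if __name__ == "__main__":
    existence, slices = build_bsi([5, 1, None, 8, 3, 5])
    print(rows(less_or_equal(existence, slices, 4)))  # [1, 4]
    print(rows(between(existence, slices, 3, 5)))     # [0, 4, 5]
```

Each predicate touches one bitmap per bit of the value width, regardless of how many distinct values the column holds, which is why this shape suits high-cardinality columns.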
+
+ RangeBitmap only supports the following data types: TinyIntType, SmallIntType, IntType, BigIntType, DateType, TimeType, LocalZonedTimestampType, TimestampType, CharType, VarCharType, StringType, BooleanType, DoubleType, FloatType.
+
  ## Index: Bit-Slice Index Bitmap

+ {{< hint warning >}}
+
+ Deprecated. Use the range-bitmap index instead.
+
+ {{< /hint >}}
+
  The BSI file index is a numeric range index used to accelerate range queries; it can be used with the bitmap index.

  Define `'file-index.bsi.columns'`.
