Closed

Changes from all commits (74 commits)
a31879e
Build: Bump mkdocstrings-python from 1.16.5 to 1.16.6 (#1811)
dependabot[bot] Mar 18, 2025
480f6d6
Build: Bump griffe from 1.6.0 to 1.6.1 (#1814)
dependabot[bot] Mar 18, 2025
a294257
Build: Bump pre-commit from 4.1.0 to 4.2.0 (#1813)
dependabot[bot] Mar 19, 2025
a84689d
Build: Bump mkdocs-literate-nav from 0.6.1 to 0.6.2 (#1812)
dependabot[bot] Mar 19, 2025
6658187
Upsert: Don't produce empty snapshots (#1810)
Fokko Mar 19, 2025
c06e320
Add JSON single value encoding (#1805)
Fokko Mar 19, 2025
063abd5
Build: Bump griffe from 1.6.1 to 1.6.2 (#1823)
dependabot[bot] Mar 21, 2025
83cc178
Build: Bump mkdocstrings-python from 1.16.6 to 1.16.7 (#1824)
dependabot[bot] Mar 21, 2025
90aee03
Build: Bump coverage from 7.7.0 to 7.7.1 (#1827)
dependabot[bot] Mar 23, 2025
baee2f9
Build: Bump getdaft from 0.4.7 to 0.4.8 (#1826)
dependabot[bot] Mar 23, 2025
9d19ef7
V3: Introduce `timestamp_ns` and `timestamptz_ns` (#1632)
sungwy Mar 23, 2025
71cb247
Support Filters on Top-Level Struct Fields (#1832)
srilman Mar 23, 2025
322ebdd
Nit: Introduce `S3_SIGNER` constant property (#1837)
smaheshwar-pltr Mar 24, 2025
1c34bc0
Build: Bump mkdocstrings-python from 1.16.7 to 1.16.8 (#1844)
dependabot[bot] Mar 25, 2025
d9a1b5d
Build: Bump pypa/cibuildwheel from 2.23.1 to 2.23.2 (#1841)
dependabot[bot] Mar 25, 2025
82b5fce
Build: Bump polars from 1.25.2 to 1.26.0 (#1842)
dependabot[bot] Mar 25, 2025
7a6a7c8
Build: Bump pyparsing from 3.2.1 to 3.2.2 (#1843)
dependabot[bot] Mar 25, 2025
172d9a7
REST: Delegate parsing to Pydantic (#1847)
Fokko Mar 25, 2025
87f9e7a
nit: Clean up import in `conftest.py` (#1848)
Fokko Mar 25, 2025
778db75
Infra: Run dependabot check weekly (#1850)
kevinjqliu Mar 26, 2025
9216233
Build: Bump griffe from 1.6.2 to 1.6.3 (#1855)
dependabot[bot] Mar 26, 2025
278f764
Build: Bump pyparsing from 3.2.2 to 3.2.3 (#1854)
dependabot[bot] Mar 26, 2025
62191ee
Deletion Vectors read support (#1516)
Fokko Mar 26, 2025
7a56ddb
Arrow: Infer the types when reading (#1669)
Fokko Mar 26, 2025
ff76144
Clean up import in `test_schema.py` (#1858)
sunxiaojian Mar 27, 2025
96e6d54
Fix naming (#1857)
Fokko Mar 28, 2025
4b15fb6
Pass data type as string representation to `NestedField` (#1860)
sunxiaojian Mar 30, 2025
bae62df
Use a balanced tree instead of unbalanced one (#1830)
koenvo Mar 31, 2025
1a5e32a
Fix decimal physicial type mapping (#1839)
redpheonixx Mar 31, 2025
d69a191
fix `upsert` with null values (#1861)
kevinjqliu Mar 31, 2025
85d4ff1
Build: Bump mkdocs-material from 9.6.9 to 9.6.10 (#1875)
dependabot[bot] Apr 1, 2025
b29caba
Build: Bump sqlalchemy from 2.0.39 to 2.0.40 (#1874)
dependabot[bot] Apr 1, 2025
df2a6ba
Build: Bump griffe from 1.6.3 to 1.7.1 (#1873)
dependabot[bot] Apr 1, 2025
f8d5574
Build: Bump datafusion from 45.2.0 to 46.0.0 (#1872)
dependabot[bot] Apr 1, 2025
0171eb0
Build: Bump coverage from 7.7.1 to 7.8.0 (#1870)
dependabot[bot] Apr 1, 2025
77c8951
Build: Bump mkdocstrings from 0.29.0 to 0.29.1 (#1871)
dependabot[bot] Apr 1, 2025
3d08776
Set field-id when needed (#1867)
Fokko Apr 1, 2025
4d4714a
Build: Bump rich from 13.9.4 to 14.0.0 (#1868)
dependabot[bot] Apr 1, 2025
a62799e
Fix creation of Bucket Transforms with `pydantic>=2.11.0` (#1881)
b-rick Apr 4, 2025
5c4e59f
Build: Bump pydantic from 2.10.6 to 2.11.1 (#1869)
dependabot[bot] Apr 4, 2025
da403d2
Add support for `Transaction.update_statistics()` (#1831)
srilman Apr 4, 2025
8adf246
Support quoted column identifiers for scan `row_filter` (#1863)
norton120 Apr 4, 2025
54571fd
Build: Bump typing-extensions from 4.12.2 to 4.13.1 (#1897)
dependabot[bot] Apr 8, 2025
855b472
Build: Bump tenacity from 9.0.0 to 9.1.2 (#1896)
dependabot[bot] Apr 8, 2025
c284341
Build: Bump mypy-boto3-glue from 1.37.13 to 1.37.29 (#1892)
dependabot[bot] Apr 8, 2025
a7d5b64
Build: Bump getdaft from 0.4.8 to 0.4.9 (#1890)
dependabot[bot] Apr 8, 2025
aeb4493
CI: Use Java 1.9.0-SNAPSHOT for testing (#1899)
Fokko Apr 8, 2025
7bf9cab
Build: Bump moto from 5.1.1 to 5.1.3 (#1889)
dependabot[bot] Apr 8, 2025
fbc7482
Build: Bump mkdocs-material from 9.6.10 to 9.6.11 (#1891)
dependabot[bot] Apr 8, 2025
2501c33
Build: Bump mkdocs-section-index from 0.3.9 to 0.3.10 (#1895)
dependabot[bot] Apr 8, 2025
2bfc926
Build: Bump griffe from 1.7.1 to 1.7.2 (#1894)
dependabot[bot] Apr 8, 2025
76d02ad
Build: Bump mkdocstrings-python from 1.16.8 to 1.16.10 (#1893)
dependabot[bot] Apr 8, 2025
1588701
Temporary fix for filtering on empty batches (#1901)
koenvo Apr 9, 2025
bb45d1e
Bump Snapshot versions (#1907)
Fokko Apr 13, 2025
a1287d4
Build: Bump duckdb from 1.2.1 to 1.2.2 (#1916)
dependabot[bot] Apr 15, 2025
0e3e80d
Introduce AuthManager (#1908)
sungwy Apr 15, 2025
018345a
Build: Bump typing-extensions from 4.13.1 to 4.13.2 (#1911)
dependabot[bot] Apr 15, 2025
8017a77
Build: Bump polars from 1.26.0 to 1.27.1 (#1912)
dependabot[bot] Apr 15, 2025
d7834ca
Build: Bump pydantic from 2.11.2 to 2.11.3 (#1913)
dependabot[bot] Apr 15, 2025
053ae04
Build: Bump getdaft from 0.4.9 to 0.4.10 (#1914)
dependabot[bot] Apr 15, 2025
4257b67
Build: Bump mypy-boto3-glue from 1.37.29 to 1.37.31 (#1915)
dependabot[bot] Apr 15, 2025
eb8756a
Ignore duckdb test (#1918)
Fokko Apr 15, 2025
5f10bbc
Fix `add_files` with non-identity transforms (#1925)
Fokko Apr 16, 2025
881b2d5
Fix thrift client connection for Kerberos Hive Client (#1747)
kevinjqliu Apr 16, 2025
3c52167
Fix the snapshot summary of a partial overwrite (#1879)
Fokko Apr 16, 2025
b440682
Fix for metadata entries table for MOR tables containing Delete Files…
guptaakashdeep Apr 16, 2025
825fd5d
Revert ignore duckdb test (#1927)
kevinjqliu Apr 17, 2025
00c548a
Use `version-hint.text` for StaticTable (#1887)
arnaudbriche Apr 17, 2025
831170d
Fix setting `force_virtual_addressing` (#1923)
helmiazizm Apr 17, 2025
02eecb6
Add cast from string to float and double (#1933)
guptaakashdeep Apr 18, 2025
068ee5d
Refactor `Metadata` in `Transaction` (#1903)
Fokko Apr 18, 2025
0d56a3b
Changes to support string transform in add_field. (#1936)
guptaakashdeep Apr 19, 2025
f5978bb
Adds support for creating a GlueCatalog with own client (#1920)
rchowell Apr 21, 2025
7dfaad9
tmp
kevinjqliu Apr 21, 2025
4 changes: 2 additions & 2 deletions .github/dependabot.yml
@@ -22,9 +22,9 @@ updates:
   - package-ecosystem: "pip"
     directory: "/"
     schedule:
-      interval: "daily"
+      interval: "weekly"
     open-pull-requests-limit: 50
   - package-ecosystem: "github-actions"
     directory: "/"
     schedule:
-      interval: "daily"
+      interval: "weekly"
2 changes: 1 addition & 1 deletion .github/workflows/pypi-build-artifacts.yml
@@ -62,7 +62,7 @@ jobs:
       if: startsWith(matrix.os, 'ubuntu')

     - name: Build wheels
-      uses: pypa/cibuildwheel@v2.23.1
+      uses: pypa/cibuildwheel@v2.23.2
       with:
         output-dir: wheelhouse
         config-file: "pyproject.toml"
2 changes: 1 addition & 1 deletion .github/workflows/svn-build-artifacts.yml
@@ -57,7 +57,7 @@ jobs:
       if: startsWith(matrix.os, 'ubuntu')

     - name: Build wheels
-      uses: pypa/cibuildwheel@v2.23.1
+      uses: pypa/cibuildwheel@v2.23.2
       with:
         output-dir: wheelhouse
         config-file: "pyproject.toml"
2 changes: 1 addition & 1 deletion Makefile
@@ -19,7 +19,7 @@
 help: ## Display this help
 	@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make \033[36m\033[0m\n"} /^[a-zA-Z_-]+:.*?##/ { printf " \033[36m%-20s\033[0m %s\n", $$1, $$2 } /^##@/ { printf "\n\033[1m%s\033[0m\n", substr($$0, 5) } ' $(MAKEFILE_LIST)

-POETRY_VERSION = 2.0.1
+POETRY_VERSION = 2.1.1
 install-poetry: ## Ensure Poetry is installed and the correct version is being used.
 	@if ! command -v poetry &> /dev/null; then \
 		echo "Poetry could not be found. Installing..."; \
6 changes: 3 additions & 3 deletions dev/Dockerfile
@@ -39,20 +39,20 @@ WORKDIR ${SPARK_HOME}
 # Remember to also update `tests/conftest`'s spark setting
 ENV SPARK_VERSION=3.5.4
 ENV ICEBERG_SPARK_RUNTIME_VERSION=3.5_2.12
-ENV ICEBERG_VERSION=1.8.0
+ENV ICEBERG_VERSION=1.9.0-SNAPSHOT
 ENV PYICEBERG_VERSION=0.9.0

 RUN curl --retry 5 -s -C - https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz -o spark-${SPARK_VERSION}-bin-hadoop3.tgz \
     && tar xzf spark-${SPARK_VERSION}-bin-hadoop3.tgz --directory /opt/spark --strip-components 1 \
     && rm -rf spark-${SPARK_VERSION}-bin-hadoop3.tgz

 # Download iceberg spark runtime
-RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}/${ICEBERG_VERSION}/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar \
+RUN curl --retry 5 -s https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-spark-runtime-3.5_2.12/1.9.0-SNAPSHOT/iceberg-spark-runtime-3.5_2.12-1.9.0-20250409.001855-44.jar \
     -Lo /opt/spark/jars/iceberg-spark-runtime-${ICEBERG_SPARK_RUNTIME_VERSION}-${ICEBERG_VERSION}.jar


 # Download AWS bundle
-RUN curl --retry 5 -s https://repo1.maven.org/maven2/org/apache/iceberg/iceberg-aws-bundle/${ICEBERG_VERSION}/iceberg-aws-bundle-${ICEBERG_VERSION}.jar \
+RUN curl --retry 5 -s https://repository.apache.org/content/groups/snapshots/org/apache/iceberg/iceberg-aws-bundle/1.9.0-SNAPSHOT/iceberg-aws-bundle-1.9.0-20250409.002731-88.jar \
     -Lo /opt/spark/jars/iceberg-aws-bundle-${ICEBERG_VERSION}.jar

 COPY spark-defaults.conf /opt/spark/conf
164 changes: 88 additions & 76 deletions dev/provision.py
@@ -14,6 +14,7 @@
 # KIND, either express or implied. See the License for the
 # specific language governing permissions and limitations
 # under the License.
+import math

 from pyspark.sql import SparkSession
 from pyspark.sql.functions import current_date, date_add, expr
@@ -113,6 +114,99 @@
     """
 )

-spark.sql(
-    f"""
-    CREATE OR REPLACE TABLE {catalog_name}.default.test_positional_mor_deletes (
-        dt date,
-        number integer,
-        letter string
-    )
-    USING iceberg
-    TBLPROPERTIES (
-        'write.delete.mode'='merge-on-read',
-        'write.update.mode'='merge-on-read',
-        'write.merge.mode'='merge-on-read',
-        'format-version'='2'
-    );
-    """
-)
+# Merge on read has been implemented in version ≥2:
+#   v2: Using positional deletes
+#   v3: Using deletion vectors

-spark.sql(
-    f"""
-    INSERT INTO {catalog_name}.default.test_positional_mor_deletes
-    VALUES
-        (CAST('2023-03-01' AS date), 1, 'a'),
-        (CAST('2023-03-02' AS date), 2, 'b'),
-        (CAST('2023-03-03' AS date), 3, 'c'),
-        (CAST('2023-03-04' AS date), 4, 'd'),
-        (CAST('2023-03-05' AS date), 5, 'e'),
-        (CAST('2023-03-06' AS date), 6, 'f'),
-        (CAST('2023-03-07' AS date), 7, 'g'),
-        (CAST('2023-03-08' AS date), 8, 'h'),
-        (CAST('2023-03-09' AS date), 9, 'i'),
-        (CAST('2023-03-10' AS date), 10, 'j'),
-        (CAST('2023-03-11' AS date), 11, 'k'),
-        (CAST('2023-03-12' AS date), 12, 'l');
-    """
-)
+for format_version in [2, 3]:
+    identifier = f'{catalog_name}.default.test_positional_mor_deletes_v{format_version}'
+    spark.sql(
+        f"""
+        CREATE OR REPLACE TABLE {identifier} (
+            dt date,
+            number integer,
+            letter string
+        )
+        USING iceberg
+        TBLPROPERTIES (
+            'write.delete.mode'='merge-on-read',
+            'write.update.mode'='merge-on-read',
+            'write.merge.mode'='merge-on-read',
+            'format-version'='{format_version}'
+        );
+        """
+    )
+
+    spark.sql(
+        f"""
+        INSERT INTO {identifier}
+        VALUES
+            (CAST('2023-03-01' AS date), 1, 'a'),
+            (CAST('2023-03-02' AS date), 2, 'b'),
+            (CAST('2023-03-03' AS date), 3, 'c'),
+            (CAST('2023-03-04' AS date), 4, 'd'),
+            (CAST('2023-03-05' AS date), 5, 'e'),
+            (CAST('2023-03-06' AS date), 6, 'f'),
+            (CAST('2023-03-07' AS date), 7, 'g'),
+            (CAST('2023-03-08' AS date), 8, 'h'),
+            (CAST('2023-03-09' AS date), 9, 'i'),
+            (CAST('2023-03-10' AS date), 10, 'j'),
+            (CAST('2023-03-11' AS date), 11, 'k'),
+            (CAST('2023-03-12' AS date), 12, 'l');
+        """
+    )

-spark.sql(f"ALTER TABLE {catalog_name}.default.test_positional_mor_deletes CREATE TAG tag_12")
+    spark.sql(f"ALTER TABLE {identifier} CREATE TAG tag_12")

-spark.sql(f"ALTER TABLE {catalog_name}.default.test_positional_mor_deletes CREATE BRANCH without_5")
+    spark.sql(f"ALTER TABLE {identifier} CREATE BRANCH without_5")

-spark.sql(f"DELETE FROM {catalog_name}.default.test_positional_mor_deletes.branch_without_5 WHERE number = 5")
+    spark.sql(f"DELETE FROM {identifier}.branch_without_5 WHERE number = 5")

-spark.sql(f"DELETE FROM {catalog_name}.default.test_positional_mor_deletes WHERE number = 9")
+    spark.sql(f"DELETE FROM {identifier} WHERE number = 9")

-spark.sql(
-    f"""
-    CREATE OR REPLACE TABLE {catalog_name}.default.test_positional_mor_double_deletes (
-        dt date,
-        number integer,
-        letter string
-    )
-    USING iceberg
-    TBLPROPERTIES (
-        'write.delete.mode'='merge-on-read',
-        'write.update.mode'='merge-on-read',
-        'write.merge.mode'='merge-on-read',
-        'format-version'='2'
-    );
-    """
-)
+    identifier = f'{catalog_name}.default.test_positional_mor_double_deletes_v{format_version}'

-spark.sql(
-    f"""
-    INSERT INTO {catalog_name}.default.test_positional_mor_double_deletes
-    VALUES
-        (CAST('2023-03-01' AS date), 1, 'a'),
-        (CAST('2023-03-02' AS date), 2, 'b'),
-        (CAST('2023-03-03' AS date), 3, 'c'),
-        (CAST('2023-03-04' AS date), 4, 'd'),
-        (CAST('2023-03-05' AS date), 5, 'e'),
-        (CAST('2023-03-06' AS date), 6, 'f'),
-        (CAST('2023-03-07' AS date), 7, 'g'),
-        (CAST('2023-03-08' AS date), 8, 'h'),
-        (CAST('2023-03-09' AS date), 9, 'i'),
-        (CAST('2023-03-10' AS date), 10, 'j'),
-        (CAST('2023-03-11' AS date), 11, 'k'),
-        (CAST('2023-03-12' AS date), 12, 'l');
-    """
-)
+    spark.sql(
+        f"""
+        CREATE OR REPLACE TABLE {identifier} (
+            dt date,
+            number integer,
+            letter string
+        )
+        USING iceberg
+        TBLPROPERTIES (
+            'write.delete.mode'='merge-on-read',
+            'write.update.mode'='merge-on-read',
+            'write.merge.mode'='merge-on-read',
+            'format-version'='2'
+        );
+        """
+    )

-spark.sql(f"DELETE FROM {catalog_name}.default.test_positional_mor_double_deletes WHERE number = 9")
+    spark.sql(
+        f"""
+        INSERT INTO {identifier}
+        VALUES
+            (CAST('2023-03-01' AS date), 1, 'a'),
+            (CAST('2023-03-02' AS date), 2, 'b'),
+            (CAST('2023-03-03' AS date), 3, 'c'),
+            (CAST('2023-03-04' AS date), 4, 'd'),
+            (CAST('2023-03-05' AS date), 5, 'e'),
+            (CAST('2023-03-06' AS date), 6, 'f'),
+            (CAST('2023-03-07' AS date), 7, 'g'),
+            (CAST('2023-03-08' AS date), 8, 'h'),
+            (CAST('2023-03-09' AS date), 9, 'i'),
+            (CAST('2023-03-10' AS date), 10, 'j'),
+            (CAST('2023-03-11' AS date), 11, 'k'),
+            (CAST('2023-03-12' AS date), 12, 'l');
+        """
+    )

-spark.sql(f"DELETE FROM {catalog_name}.default.test_positional_mor_double_deletes WHERE letter == 'f'")
+    # Perform two deletes, should produce:
+    #   v2: two positional delete files in v2
+    #   v3: one deletion vector since they are merged
+    spark.sql(f"DELETE FROM {identifier} WHERE number = 9")
+    spark.sql(f"DELETE FROM {identifier} WHERE letter == 'f'")

all_types_dataframe = (
spark.range(0, 5, 1, 5)
@@ -328,6 +339,7 @@
     CREATE TABLE {catalog_name}.default.test_table_empty_list_and_map (
         col_list array<int>,
         col_map map<int, int>,
+        col_struct struct<test:int>,
         col_list_with_struct array<struct<test:int>>
     )
     USING iceberg
@@ -340,8 +352,8 @@
 spark.sql(
     f"""
     INSERT INTO {catalog_name}.default.test_table_empty_list_and_map
-    VALUES (null, null, null),
-        (array(), map(), array(struct(1)))
+    VALUES (null, null, null, null),
+        (array(), map(), struct(1), array(struct(1)))
     """
 )

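The provisioning script's comment notes the behavioral difference this PR exercises: on format v2 each `DELETE` writes its own positional delete file, while on v3 a second delete merges into the existing deletion vector, leaving one bitmap per data file. A minimal pure-Python sketch of that bookkeeping (illustrative only — this is not the PyIceberg or Spark implementation, and the row positions are derived from the test data above):

```python
def apply_deletes(rows, deleted_positions):
    """Return the rows that survive after masking out deleted positions."""
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]

# The provisioned table holds rows numbered 1..12 with letters 'a'..'l'.
rows = [(n, chr(ord('a') + n - 1)) for n in range(1, 13)]

# DELETE ... WHERE number = 9  -> zero-based position 8
# DELETE ... WHERE letter = 'f' -> zero-based position 5
delete_file_1 = {8}
delete_file_2 = {5}

# v2: each DELETE produces a separate positional delete file;
# readers must union all of them at scan time.
v2_delete_files = [delete_file_1, delete_file_2]
v2_mask = set().union(*v2_delete_files)

# v3: the second DELETE merges into the existing deletion vector,
# so a single bitmap remains for the data file.
v3_vector = delete_file_1 | delete_file_2

assert v2_mask == v3_vector  # same effective mask, different file layout
surviving = apply_deletes(rows, v3_vector)
print(len(v2_delete_files), len(surviving))  # 2 delete files in v2, 10 rows survive
```

Real deletion vectors are roaring bitmaps keyed by data-file path; plain sets stand in for them here only to show the union semantics.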
11 changes: 11 additions & 0 deletions mkdocs/docs/api.md
@@ -215,6 +215,17 @@ static_table = StaticTable.from_metadata(

 The static-table is considered read-only.

+Alternatively, if your table metadata directory contains a `version-hint.text` file, you can specify
+just the table root path, and the latest metadata file will be picked up automatically.
+
+```python
+from pyiceberg.table import StaticTable
+
+static_table = StaticTable.from_metadata(
+    "s3://warehouse/wh/nyc.db/taxis"
+)
+```
+
 ## Check if a table exists

 To check whether the `bids` table exists:
2 changes: 1 addition & 1 deletion mkdocs/docs/configuration.md
@@ -189,7 +189,7 @@ PyIceberg uses [S3FileSystem](https://arrow.apache.org/docs/python/generated/pya
 | s3.access-key-id | admin | Configure the static access key id used to access the FileIO. |
 | s3.secret-access-key | password | Configure the static secret access key used to access the FileIO. |
 | s3.session-token | AQoDYXdzEJr... | Configure the static session token used to access the FileIO. |
-| s3.force-virtual-addressing | True | Whether to use virtual addressing of buckets. This must be set to True as OSS can only be accessed with virtual hosted style address. |
+| s3.force-virtual-addressing | True | Whether to use virtual addressing of buckets. This is set to `True` by default as OSS can only be accessed with virtual hosted style address. |

 <!-- markdown-link-check-enable-->
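For context on how a property like `s3.force-virtual-addressing` is supplied, a sketch of a `.pyiceberg.yaml` catalog entry follows; the catalog name, endpoint, and credentials are placeholders for illustration, not values taken from this PR:

```yaml
catalog:
  default:
    uri: http://localhost:8181                        # REST catalog endpoint (placeholder)
    s3.endpoint: https://oss-eu-west-1.aliyuncs.com   # placeholder OSS endpoint
    s3.access-key-id: admin                           # placeholder credentials
    s3.secret-access-key: password
    s3.force-virtual-addressing: true                 # required for OSS-style virtual-hosted addressing
```

The same keys can equivalently be passed as properties to `load_catalog` at runtime.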
