Commit fd7e87e

add docs
1 parent 2be460e commit fd7e87e

1 file changed: +63 -0 lines changed

mkdocs/docs/api.md

Lines changed: 63 additions & 0 deletions
@@ -1758,3 +1758,66 @@ shape: (11, 4)
21566 ┆ Incorrect billing amount ┆ 2022-04-17 10:53:20
└───────────┴─────────────┴────────────────────────────┴─────────────────────┘
```

### Apache DataFusion

PyIceberg integrates with [Apache DataFusion](https://datafusion.apache.org/) through the Custom Table Provider interface ([FFI_TableProvider](https://datafusion.apache.org/python/user-guide/io/table_provider.html)) exposed by `iceberg-rust`.

<!-- prettier-ignore-start -->

!!! note "Requirements"
    This requires [`datafusion` to be installed](index.md).

<!-- prettier-ignore-end -->

<!-- markdownlint-disable MD046 -- Allowing indented multi-line formatting in admonition-->

!!! warning "Experimental Feature"
    The DataFusion integration is considered **experimental**.

    The integration has a few caveats:

    - Only works with `datafusion >= 45`
    - Depends directly on `iceberg-rust` instead of PyIceberg's implementation
    - Has limited features compared to the full PyIceberg API

    The integration will improve as both DataFusion and `iceberg-rust` mature.

<!-- markdownlint-enable MD046 -->

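The `datafusion >= 45` caveat can be checked up front, before a table is registered. The following is a minimal sketch that uses only the standard library. The helper names `major_version` and `datafusion_is_supported` are hypothetical and are not part of PyIceberg or DataFusion, and the sketch assumes the package is installed under the distribution name `datafusion` with a version string that starts with an integer major version.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum major version, taken from the caveat list above.
MIN_DATAFUSION_MAJOR = 45


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '45.2.0'."""
    return int(version_string.split(".")[0])


def datafusion_is_supported(minimum_major: int = MIN_DATAFUSION_MAJOR) -> bool:
    """Report whether an installed datafusion meets the minimum major version."""
    try:
        installed = version("datafusion")
    except PackageNotFoundError:
        # datafusion is not installed in this environment.
        return False
    return major_version(installed) >= minimum_major


print("datafusion supported:", datafusion_is_supported())
```

A check like this could run at import time of application code so that an unsupported environment fails with a clear message rather than an error from the table provider interface.
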
PyIceberg tables can be registered directly with DataFusion's `SessionContext` using the table provider interface.

```python
from datafusion import SessionContext
from pyiceberg.catalog import load_catalog
import pyarrow as pa

# Load an in-memory catalog and make sure the namespace exists
catalog = load_catalog("catalog", type="in-memory")
catalog.create_namespace_if_not_exists("default")

# Create some sample data and write it to a new Iceberg table
data = pa.table({"x": [1, 2, 3], "y": [4, 5, 6]})
iceberg_table = catalog.create_table("default.test", schema=data.schema)
iceberg_table.append(data)

# Register the table with DataFusion
ctx = SessionContext()
ctx.register_table_provider("test", iceberg_table)

# Read the registered table through DataFusion and print it
ctx.table("test").show()
```

This will output:

```python
DataFrame()
+---+---+
| x | y |
+---+---+
| 1 | 4 |
| 2 | 5 |
| 3 | 6 |
+---+---+
```