4 changes: 4 additions & 0 deletions docs/hub/_toctree.yml
@@ -283,6 +283,10 @@
title: Performing data transformations
- local: datasets-polars-optimizations
title: Performance optimizations
- local: datasets-pyarrow
title: PyArrow
- local: datasets-pyiceberg
title: PyIceberg
- local: datasets-spark
title: Spark
- local: datasets-webdataset
202 changes: 171 additions & 31 deletions docs/hub/datasets-duckdb.md
@@ -1,11 +1,7 @@
# DuckDB

[DuckDB](https://github.com/duckdb/duckdb) is an in-process SQL [OLAP](https://en.wikipedia.org/wiki/Online_analytical_processing) database management system.
You can use the Hugging Face paths (`hf://`) to access data on the Hub, or an Iceberg Datasets Catalog.

The [DuckDB CLI](https://duckdb.org/docs/api/cli/overview.html) (Command Line Interface) is a single, dependency-free executable.
There are also other APIs available for running DuckDB, including Python, C++, Go, Java, Rust, and more. For additional details, visit their [clients](https://duckdb.org/docs/api/overview.html) page.
@@ -20,55 +16,66 @@ Starting from version `v0.10.3`, the DuckDB CLI includes native support for accessing datasets on the Hugging Face Hub. With it, you can:
- Combine datasets and export them to different formats
- Conduct vector similarity search on embedding datasets
- Implement full-text search on datasets
- Use an Iceberg Datasets Catalog

For a complete list of DuckDB features, visit the DuckDB [documentation](https://duckdb.org/docs/).

## Authentication

To access gated and private datasets, log in to Hugging Face with:

```bash
hf auth login
```

Then, in DuckDB, create the `hf_token` secret with this command:

```sql
CREATE SECRET hf_token (TYPE HUGGINGFACE, PROVIDER credential_chain);
```
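
If you prefer not to rely on the CLI credential cache, a secret can also be created with an explicit token — a minimal sketch, assuming you replace the placeholder with your own token:

```sql
-- Pass a token directly instead of using the credential chain
CREATE SECRET hf_token (TYPE HUGGINGFACE, TOKEN 'hf_xxx_your_token_here');
```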

See more details on authentication in the [DuckDB authentication documentation for Hugging Face](./datasets-duckdb-auth).

## Querying files on Hugging Face

To access Hugging Face datasets, use the following URL format:

```plaintext
hf://datasets/{my-username}/{my-dataset}/{path_to_file}
```

- **my-username**, the user or organization of the dataset, e.g. `stanfordnlp`
- **my-dataset**, the dataset name, e.g. `imdb`
- **path_to_file**, the file path, which supports glob patterns, e.g. `**/*.parquet` to query all Parquet files
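
For instance, here is a sketch using DuckDB's `read_parquet` function over this URL format — the glob pattern is an assumption about the dataset's file layout:

```sql
-- Count rows across all train Parquet files (glob layout assumed)
SELECT count(*) AS n_rows
FROM read_parquet('hf://datasets/stanfordnlp/imdb/**/train-*.parquet');
```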


For example, to query the train split of the [stanfordnlp/imdb](https://huggingface.co/datasets/stanfordnlp/imdb) dataset:

```sql
SELECT * FROM 'hf://datasets/stanfordnlp/imdb/**/train-*.parquet' LIMIT 10;
```

Which returns:

```
┌──────────────────────────────────────────────────────────────────────┬───────┐
│ text │ label │
│ varchar │ int64 │
├──────────────────────────────────────────────────────────────────────┼───────┤
│ I rented I AM CURIOUS-YELLOW from my video store because of all th… │ 0 │
│ "I Am Curious: Yellow" is a risible and pretentious steaming pile.… │ 0 │
│ If only to avoid making this type of film in the future. This film… │ 0 │
│ This film was probably inspired by Godard's Masculin, féminin and … │ 0 │
│ Oh, brother...after hearing about this ridiculous film for umpteen… │ 0 │
│ I would put this at the top of my list of films in the category of… │ 0 │
│ Whoever wrote the screenplay for this movie obviously never consul… │ 0 │
│ When I first saw a glimpse of this movie, I quickly noticed the ac… │ 0 │
│ Who are these "They"- the actors? the filmmakers? Certainly couldn… │ 0 │
│ This is said to be a personal film for Peter Bogdonavitch. He base… │ 0 │
├──────────────────────────────────────────────────────────────────────┴───────┤
│ 10 rows 2 columns │
└──────────────────────────────────────────────────────────────────────────────┘
```
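
The same query can be run from the DuckDB Python client — a minimal sketch, assuming `duckdb` and `pandas` are installed:

```python
import duckdb

# Query the train split of stanfordnlp/imdb directly over hf:// paths
df = duckdb.sql(
    "SELECT * FROM 'hf://datasets/stanfordnlp/imdb/**/train-*.parquet' LIMIT 10"
).df()
print(df.head())
```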

> [!TIP]
> **Querying Storage Buckets**: When using the DuckDB Python client, you can query data stored in [Storage Buckets](./storage-buckets) by registering the Hugging Face filesystem:
>
> ```python
> import duckdb
> from huggingface_hub import HfFileSystem
>
> # Register the Hugging Face filesystem so DuckDB can resolve hf:// paths
> duckdb.register_filesystem(HfFileSystem())
> duckdb.sql("SELECT * FROM 'hf://buckets/username/my-bucket/data.parquet' LIMIT 10")
> ```
> Native `hf://buckets/` support in DuckDB is expected in a future release.

## Query an Iceberg Datasets Catalog

Use `faceberg`, a PyIceberg-based library, to deploy an Iceberg catalog (see the next section) that you can use to query datasets on Hugging Face with a simple syntax.

In particular, you can query datasets as `faceberg.namespace.dataset_name` instead of passing a file pattern, and the catalog automatically adds a `split` column to differentiate between the train/test/validation splits.

For example, here is the syntax to query the [stanfordnlp/imdb](https://huggingface.co/datasets/stanfordnlp/imdb) dataset:

```sql
SELECT * FROM faceberg.stanfordnlp.imdb LIMIT 10;
```

```
┌─────────┬────────────────────────────────────────────────────────────┬───────┐
│ split │ text │ label │
│ varchar │ varchar │ int64 │
├─────────┼────────────────────────────────────────────────────────────┼───────┤
│ train │ I rented I AM CURIOUS-YELLOW from my video store because… │ 0 │
│ train │ "I Am Curious: Yellow" is a risible and pretentious stea… │ 0 │
│ train │ If only to avoid making this type of film in the future.… │ 0 │
│ train │ This film was probably inspired by Godard's Masculin, fé… │ 0 │
│ train │ Oh, brother...after hearing about this ridiculous film f… │ 0 │
│ train │ I would put this at the top of my list of films in the c… │ 0 │
│ train │ Whoever wrote the screenplay for this movie obviously ne… │ 0 │
│ train │ When I first saw a glimpse of this movie, I quickly noti… │ 0 │
│ train │ Who are these "They"- the actors? the filmmakers? Certai… │ 0 │
│ train │ This is said to be a personal film for Peter Bogdonavitc… │ 0 │
├─────────┴────────────────────────────────────────────────────────────┴───────┤
│ 10 rows 3 columns │
└──────────────────────────────────────────────────────────────────────────────┘
```

And you can simply filter by split like this:

```sql
SELECT * FROM faceberg.stanfordnlp.imdb WHERE split = 'test' LIMIT 10;
```

```
┌─────────┬────────────────────────────────────────────────────────────┬───────┐
│ split │ text │ label │
│ varchar │ varchar │ int64 │
├─────────┼────────────────────────────────────────────────────────────┼───────┤
│ test │ I love sci-fi and am willing to put up with a lot. Sci-f… │ 0 │
│ test │ Worth the entertainment value of a rental, especially if… │ 0 │
│ test │ its a totally average film with a few semi-alright actio… │ 0 │
│ test │ STAR RATING: ***** Saturday Night **** Friday Night *** … │ 0 │
│ test │ First off let me say, If you haven't enjoyed a Van Damme… │ 0 │
│ test │ I had high hopes for this one until they changed the nam… │ 0 │
│ test │ Isaac Florentine has made some of the best western Marti… │ 0 │
│ test │ It actually pains me to say it, but this movie was horri… │ 0 │
│ test │ Technically I'am a Van Damme Fan, or I was. this movie i… │ 0 │
│ test │ Honestly awful film, bad editing, awful lighting, dire d… │ 0 │
├─────────┴────────────────────────────────────────────────────────────┴───────┤
│ 10 rows 3 columns │
└──────────────────────────────────────────────────────────────────────────────┘
```
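
Because the catalog exposes datasets as ordinary tables, regular SQL applies. For instance, a sketch of a per-split aggregate, assuming the `label` column is the 0/1 sentiment flag shown above:

```sql
-- Row counts and positive-label rate per split (label assumed to be 0/1)
SELECT split, count(*) AS n_rows, avg(label) AS positive_rate
FROM faceberg.stanfordnlp.imdb
GROUP BY split;
```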

### Deploy a catalog on the Hugging Face Hub

To deploy an Iceberg Datasets Catalog, install the library with `pip install faceberg`, then run this command, using your own Hugging Face username instead of "user":

```bash
faceberg user/mycatalog init
```

### Add datasets

Once your catalog is ready, add datasets using the following command:

```bash
faceberg user/mycatalog add stanfordnlp/imdb
faceberg user/mycatalog add openai/gsm8k --config main
```

### Query with interactive DuckDB shell

`faceberg` comes with a built-in DuckDB shell you can run like this:

```bash
faceberg user/mycatalog quack
```

```sql
SELECT label, substr(text, 1, 100) AS preview
FROM faceberg.stanfordnlp.imdb
LIMIT 10;
```

Alternatively, the DuckDB shell is also available in the catalog's web interface:

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/datasets-faceberg-space-imdb-sql-min.png"/>
</div>

### More information

Find more information on `faceberg` and the PyIceberg integration with the Hugging Face Hub in the [documentation](./datasets-pyiceberg).


## Auto-converted Parquet files

You can query auto-converted Parquet files using the `@~parquet` branch, which corresponds to the `refs/convert/parquet` revision. For more details, refer to the [dataset viewer documentation](https://huggingface.co/docs/datasets-server/en/parquet#conversion-to-parquet):

<div class="flex justify-center">
<img src="https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/hub/duckdb_hf_url.png"/>
</div>

To reference the `refs/convert/parquet` revision of a dataset, use the following syntax:

```plaintext
hf://datasets/{my-username}/{my-dataset}@~parquet/{path_to_file}
```

Here is a sample URL following the above syntax, pointing to a file in the [Parquet branch](https://huggingface.co/datasets/fka/prompts.chat/tree/refs%2Fconvert%2Fparquet) of the [fka/prompts.chat](https://huggingface.co/datasets/fka/prompts.chat) dataset:

```plaintext
hf://datasets/fka/prompts.chat@~parquet/default/train/0000.parquet
```
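
As a quick sketch, the earlier query patterns work unchanged against this revision:

```sql
-- Query the auto-converted Parquet file from the refs/convert/parquet revision
SELECT * FROM 'hf://datasets/fka/prompts.chat@~parquet/default/train/0000.parquet' LIMIT 5;
```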

In the following sections, we will cover more complex operations you can perform with DuckDB on Hugging Face datasets.

## Use-cases and examples

Find more use-cases and examples with Hugging Face Datasets here:

* [Query datasets](./datasets-duckdb-select)
* [Perform SQL operations](./datasets-duckdb-sql)
* [Combine datasets and export](./datasets-duckdb-combine-and-export)
* [Perform vector similarity search](./datasets-duckdb-vector-similarity-search)
1 change: 1 addition & 0 deletions docs/hub/datasets-libraries.md
@@ -24,6 +24,7 @@ The table below summarizes the supported libraries and their level of integration
| [Pandas](./datasets-pandas) | Python data analysis toolkit. | ✅ | ❌ | ✅ | ❌ | ✅* |
| [Polars](./datasets-polars) | A DataFrame library on top of an OLAP query engine. | ✅ | ✅ | ✅ | ❌ | ❌ |
| [PyArrow](./datasets-pyarrow) | Apache Arrow is a columnar format and a toolbox for fast data interchange and in-memory analytics. | ✅ | ✅ | ✅ | ❌ | ✅* |
| [PyIceberg](./datasets-pyiceberg) | Apache Iceberg is a high-performance open-source format for large analytic tables. | ✅ | ✅ | ❌ | ❌ | ❌ |
| [Spark](./datasets-spark) | Real-time, large-scale data processing tool in a distributed environment. | ✅ | ✅ | ✅ | ✅ | ✅ |
| [WebDataset](./datasets-webdataset) | Library to write I/O pipelines for large datasets. | ✅ | ✅ | ❌ | ❌ | ❌ |

Expand Down