Below is the planned roadmap for the MVP, as discussed in various places (e.g. the dev mailing list, GitHub issues & PRs, the Slack channel, etc.). Note that it covers only the native C++ implementation. For the Rust C++ binding effort, please refer to https://lists.apache.org/thread/hotlcdw86nrmt7cf5o5o7kq6gwo98758.
Convention
- `expected`: similar to `std::expected`
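`std::expected<T, E>` holds either a value of type `T` or an error of type `E`, so a function reports failure through its return value instead of throwing. Here is a minimal sketch of that style. It assumes a hand-rolled, hypothetical `Expected<T, E>` written against C++17 (`std::expected` itself requires C++23) and a hypothetical `ParseFormatVersion` helper. Neither is the project's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for an expected-like type: holds either a value of
// type T or an error of type E. Written for C++17; std::expected needs C++23.
template <typename T, typename E>
class Expected {
 public:
  // Implicit conversion from a value, so `return value;` works.
  Expected(T value) : storage_(std::in_place_index<0>, std::move(value)) {}

  // Named factory for the error case.
  static Expected Error(E error) {
    return Expected(std::in_place_index<1>, std::move(error));
  }

  bool has_value() const { return storage_.index() == 0; }
  explicit operator bool() const { return has_value(); }

  const T& value() const {
    assert(has_value());
    return std::get<0>(storage_);
  }
  const E& error() const {
    assert(!has_value());
    return std::get<1>(storage_);
  }

 private:
  // Constructs the alternative at index I in place.
  template <std::size_t I, typename U>
  Expected(std::in_place_index_t<I> tag, U&& u)
      : storage_(tag, std::forward<U>(u)) {}

  std::variant<T, E> storage_;
};

// Hypothetical helper: accepts only the Iceberg format versions targeted by
// the MVP read path (v1 and v2) and reports anything else as an error.
Expected<int, std::string> ParseFormatVersion(int version) {
  if (version == 1 || version == 2) {
    return version;
  }
  return Expected<int, std::string>::Error(
      "unsupported format version: " + std::to_string(version));
}
```

A caller then branches on the result, e.g. `if (auto v = ParseFormatVersion(n)) { /* use v.value() */ } else { /* report v.error() */ }`.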
Goal
- Implement the read path for parsing Iceberg v1 and v2 metadata files. Reading data files is a nice-to-have feature, depending on the bandwidth of contributors.
- Provide a lightweight, I/O-less `iceberg` library with minimal dependencies (such as apache/nanoarrow and nlohmann/json) that mainly deals with Iceberg metadata. Downstream projects are required to provide their own implementations of components such as I/O, Parquet, Avro, and write adaptation code.
- Provide a batteries-included `iceberg-bundle` library backed by the Apache Arrow C++ and Apache Avro C++ libraries.
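The split between the I/O-less `iceberg` library and downstream implementations can be pictured as dependency injection: the core library declares an abstract interface and never touches storage itself. Here is a minimal sketch under stated assumptions. `FileIO` with a single `ReadFile` operation, the core function `LoadMetadataJson`, and the downstream `InMemoryFileIO` are all hypothetical. The real interface and its operations may differ.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical minimal I/O abstraction declared by the core library.
// The core never opens files itself; it only calls this interface.
class FileIO {
 public:
  virtual ~FileIO() = default;
  // Returns the file contents, or std::nullopt if the location is unknown.
  virtual std::optional<std::string> ReadFile(const std::string& location) = 0;
};

// Hypothetical core-library function: fetches a metadata file through the
// injected FileIO and only checks that it looks like a JSON object
// (actual JSON parsing would happen here in a real implementation).
std::optional<std::string> LoadMetadataJson(FileIO& io,
                                            const std::string& location) {
  auto content = io.ReadFile(location);
  if (!content || content->empty() || content->front() != '{') {
    return std::nullopt;  // missing file or not a JSON object
  }
  return content;
}

// Hypothetical downstream implementation: an in-memory map here, but a real
// project could back it with local disk or an object store instead.
class InMemoryFileIO : public FileIO {
 public:
  void Put(std::string location, std::string content) {
    files_[std::move(location)] = std::move(content);
  }

  std::optional<std::string> ReadFile(const std::string& location) override {
    auto it = files_.find(location);
    if (it == files_.end()) return std::nullopt;
    return it->second;
  }

 private:
  std::map<std::string, std::string> files_;
};
```

A downstream project would pass its own subclass wherever the core expects a `FileIO&`, so the core keeps no dependency on any storage library.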
Workitems
(Disclaimer: this is not an exhaustive list and is subject to change as development goes on.)
API of metadata or building block
- `Schema` (including data types)
- `DataFile`
- [ ] Add `DeleteFile`
- `ManifestFile`
- `ManifestEntry`
- `Snapshot`
- `PartitionSpec`
- `SortOrder`
- `ManifestList`
- `TableMetadata`
Catalog
- `Catalog` interface.
IO
- `FileIO` interface with minimal operations.
- … `arrow::FileSystem` for different storage providers.
Table
- `Table` interface.
- `Table::NewScan` function and `TableScan` class to support planning files for reading a specific snapshot. @gty404
- … `TableScan`. (Postponed to 0.2.0)
JSON Serialization
Metadata File Reader
File Format Reader
- `FileReader` interface with Arrow C Data as the contract.
- … `iceberg-bundle` library.
- … `iceberg-bundle` library.
Schema/Data conversion
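Because the `FileReader` item uses Arrow C Data as the contract, Iceberg schemas presumably need to be expressed as Arrow types somewhere. As an illustration only, here is a sketch that maps Iceberg primitive types to Arrow C Data Interface format strings. The `IcebergType` enum and the `ToArrowFormat` function are hypothetical, and nested types (struct, list, map) are omitted. Mapping `timestamptz` to a UTC-zoned timestamp and `uuid` to a 16-byte fixed-size binary are choices made for this sketch, not something the roadmap specifies.

```cpp
#include <optional>
#include <string>

// Hypothetical enumeration of Iceberg primitive types (nested types omitted).
enum class IcebergType {
  kBoolean, kInt, kLong, kFloat, kDouble, kDate, kTime,
  kTimestamp, kTimestampTz, kString, kUuid, kBinary, kFixed, kDecimal
};

// Maps an Iceberg primitive type to an Arrow C Data Interface format string.
// `length` is used for fixed[L]; `precision` and `scale` for decimal(P, S).
// Returns std::nullopt when a required parameter is missing or out of range.
std::optional<std::string> ToArrowFormat(IcebergType type, int length = 0,
                                         int precision = 0, int scale = 0) {
  switch (type) {
    case IcebergType::kBoolean:     return "b";        // boolean
    case IcebergType::kInt:         return "i";        // int32
    case IcebergType::kLong:        return "l";        // int64
    case IcebergType::kFloat:       return "f";        // float32
    case IcebergType::kDouble:      return "g";        // float64
    case IcebergType::kDate:        return "tdD";      // date32 (days)
    case IcebergType::kTime:        return "ttu";      // time64 (microseconds)
    case IcebergType::kTimestamp:   return "tsu:";     // timestamp[us], no zone
    case IcebergType::kTimestampTz: return "tsu:UTC";  // timestamp[us], UTC (assumption)
    case IcebergType::kString:      return "u";        // utf8
    case IcebergType::kUuid:        return "w:16";     // fixed-size binary(16) (assumption)
    case IcebergType::kBinary:      return "z";        // binary
    case IcebergType::kFixed:
      if (length <= 0) return std::nullopt;
      return "w:" + std::to_string(length);            // fixed-size binary(L)
    case IcebergType::kDecimal:
      // Iceberg limits precision to 38, which fits Arrow's default decimal128.
      if (precision <= 0 || precision > 38) return std::nullopt;
      return "d:" + std::to_string(precision) + "," + std::to_string(scale);
  }
  return std::nullopt;
}
```

Returning `std::optional` keeps the sketch free of exceptions. A fuller version would more likely return an `expected`-style error describing which parameter was invalid.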
Expression
Third-party library
- `nanoarrow` to `libiceberg`
- `nlohmann/json` to `libiceberg` @yingcai-cy
- `avro-cpp` to `libiceberg-bundle`
- `arrow-cpp` to `libiceberg-bundle`
First release