Is your feature request related to a problem or challenge?
One of the dreams of the composable data ecosystem is to quickly assemble a system from various components (DataFusion, data formats
DataFusion still releases once a month, which allows code to flow quickly but also causes at least 2 challenges:
- Monthly releases make upgrading and using downstream third-party extensions hard
Third-party extensions like delta-rs and iceberg provide TableProviders for DataFusion, which is really nice. However, to use those packages, the versions of DataFusion must match exactly.
This means that an application relying on multiple downstream packages must wait until ALL of them have upgraded to the new version before it can upgrade DataFusion. Any delay in a downstream library's upgrade delays the application's upgrade too.
For example, an application that wants to use delta-rs, iceberg, and the table-providers crate faces a race after each DataFusion release. Consider the following release timeline:
+0 days: DataFusion version X released
+7 days: new delta-rs release upgraded to DataFusion X
+11 days: new iceberg crate released, upgraded to DataFusion X
+12 days: new table-providers version released
+13-30 days: end-user app can upgrade DataFusion, delta-rs, and iceberg
+31 days: new DataFusion version released again
Describe the solution you'd like
I would like downstream libraries to have more time and schedule flexibility when upgrading DataFusion and other dependent crates, so that it is easier to construct a system from different components.
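The race in the release timeline can be modeled as simple arithmetic: the application can only upgrade the day after its slowest DataFusion-dependent crate releases, and the window closes the day before the next DataFusion release. The Rust sketch below encodes that rule. The function name `upgrade_window` is hypothetical, and the day numbers come from the example timeline, not from real release data.

```rust
/// Returns the (first day, last day) window after a DataFusion release in
/// which an application can upgrade, given the day on which each of its
/// DataFusion-dependent crates publishes a compatible release.
/// Returns None if there are no dependencies or if the slowest dependency
/// is not ready before the next DataFusion release.
fn upgrade_window(dependency_ready_days: &[u32], next_release_day: u32) -> Option<(u32, u32)> {
    // The app must wait for the slowest dependency; it can upgrade the day after.
    let first_day = *dependency_ready_days.iter().max()? + 1;
    // The window closes the day before the next DataFusion release.
    let last_day = next_release_day.checked_sub(1)?;
    if first_day <= last_day {
        Some((first_day, last_day))
    } else {
        None
    }
}

fn main() {
    // Example timeline: delta-rs (+7), iceberg (+11), table-providers (+12),
    // next DataFusion release at +31.
    println!("{:?}", upgrade_window(&[7, 11, 12], 31)); // Some((13, 30))

    // If one dependency slips past the next release, there is no window at all.
    println!("{:?}", upgrade_window(&[7, 11, 35], 31)); // None

    // Hypothetical longer interval between breaking releases (about a quarter).
    println!("{:?}", upgrade_window(&[7, 11, 12], 91)); // Some((13, 90))
}
```

Under these assumptions, a longer interval between breaking releases widens the window without requiring any downstream crate to release faster.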
Describe alternatives you've considered
Option 1: Switch to major/minor release cadence
We could follow the model of arrow-rs, which releases monthly but makes breaking releases only quarterly. Here is how it works in arrow-rs: https://github.com/apache/arrow-rs?tab=readme-ov-file#release-versioning-and-schedule
This would mean continuing to release every month, but only allowing breaking API changes in every 3rd release (or at some other cadence).
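Option 1 relies on how Cargo resolves version requirements. A dependency written as `datafusion = "43.0.0"` is a caret requirement: it accepts any 43.x.y at or above 43.0.0, but not 44.0.0. The sketch below is a self-contained model of that rule (it does not use Cargo or the `semver` crate), combined with a hypothetical schedule in which every 3rd monthly release is breaking. The starting version 43.0.0 and the names `caret_matches` and `release_schedule` are illustrative only.

```rust
/// A minimal semantic version triple. Field order matters: the derived
/// `Ord` compares major, then minor, then patch.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Models Cargo's default caret requirement `^M.m.p` for M >= 1: accepts any
/// version with the same major that is >= M.m.p. The different rules for
/// 0.x versions are not modeled (this returns false for them).
fn caret_matches(req: Version, v: Version) -> bool {
    req.major >= 1 && v.major == req.major && v >= req
}

/// Hypothetical policy: one release per month, where only every
/// `breaking_every`-th release bumps the major version; all other
/// releases are minor bumps.
fn release_schedule(start: Version, releases: usize, breaking_every: usize) -> Vec<Version> {
    assert!(breaking_every > 0, "breaking_every must be positive");
    let mut versions = Vec::with_capacity(releases);
    let mut current = start;
    for n in 0..releases {
        if n > 0 {
            current = if n % breaking_every == 0 {
                Version { major: current.major + 1, minor: 0, patch: 0 }
            } else {
                Version { minor: current.minor + 1, patch: 0, ..current }
            };
        }
        versions.push(current);
    }
    versions
}

fn main() {
    // Illustrative: a downstream crate declares `datafusion = "43.0.0"`.
    let start = Version { major: 43, minor: 0, patch: 0 };
    for v in release_schedule(start, 6, 3) {
        println!(
            "{}.{}.{} usable by a crate built against ^43.0.0: {}",
            v.major, v.minor, v.patch, caret_matches(start, v)
        );
    }
}
```

Under this assumed schedule, a downstream crate built against 43.0.0 keeps working with the next two monthly releases (43.1.0 and 43.2.0) and only needs a new release of its own at 44.0.0.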
The major cost here is that maintainers and contributors would have to be diligent about not merging breaking API changes until the next major release.
This is possible to automate somewhat:
Option 2: LTS and feature branch
- Keep (at least) two branches going: LTS and main, as proposed by @andygrove in #5269
In this model we would likely backport changes to the LTS branch and make releases from there. The downside of this approach is that there is extra work to backport changes to LTS.