19 | 19 |
20 | 20 | # Changelog |
21 | 21 |
| 22 | +## [52.0.0](https://github.com/apache/datafusion-ballista/tree/52.0.0) (2026-03-02) |
| 23 | + |
| 24 | +**Performance related:** |
| 25 | + |
| 26 | +- perf: optimize shuffle writer with buffered I/O and fix file size bug [#1386](https://github.com/apache/datafusion-ballista/pull/1386) (andygrove) |
| 27 | + |
| 28 | +**Implemented enhancements:** |
| 29 | + |
| 30 | +- feat: add config option for skipping arrow ipc read validation [#1374](https://github.com/apache/datafusion-ballista/pull/1374) (killzoner) |
| 31 | +- feat: improve tpch benchmark CLI [#1391](https://github.com/apache/datafusion-ballista/pull/1391) (andygrove) |
| 32 | +- feat: Add sort-based shuffle implementation [#1389](https://github.com/apache/datafusion-ballista/pull/1389) (andygrove) |
| 33 | +- feat: New ballista python interface [#1338](https://github.com/apache/datafusion-ballista/pull/1338) (milenkovicm) |
| 34 | +- feat: Add batch coalescing ability to shuffle reader exec [#1380](https://github.com/apache/datafusion-ballista/pull/1380) (danielhumanmod) |
| 35 | +- feat: Add arrow flight proxy to scheduler [#1351](https://github.com/apache/datafusion-ballista/pull/1351) (sebbegg) |
| 36 | +- feat: Creating SubstraitSchedulerClient and standalone Substrait examples [#1376](https://github.com/apache/datafusion-ballista/pull/1376) (mattcuento) |
| 37 | +- feat: Cluster RPC customisations to support TLS and custom headers [#1400](https://github.com/apache/datafusion-ballista/pull/1400) (phillipleblanc) |
| 38 | +- feat: add -c config override flag to tpch benchmark [#1435](https://github.com/apache/datafusion-ballista/pull/1435) (andygrove) |
| 39 | +- feat: Extract `execution_graph` to a trait [#1361](https://github.com/apache/datafusion-ballista/pull/1361) (milenkovicm) |
| 40 | +- feat: Add spark-compat mode to integrate datafusion-spark features au… [#1416](https://github.com/apache/datafusion-ballista/pull/1416) (mattcuento) |
| 41 | +- feat: add `Dataframe.cache()` factory (no planner handling) [#1420](https://github.com/apache/datafusion-ballista/pull/1420) (killzoner) |
| 42 | +- feat: Adaptive query execution (AQE) planner fundamentals [#1372](https://github.com/apache/datafusion-ballista/pull/1372) (milenkovicm) |
| 43 | +- feat: Make push scheduling policy default as it has lower latency [#1461](https://github.com/apache/datafusion-ballista/pull/1461) (milenkovicm) |
| 44 | +- feat: job scheduling with push based job status updates [#1478](https://github.com/apache/datafusion-ballista/pull/1478) (milenkovicm) |
| 45 | + |
| 46 | +**Fixed bugs:** |
| 47 | + |
| 48 | +- fix: compile issue after unsuccessful merge [#1402](https://github.com/apache/datafusion-ballista/pull/1402) (milenkovicm) |
| 49 | +- fix: prost build keda and TLS RPC example [#1429](https://github.com/apache/datafusion-ballista/pull/1429) (killzoner) |
| 50 | +- fix: remove `scheduler_config_spec.toml` as it is unused [#1462](https://github.com/apache/datafusion-ballista/pull/1462) (milenkovicm) |
| 51 | +- fix: Don't use `maxrows` as a "fetched rows" but calculate it from the batches [#1480](https://github.com/apache/datafusion-ballista/pull/1480) (martin-g) |
| 52 | + |
| 53 | +**Documentation updates:** |
| 54 | + |
| 55 | +- docs: fix outdated content in documentation [#1385](https://github.com/apache/datafusion-ballista/pull/1385) (andygrove) |
| 56 | +- docs: use tpchgen-rs for TPC-H data generation [#1390](https://github.com/apache/datafusion-ballista/pull/1390) (andygrove) |
| 57 | +- docs: add Jupyter notebook support documentation [#1399](https://github.com/apache/datafusion-ballista/pull/1399) (andygrove) |
| 58 | +- chore: Document ballista features in README.md [#1418](https://github.com/apache/datafusion-ballista/pull/1418) (mattcuento) |
| 59 | + |
| 60 | +**Merged pull requests:** |
| 61 | + |
| 62 | +- feat: add config option for skipping arrow ipc read validation [#1374](https://github.com/apache/datafusion-ballista/pull/1374) (killzoner) |
| 63 | +- docs: fix outdated content in documentation [#1385](https://github.com/apache/datafusion-ballista/pull/1385) (andygrove) |
| 64 | +- restrict python CI to python directory [#1383](https://github.com/apache/datafusion-ballista/pull/1383) (Huy1Ng) |
| 65 | +- perf: optimize shuffle writer with buffered I/O and fix file size bug [#1386](https://github.com/apache/datafusion-ballista/pull/1386) (andygrove) |
| 66 | +- docs: use tpchgen-rs for TPC-H data generation [#1390](https://github.com/apache/datafusion-ballista/pull/1390) (andygrove) |
| 67 | +- feat: improve tpch benchmark CLI [#1391](https://github.com/apache/datafusion-ballista/pull/1391) (andygrove) |
| 68 | +- doc: Add Ballista extensions example to the docs. [#1382](https://github.com/apache/datafusion-ballista/pull/1382) (LouisBurke) |
| 69 | +- feat: Add sort-based shuffle implementation [#1389](https://github.com/apache/datafusion-ballista/pull/1389) (andygrove) |
| 70 | +- feat: New ballista python interface [#1338](https://github.com/apache/datafusion-ballista/pull/1338) (milenkovicm) |
| 71 | +- doc: add more details for protobuf extension [#1393](https://github.com/apache/datafusion-ballista/pull/1393) (LouisBurke) |
| 72 | +- feat: Add batch coalescing ability to shuffle reader exec [#1380](https://github.com/apache/datafusion-ballista/pull/1380) (danielhumanmod) |
| 73 | +- docs: add Jupyter notebook support documentation [#1399](https://github.com/apache/datafusion-ballista/pull/1399) (andygrove) |
| 74 | +- feat: Add arrow flight proxy to scheduler [#1351](https://github.com/apache/datafusion-ballista/pull/1351) (sebbegg) |
| 75 | +- chore: update datafusion to 52 [#1394](https://github.com/apache/datafusion-ballista/pull/1394) (killzoner) |
| 76 | +- feat: Creating SubstraitSchedulerClient and standalone Substrait examples [#1376](https://github.com/apache/datafusion-ballista/pull/1376) (mattcuento) |
| 77 | +- fix: compile issue after unsuccessful merge [#1402](https://github.com/apache/datafusion-ballista/pull/1402) (milenkovicm) |
| 78 | +- feat: Cluster RPC customisations to support TLS and custom headers [#1400](https://github.com/apache/datafusion-ballista/pull/1400) (phillipleblanc) |
| 79 | +- chore: Document ballista features in README.md [#1418](https://github.com/apache/datafusion-ballista/pull/1418) (mattcuento) |
| 80 | +- fix: prost build keda and TLS RPC example [#1429](https://github.com/apache/datafusion-ballista/pull/1429) (killzoner) |
| 81 | +- Improve sort-based shuffle: single spill file per partition and batch coalescing [#1431](https://github.com/apache/datafusion-ballista/pull/1431) (andygrove) |
| 82 | +- feat: add -c config override flag to tpch benchmark [#1435](https://github.com/apache/datafusion-ballista/pull/1435) (andygrove) |
| 83 | +- feat: Extract `execution_graph` to a trait [#1361](https://github.com/apache/datafusion-ballista/pull/1361) (milenkovicm) |
| 84 | +- chore: add confirmation before tarball is released [#1445](https://github.com/apache/datafusion-ballista/pull/1445) (milenkovicm) |
| 85 | +- minor: add test to cover IPC arrow file read [#1450](https://github.com/apache/datafusion-ballista/pull/1450) (milenkovicm) |
| 86 | +- feat: Add spark-compat mode to integrate datafusion-spark features au… [#1416](https://github.com/apache/datafusion-ballista/pull/1416) (mattcuento) |
| 87 | +- feat: add `Dataframe.cache()` factory (no planner handling) [#1420](https://github.com/apache/datafusion-ballista/pull/1420) (killzoner) |
| 88 | +- fix: remove `scheduler_config_spec.toml` as it is unused [#1462](https://github.com/apache/datafusion-ballista/pull/1462) (milenkovicm) |
| 89 | +- feat: Adaptive query execution (AQE) planner fundamentals [#1372](https://github.com/apache/datafusion-ballista/pull/1372) (milenkovicm) |
| 90 | +- feat: Make push scheduling policy default as it has lower latency [#1461](https://github.com/apache/datafusion-ballista/pull/1461) (milenkovicm) |
| 91 | +- minor: improve log statements [#1482](https://github.com/apache/datafusion-ballista/pull/1482) (milenkovicm) |
| 92 | +- chore: update datafusion to 52.2 and other deps to latest [#1483](https://github.com/apache/datafusion-ballista/pull/1483) (milenkovicm) |
| 93 | +- fix: Don't use `maxrows` as a "fetched rows" but calculate it from the batches [#1480](https://github.com/apache/datafusion-ballista/pull/1480) (martin-g) |
| 94 | +- feat: job scheduling with push based job status updates [#1478](https://github.com/apache/datafusion-ballista/pull/1478) (milenkovicm) |
| 95 | + |
22 | 96 | ## [51.0.0](https://github.com/apache/datafusion-ballista/tree/51.0.0) (2026-01-11) |
23 | 97 |
24 | 98 | **Implemented enhancements:** |