Skip to content

Commit 38315c5

Browse files
adriangbclaude
andcommitted
register TABLESAMPLE SYSTEM RelationPlanner by default
Out-of-the-box TABLESAMPLE SYSTEM support, achieved through the RelationPlanner extension API (#17843) instead of baking SQL parsing into the in-tree SQL planner. - New `TableSampleSystemPlanner` (RelationPlanner) in datafusion-sql, auto-registered via `SessionStateDefaults::default_relation_planners` and `SessionStateBuilder::with_default_features`. Lifts `TABLESAMPLE SYSTEM(p%) [REPEATABLE(n)]` to the core `Sample` extension node; rejects every other form at planning time with a clear error and a pointer to RelationPlanner for downstream semantics. `register_relation_planner` inserts at the front of the chain, so a user-supplied planner can override built-in semantics without touching core. - `DefaultPhysicalPlanner::default()` now pre-registers `SamplePhysicalPlanner` so the core `Sample` node lowers to `SampleExec` automatically. Callers using `with_extension_planners(...)` replace the defaults and need to re-add `SamplePhysicalPlanner` if they want sampling. - Restored the sqllogictest fixture (`datafusion/sqllogictest/test_files/tablesample.slt`) — TABLESAMPLE now works with the default `SessionContext` that the harness uses. Drops the Rust integration test that existed only to register the planner manually. - Drops the redundant `relation_planner/table_sample_system.rs` example. The remaining `table_sample.rs` example still demonstrates registering a custom RelationPlanner ahead of the built-in one for BERNOULLI / ROW / BUCKET semantics — note added in main.rs. - 7 new ParquetSource unit tests around `try_push_sample` directly (cube-root math, file-drop bounds, REPEATABLE determinism, single- file short-circuit, target clamping). These exercise the `FileSource` API without a SQL or SessionContext layer. Net effect: `datafusion-cli` users get TABLESAMPLE SYSTEM(p%) out of the box; downstream extensions can still take over by registering their own `RelationPlanner` first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 88326d4 commit 38315c5

10 files changed

Lines changed: 412 additions & 322 deletions

File tree

datafusion-examples/examples/relation_planner/main.rs

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,14 @@
3535
//! (file: pivot_unpivot.rs, desc: Implement PIVOT / UNPIVOT)
3636
//!
3737
//! - `table_sample`
38-
//! (file: table_sample.rs, desc: Implement TABLESAMPLE)
38+
//! (file: table_sample.rs, desc: Implement TABLESAMPLE BERNOULLI / ROW / BUCKET via per-batch sampling)
39+
//!
40+
//! Note: `TABLESAMPLE SYSTEM(p%)` is supported out of the box by the
41+
//! built-in `datafusion_sql::sample::TableSampleSystemPlanner`, which is
42+
//! auto-registered on a default `SessionContext`. The `table_sample`
43+
//! example below shows how to register a *different* planner for the
44+
//! row-level forms (`BERNOULLI`, `ROW`, `BUCKET`) that the built-in
45+
//! intentionally does not handle.
3946
//!
4047
//! ## Snapshot Testing
4148
//!

datafusion/core/src/execution/session_state.rs

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1175,6 +1175,10 @@ impl SessionStateBuilder {
11751175
.get_or_insert_with(Vec::new)
11761176
.extend(SessionStateDefaults::default_expr_planners());
11771177

1178+
self.relation_planners
1179+
.get_or_insert_with(Vec::new)
1180+
.extend(SessionStateDefaults::default_relation_planners());
1181+
11781182
self.scalar_functions
11791183
.get_or_insert_with(Vec::new)
11801184
.extend(SessionStateDefaults::default_scalar_functions());

datafusion/core/src/execution/session_state_defaults.rs

Lines changed: 13 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -35,7 +35,7 @@ use datafusion_catalog::{MemoryCatalogProvider, MemorySchemaProvider};
3535
use datafusion_execution::config::SessionConfig;
3636
use datafusion_execution::object_store::ObjectStoreUrl;
3737
use datafusion_execution::runtime_env::RuntimeEnv;
38-
use datafusion_expr::planner::ExprPlanner;
38+
use datafusion_expr::planner::{ExprPlanner, RelationPlanner};
3939
use datafusion_expr::registry::ExtensionTypeRegistrationRef;
4040
use datafusion_expr::{AggregateUDF, HigherOrderUDF, ScalarUDF, WindowUDF};
4141
use std::collections::HashMap;
@@ -82,6 +82,18 @@ impl SessionStateDefaults {
8282
default_catalog
8383
}
8484

85+
/// Returns the list of default [`RelationPlanner`]s installed by
86+
/// [`Self::default_relation_planners`]. Currently this is just the
87+
/// built-in `TableSampleSystemPlanner`, which lifts
88+
/// `TABLESAMPLE SYSTEM(p%) [REPEATABLE(n)]` into the core `Sample`
89+
/// extension node so the `SamplePushdown` rule can absorb it into
90+
/// the scan. Other `TABLESAMPLE` flavors are rejected at planning
91+
/// time — register a `RelationPlanner` ahead of this one to add
92+
/// custom semantics.
93+
pub fn default_relation_planners() -> Vec<Arc<dyn RelationPlanner>> {
94+
vec![Arc::new(datafusion_sql::sample::TableSampleSystemPlanner)]
95+
}
96+
8597
/// returns the list of default [`ExprPlanner`]s
8698
pub fn default_expr_planners() -> Vec<Arc<dyn ExprPlanner>> {
8799
let expr_planners: Vec<Arc<dyn ExprPlanner>> = vec![

datafusion/core/src/physical_planner.rs

Lines changed: 17 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -258,10 +258,12 @@ pub trait ExtensionPlanner {
258258
/// ]);
259259
/// ```
260260
///
261-
/// `Sample` is intentionally not built into [`DefaultPhysicalPlanner`]
262-
/// by default: that would force callers who use a different SQL
263-
/// surface (or who don't want any TABLESAMPLE story) to carry the
264-
/// physical glue. The planner is small enough to opt into.
261+
/// `SamplePhysicalPlanner` is registered automatically in
262+
/// [`DefaultPhysicalPlanner::default`] so that `TABLESAMPLE SYSTEM`
263+
/// works out of the box on a default `SessionContext`. Callers who
264+
/// supply their own list via [`DefaultPhysicalPlanner::with_extension_planners`]
265+
/// **replace** the defaults — re-add `SamplePhysicalPlanner` to the
266+
/// front of their list if they want sampling support.
265267
///
266268
/// [`Sample`]: datafusion_expr::logical_plan::sample::Sample
267269
/// [`SampleExec`]: datafusion_physical_plan::sample::SampleExec
@@ -326,11 +328,21 @@ impl ExtensionPlanner for SamplePhysicalPlanner {
326328
/// execute concurrently.
327329
///
328330
/// [`planning_concurrency`]: crate::config::ExecutionOptions::planning_concurrency
329-
#[derive(Default)]
330331
pub struct DefaultPhysicalPlanner {
331332
extension_planners: Vec<Arc<dyn ExtensionPlanner + Send + Sync>>,
332333
}
333334

335+
impl Default for DefaultPhysicalPlanner {
336+
/// Constructs a planner with [`SamplePhysicalPlanner`] pre-registered
337+
/// so the core `Sample` extension node lowers without any extra
338+
/// wiring on a default `SessionContext`.
339+
fn default() -> Self {
340+
Self {
341+
extension_planners: vec![Arc::new(SamplePhysicalPlanner)],
342+
}
343+
}
344+
}
345+
334346
#[async_trait]
335347
impl PhysicalPlanner for DefaultPhysicalPlanner {
336348
/// Create a physical plan from a logical plan

datafusion/core/tests/parquet/mod.rs

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -57,7 +57,6 @@ mod page_pruning;
5757
mod row_group_pruning;
5858
mod schema;
5959
mod schema_coercion;
60-
mod tablesample;
6160
mod utils;
6261

6362
#[cfg(test)]

0 commit comments

Comments
 (0)