Commit e46be6b

adriangb and claude committed
feat: add TableSchemaBuilder; store partition cols as Fields
Introduce `TableSchemaBuilder` as the preferred way to construct a `TableSchema`. The file schema is the only required input; partition columns are optional, and the concatenated table schema is computed exactly once in `build()` (rather than being recomputed on every incremental setter call).

`TableSchema` now stores its partition columns as `arrow::datatypes::Fields` (an immutable `Arc<[FieldRef]>`) instead of `Arc<Vec<FieldRef>>`. `Fields` is the idiomatic Arrow field-list type, is a single `Arc<[FieldRef]>` (one fewer indirection), and is shareable zero-copy with an existing schema. Being immutable, it also makes the shared-`Arc` mutation panic that motivated recent changes structurally impossible.

`TableSchemaBuilder::with_table_partition_cols` takes `impl Into<Fields>`, accepting an existing schema's `Fields` without a `Vec` round-trip.

`TableSchema::table_partition_cols()` (and the delegating `FileScanConfig::table_partition_cols()`) now return `&Fields`. `Fields` derefs to `&[FieldRef]`, so iteration/indexing/`len`/`is_empty` callers are unchanged; only the arrow `FileFormat` path needed `.to_vec()`.

The mutating `TableSchema::with_table_partition_cols` setter is deprecated in favor of the builder; `new`/`from_file_schema` are kept as conveniences that route through the builder.

Documented in the 55.0.0 upgrade guide.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
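The builder shape described in the commit message can be sketched with the standard library alone. This is a minimal illustration, not the DataFusion implementation: `FieldRef` and `Fields` below are hypothetical stand-ins (a field is just a shared name), the `new` constructor name and the `names` helper are assumptions made for the example, and only `with_table_partition_cols` taking `impl Into<Fields>` and `build()` come from the message itself.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for illustration only: these are NOT the arrow or
// DataFusion types. A "field" here is just a shared column name.
type FieldRef = Arc<str>;
type Fields = Arc<[FieldRef]>;

#[derive(Debug)]
struct TableSchema {
    file_fields: Fields,
    partition_cols: Fields,
    // File fields followed by partition columns, computed once in `build()`.
    table_fields: Fields,
}

struct TableSchemaBuilder {
    file_fields: Fields,
    partition_cols: Fields,
}

impl TableSchemaBuilder {
    // The file schema is the only required input (constructor name assumed).
    fn new(file_fields: impl Into<Fields>) -> Self {
        Self {
            file_fields: file_fields.into(),
            partition_cols: Arc::from(Vec::<FieldRef>::new()),
        }
    }

    // Accepts a Vec or an already shared slice; the latter is moved in as-is.
    fn with_table_partition_cols(mut self, cols: impl Into<Fields>) -> Self {
        self.partition_cols = cols.into();
        self
    }

    // The concatenation happens here and nowhere else.
    fn build(self) -> TableSchema {
        let table_fields: Fields = self
            .file_fields
            .iter()
            .chain(self.partition_cols.iter())
            .cloned()
            .collect();
        TableSchema {
            file_fields: self.file_fields,
            partition_cols: self.partition_cols,
            table_fields,
        }
    }
}

// Helper (hypothetical) to view the field names of a field list.
fn names(fields: &Fields) -> Vec<&str> {
    fields.iter().map(|f| &**f).collect()
}
```

In this sketch, passing an existing `Arc<[FieldRef]>` to `with_table_partition_cols` stores the same allocation rather than copying it, which is the zero-copy sharing the message refers to. Setters only record inputs, so calling them in any order or number never rebuilds the combined field list before `build()`.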
1 parent c20f245 commit e46be6b

7 files changed

Lines changed: 248 additions & 112 deletions

datafusion/datasource-arrow/src/file_format.rs

Lines changed: 1 addition & 1 deletion
@@ -199,7 +199,7 @@ impl FileFormat for ArrowFormat {
 
         let table_schema = TableSchema::new(
             Arc::clone(conf.file_schema()),
-            conf.table_partition_cols().clone(),
+            conf.table_partition_cols().to_vec(),
         );
 
         let mut source: Arc<dyn FileSource> =
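The switch from `.clone()` to `.to_vec()` in this hunk follows from the new return type: cloning a `&Fields` yields another `Fields`, while a caller that still wants an owned `Vec<FieldRef>` has to go through the slice. A minimal sketch of that behavior, using a hypothetical stand-in `Fields` newtype built from the standard library (not the arrow type) and an assumed `takes_vec` consumer:

```rust
use std::ops::Deref;
use std::sync::Arc;

// Hypothetical stand-ins, NOT the arrow types: a field is just a shared name.
type FieldRef = Arc<str>;

// Newtype over an immutable shared slice, mirroring the shape described in
// the commit message.
#[derive(Clone, Debug, PartialEq)]
struct Fields(Arc<[FieldRef]>);

impl From<Vec<FieldRef>> for Fields {
    fn from(v: Vec<FieldRef>) -> Self {
        Fields(v.into())
    }
}

// Deref to a slice keeps iteration, indexing, `len`, and `is_empty` working.
impl Deref for Fields {
    type Target = [FieldRef];
    fn deref(&self) -> &[FieldRef] {
        &self.0
    }
}

// A consumer (assumed for the example) that still takes an owned Vec.
fn takes_vec(cols: Vec<FieldRef>) -> usize {
    cols.len()
}

// Sample data for the example.
fn partition_cols() -> Fields {
    Fields::from(vec![FieldRef::from("year"), FieldRef::from("month")])
}
```

With these stand-ins, `partition_cols().clone()` is a cheap reference-count bump that produces another `Fields`, so it cannot be passed to `takes_vec`; `partition_cols().to_vec()` resolves through `Deref` to the slice method and produces the `Vec<FieldRef>` the consumer expects.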

datafusion/datasource/src/file_scan_config/mod.rs

Lines changed: 6 additions & 3 deletions
@@ -27,7 +27,7 @@ use crate::{
     file_stream::work_source::SharedWorkSource, source::DataSource,
     statistics::MinMaxStatistics,
 };
-use arrow::datatypes::FieldRef;
+use arrow::datatypes::Fields;
 use arrow::datatypes::{DataType, Schema, SchemaRef};
 use datafusion_common::config::ConfigOptions;
 use datafusion_common::{
@@ -1056,7 +1056,7 @@ impl FileScanConfig {
     }
 
     /// Get the table partition columns
-    pub fn table_partition_cols(&self) -> &Vec<FieldRef> {
+    pub fn table_partition_cols(&self) -> &Fields {
         self.file_source.table_schema().table_partition_cols()
     }
 
@@ -2053,7 +2053,10 @@ mod tests {
             Some(vec![0, 2])
         );
         assert_eq!(new_config.limit, Some(10));
-        assert_eq!(*new_config.table_partition_cols(), partition_cols);
+        assert_eq!(
+            *new_config.table_partition_cols(),
+            Fields::from(partition_cols)
+        );
         assert_eq!(new_config.file_groups.len(), 1);
         assert_eq!(new_config.file_groups[0].len(), 1);
         assert_eq!(

datafusion/datasource/src/mod.rs

Lines changed: 1 addition & 1 deletion
@@ -62,7 +62,7 @@ use datafusion_physical_expr::LexOrdering;
 use futures::{Stream, StreamExt};
 use object_store::{GetOptions, GetRange, ObjectStore};
 use object_store::{ObjectMeta, path::Path};
-pub use table_schema::TableSchema;
+pub use table_schema::{TableSchema, TableSchemaBuilder};
 // Remove when add_row_stats is remove
 #[expect(deprecated)]
 pub use statistics::add_row_stats;
