@@ -7,36 +7,38 @@ use std::sync::Arc;
 /// A cheaply cloneable pointer to a [`DFExtensionType`].
 pub type DFExtensionTypeRef = Arc<dyn DFExtensionType>;
 
-/// Represents an implementation of a DataFusion extension type, allowing users to customize the
-/// behavior of DataFusion for custom extension types.
-///
-/// Extension types may change the semantics of a column. For example, adding two values of
-/// [`DataType::Int64`] is a sensible thing to do. However, if the same data type is annotated with
-/// an extension type like `custom.id`, the correct interpretation of a column changes. For example,
-/// adding together two `custom.id` values (represented as a 64-bit integer) may no longer make
-/// sense.
-///
-/// Note that while helping users to navigate the semantic gap between the data type and extension
-/// types is a goal of this trait, DataFusion's extension type support is still evolving and does
-/// not cover all use cases. Currently, the following capabilities can be customized:
+/// Represents an implementation of a DataFusion extension type.
+///
+/// This allows users to customize the behavior of DataFusion for certain types. Having this ability
+/// is necessary because extension types affect how columns should be treated by the query engine.
+/// This effect includes which operations are possible on a column and what are the expected results
+/// from these operations. The extension type mechanism allows users to define how these operations
+/// apply to a particular extension type.
+///
+/// For example, adding two values of [`DataType::Int64`] is a sensible thing to do. However, if the
+/// same column is annotated with an extension type like `custom.id`, the correct interpretation of
+/// a column changes. Adding together two `custom.id` values, even though they are stored as
+/// integers, may no longer make sense.
+///
+/// Note that DataFusion's extension type support is still young and therefore might not cover all
+/// relevant use cases. Currently, the following operations can be customized:
 /// - Pretty-printing values in record batches
 ///
 /// # Relation to Arrow's `ExtensionType`
 ///
-/// The purpose of Arrow's `ExtensionType` trait, for the time being, is to provide a way to handle
-/// metadata of an extension type in a type-safe manner. The trait does not provide any
-/// customization options such that users can customize the behavior of any kernels (e.g.,
-/// [`DFExtensionType::create_array_formatter`] for formatting record batches). Therefore,
-/// downstream users (such as DataFusion) have the flexibility to implement the extension type
-/// mechanism according to their needs. [`DFExtensionType`] is DataFusion's implementation of this
-/// extension type mechanism.
+/// The purpose of Arrow's `ExtensionType` trait, for the time being, is to allow reading and
+/// writing extension type metadata in a type-safe manner. The trait does not provide any
+/// customization options. Therefore, downstream users (such as DataFusion) have the flexibility to
+/// implement the extension type mechanism according to their needs. [`DFExtensionType`] is
+/// DataFusion's implementation of this extension type mechanism.
 ///
-/// Furthermore, Arrow's current trait is not dyn-compatible which we need for implementing
+/// Furthermore, the current trait in arrow-rs is not dyn-compatible, which we need for implementing
 /// extension type registries. In the future, the two implementations may increasingly converge.
 ///
-/// # Example
-///
+/// # Examples
 ///
+/// Examples for using the extension type machinery can be found in the DataFusion examples
+/// directory.
 pub trait DFExtensionType: Debug + Send + Sync {
     /// Returns an [`ArrayFormatter`] that can format values of this type.
     ///
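
The doc comment above says dyn-compatibility is needed for extension type registries. A minimal, self-contained sketch of that idea follows. It does not use DataFusion or arrow-rs: `SketchExtensionType`, `SketchExtensionTypeRef`, `SketchRegistry`, `CustomId`, and the `format_value` method are all hypothetical names invented for illustration, and the real `DFExtensionType` formats values through an `ArrayFormatter` instead, which is not reproduced here.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

/// Hypothetical stand-in for a dyn-compatible extension type trait.
/// This is NOT DataFusion's `DFExtensionType`; `format_value` is invented.
pub trait SketchExtensionType: Debug + Send + Sync {
    /// Formats one value whose storage type is a 64-bit integer.
    fn format_value(&self, raw: i64) -> String;
}

/// Cheaply cloneable pointer, mirroring the shape of `DFExtensionTypeRef`.
pub type SketchExtensionTypeRef = Arc<dyn SketchExtensionType>;

/// A `custom.id` type: stored as an integer, printed as an opaque identifier.
#[derive(Debug)]
pub struct CustomId;

impl SketchExtensionType for CustomId {
    fn format_value(&self, raw: i64) -> String {
        // Zero-pad to four digits and add a prefix.
        format!("id-{raw:04}")
    }
}

/// Hypothetical registry keyed by extension name. Storing different
/// implementations in one map only works because the trait can be used
/// as a trait object (`dyn SketchExtensionType`).
#[derive(Debug, Default)]
pub struct SketchRegistry {
    types: HashMap<String, SketchExtensionTypeRef>,
}

impl SketchRegistry {
    /// Registers (or replaces) the implementation for `name`.
    pub fn register(&mut self, name: &str, ty: SketchExtensionTypeRef) {
        self.types.insert(name.to_string(), ty);
    }

    /// Formats `raw` with the registered extension type, falling back to
    /// plain integer formatting when the column has no known extension type.
    pub fn format(&self, name: Option<&str>, raw: i64) -> String {
        match name.and_then(|n| self.types.get(n)) {
            Some(ty) => ty.format_value(raw),
            None => raw.to_string(),
        }
    }
}

fn main() {
    let mut registry = SketchRegistry::default();
    registry.register("custom.id", Arc::new(CustomId));

    // Same storage value, different rendering depending on the annotation.
    println!("{}", registry.format(Some("custom.id"), 42));
    println!("{}", registry.format(None, 42));
}
```

The fallback to plain integer formatting for unknown or absent extension names is a design choice of this sketch only, not a statement about how DataFusion behaves.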