@@ -84,10 +84,10 @@ pub trait TableProvider: Debug + Sync + Send {
8484 None
8585 }
8686
87- /// Create an [`ExecutionPlan`] for scanning the table with optionally
88- /// specified `projection`, `filter` and `limit`, described below.
87+ /// Create an [`ExecutionPlan`] for scanning the table with optional
88+ /// `projection`, `filter`, and `limit`, described below.
8989 ///
90- /// The `ExecutionPlan` is responsible scanning the datasource's
90+ /// The returned `ExecutionPlan` is responsible for scanning the datasource's
9191 /// partitions in a streaming, parallelized fashion.
9292 ///
9393 /// # Projection
@@ -96,33 +96,30 @@ pub trait TableProvider: Debug + Sync + Send {
9696 /// specified. The projection is a set of indexes of the fields in
9797 /// [`Self::schema`].
9898 ///
99- /// DataFusion provides the projection to scan only the columns actually
100- /// used in the query to improve performance, an optimization called
101- /// "Projection Pushdown". Some datasources, such as Parquet, can use this
102- /// information to go significantly faster when only a subset of columns is
103- /// required.
99+ /// DataFusion provides the projection so the scan reads only the columns
100+ /// actually used in the query, an optimization called "Projection
101+ /// Pushdown". Some datasources, such as Parquet, can use this information
102+ /// to go significantly faster when only a subset of columns is required.
104103 ///
105104 /// # Filters
106105 ///
107106 /// A list of boolean filter [`Expr`]s to evaluate *during* the scan, in the
108107 /// manner specified by [`Self::supports_filters_pushdown`]. Only rows for
109- /// which *all* of the `Expr`s evaluate to `true` must be returned (aka the
110- /// expressions are `AND`ed together).
108+ /// which *all* of the `Expr`s evaluate to `true` must be returned (that is,
109+ /// the expressions are `AND`ed together).
111110 ///
112- /// To enable filter pushdown you must override
113- /// [`Self::supports_filters_pushdown`] as the default implementation does
114- /// not and `filters` will be empty.
111+ /// To enable filter pushdown, override
112+ /// [`Self::supports_filters_pushdown`]. The default implementation does not
113+ /// push down filters, and `filters` will be empty.
115114 ///
116- /// DataFusion pushes filtering into the scans whenever possible
117- /// ("Filter Pushdown"), and depending on the format and the
118- /// implementation of the format, evaluating the predicate during the scan
119- /// can increase performance significantly.
115+ /// DataFusion pushes filters into scans whenever possible ("Filter
116+ /// Pushdown"). Depending on the data format and implementation, evaluating
117+ /// predicates during the scan can significantly improve performance.
120118 ///
121119 /// ## Note: Some columns may appear *only* in Filters
122120 ///
123- /// In certain cases, a query may only use a certain column in a Filter that
124- /// has been completely pushed down to the scan. In this case, the
125- /// projection will not contain all the columns found in the filter
121+ /// In some cases, a query may use a column only in a filter and the
122+ /// projection will not contain all columns referenced by the filter
126123 /// expressions.
127124 ///
128125 /// For example, given the query `SELECT t.a FROM t WHERE t.b > 5`,
@@ -154,15 +151,40 @@ pub trait TableProvider: Debug + Sync + Send {
154151 ///
155152 /// # Limit
156153 ///
157- /// If `limit` is specified, must only produce *at least* this many rows,
158- /// (though it may return more). Like Projection Pushdown and Filter
159- /// Pushdown, DataFusion pushes `LIMIT`s as far down in the plan as
160- /// possible, called "Limit Pushdown" as some sources can use this
161- /// information to improve their performance. Note that if there are any
162- /// Inexact filters pushed down, the LIMIT cannot be pushed down. This is
163- /// because inexact filters do not guarantee that every filtered row is
164- /// removed, so applying the limit could lead to too few rows being available
165- /// to return as a final result.
154+ /// If `limit` is specified, the scan must produce *at least* this many
155+ /// rows, though it may return more. Like Projection Pushdown and Filter
156+ /// Pushdown, DataFusion pushes `LIMIT`s as far down in the plan as
157+ /// possible. This is called "Limit Pushdown", and some sources can use the
158+ /// information to improve performance.
159+ ///
160+ /// Note: If any pushed-down filters are `Inexact`, the `LIMIT` cannot be
161+ /// pushed down. Inexact filters do not guarantee that every filtered row is
162+ /// removed, so applying the limit could leave too few rows to return in the
163+ /// final result.
164+ ///
165+ /// # Evaluation Order
166+ ///
167+ /// The logical evaluation order is `filters`, then `limit`, then
168+ /// `projection`.
169+ ///
170+ /// Note that `limit` applies to the filtered result, not to the unfiltered
171+ /// input, and `projection` affects only which columns are returned, not
172+ /// which rows qualify.
173+ ///
174+ /// For example, if a scan receives:
175+ ///
176+ /// - `projection = [a]`
177+ /// - `filters = [b > 5]`
178+ /// - `limit = Some(3)`
179+ ///
180+ /// It must logically produce results equivalent to:
181+ ///
182+ /// ```text
183+ /// PROJECTION a (LIMIT 3 (SCAN WHERE b > 5))
184+ /// ```
185+ ///
186+ /// As noted above, columns referenced only by pushed-down filters may be
187+ /// absent from `projection`.
166188 async fn scan (
167189 & self ,
168190 state : & dyn Session ,
0 commit comments