If you have a custom data source that you want to integrate with DataFusion, you can implement the ``TableProvider`` trait in Rust and then expose it in Python. To do so, you must use DataFusion 43.0.0 or later and expose an ``FFI_TableProvider`` via ``PyCapsule``.
A complete example can be found in the examples folder.
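
.. code-block:: rust

    use std::ffi::CString;
    use std::sync::Arc;

    use datafusion_ffi::table_provider::FFI_TableProvider;
    use pyo3::prelude::*;
    use pyo3::types::PyCapsule;

    #[pymethods]
    impl MyTableProvider {
        fn __datafusion_table_provider__<'py>(
            &self,
            py: Python<'py>,
        ) -> PyResult<Bound<'py, PyCapsule>> {
            // The capsule name identifies the payload to the consumer.
            let name = CString::new("datafusion_table_provider").unwrap();

            // `false`: no filter pushdown support; `None`: no runtime handle supplied.
            let provider = Arc::new(self.clone());
            let provider = FFI_TableProvider::new(provider, false, None);

            PyCapsule::new_bound(py, provider, Some(name.clone()))
        }
    }

Once you have this library available, you can construct a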
:py:class:`~datafusion.Table` in Python and register it with the
``SessionContext``. Tables can be created either from the ``PyCapsule`` exposed by your
Rust provider or from an existing :py:class:`~datafusion.dataframe.DataFrame`.
Call the provider's ``__datafusion_table_provider__()`` method to obtain the capsule
before constructing a ``Table``. The ``Table.from_view()`` helper is
deprecated; use ``Table.from_dataframe()`` or ``DataFrame.into_view()`` instead.

.. note::

    :py:meth:`~datafusion.context.SessionContext.register_table_provider` is deprecated. Use :py:meth:`~datafusion.context.SessionContext.register_table` with the resulting :py:class:`~datafusion.Table` instead.

.. code-block:: python

    from datafusion import SessionContext, Table

    ctx = SessionContext()

    # MyTableProvider is the class exported by your compiled Rust extension.
    provider = MyTableProvider()

    # Build a Table from the provider's PyCapsule.
    capsule = provider.__datafusion_table_provider__()
    capsule_table = Table.from_capsule(capsule)

    # Build a Table from an existing DataFrame.
    df = ctx.from_pydict({"a": [1]})
    view_table = Table.from_dataframe(df)
    # or: view_table = df.into_view()

    # Register both tables, then read them back by name.
    ctx.register_table("capsule_table", capsule_table)
    ctx.register_table("view_table", view_table)

    ctx.table("capsule_table").show()
    ctx.table("view_table").show()

Both ``Table.from_capsule()`` and ``Table.from_dataframe()`` create
table providers that can be registered with the ``SessionContext`` using ``register_table()``.