Custom Table Provider

If you have a custom data source that you want to integrate with DataFusion, implement the TableProvider trait in Rust and expose it to Python. This requires DataFusion 43.0.0 or later, and the provider must be exposed as an FFI_TableProvider wrapped in a PyCapsule.

A complete example can be found in the examples folder.

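use std::ffi::CString;
use std::sync::Arc;

use datafusion_ffi::table_provider::FFI_TableProvider;
use pyo3::prelude::*;
use pyo3::types::PyCapsule;

#[pymethods]
impl MyTableProvider {
    /// Exposes this provider to Python as a PyCapsule named "datafusion_table_provider".
    fn __datafusion_table_provider__<'py>(
        &self,
        py: Python<'py>,
    ) -> PyResult<Bound<'py, PyCapsule>> {
        let name = CString::new("datafusion_table_provider").unwrap();

        // Wrap a clone of the provider (MyTableProvider must implement Clone)
        // in the FFI-safe struct.
        let provider = Arc::new(self.clone());
        let provider = FFI_TableProvider::new(provider, false, None);

        PyCapsule::new_bound(py, provider, Some(name))
    }
}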

Once this library is available, you can construct a datafusion.Table in Python and register it with the SessionContext. A Table can be created either from the PyCapsule exposed by your Rust provider or from an existing datafusion.dataframe.DataFrame. For the capsule route, call the provider's __datafusion_table_provider__() method to obtain the capsule, then pass it to Table.from_capsule(). The Table.from_view() helper is deprecated; use Table.from_dataframe() or DataFrame.into_view() instead.

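from datafusion import SessionContext, Table

# MyTableProvider is the class exported by your compiled Rust extension;
# import it from that module before running this snippet.

ctx = SessionContext()
provider = MyTableProvider()

# Table backed by the Rust provider, built from its PyCapsule
capsule = provider.__datafusion_table_provider__()
capsule_table = Table.from_capsule(capsule)

# Table backed by an existing DataFrame
df = ctx.from_pydict({"a": [1]})
view_table = Table.from_dataframe(df)
# or: view_table = df.into_view()

# Register both tables under names the context can look up
ctx.register_table("capsule_table", capsule_table)
ctx.register_table("view_table", view_table)

ctx.table("capsule_table").show()
ctx.table("view_table").show()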

Both Table.from_capsule() and Table.from_dataframe() create table providers that can be registered with the SessionContext using register_table().