DataFusion also offers a SQL API; read the full reference here.

.. ipython:: python

    import datafusion
    from datafusion import col
    import pyarrow

    # create a context
    ctx = datafusion.SessionContext()

    # register a CSV
    ctx.register_csv('pokemon', 'pokemon.csv')

    # create a new statement via SQL
    df = ctx.sql('SELECT "Attack"+"Defense", "Attack"-"Defense" FROM pokemon')

    # collect and convert to pandas DataFrame
    df.to_pandas()

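
The SQL API is not limited to files. In-memory Arrow data can be queried as well
once it is registered under a table name with one of the ``register_*`` helpers.
The snippet below is a minimal sketch that assumes a DataFusion Python release
providing ``SessionContext.register_record_batches`` and ``DataFrame.to_pydict``;
the ``orders`` data is illustrative only.

.. code-block:: python

    import pyarrow as pa
    from datafusion import SessionContext

    ctx = SessionContext()

    # Illustrative in-memory data: one Arrow record batch with two columns
    batch = pa.RecordBatch.from_pydict({"item": ["apple", "pear"], "qty": [5, 2]})

    # Register the batch as table "orders" (a list of partitions, each a list of batches)
    ctx.register_record_batches("orders", [[batch]])

    # Run SQL against the registered table and materialize the result as a dict
    result = ctx.sql("SELECT item, qty FROM orders WHERE qty > 2").to_pydict()
    print(result)
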
You can opt in to having DataFusion automatically register Arrow-compatible
Python objects that appear in SQL queries. This removes the need to call the
``register_*`` helpers explicitly when working with in-memory data structures.

.. code-block:: python

    import pyarrow as pa
    from datafusion import SessionContext

    # Opt in to automatic registration when the session is created
    ctx = SessionContext(auto_register_python_objects=True)

    # An in-memory Arrow table that is never registered explicitly
    orders = pa.Table.from_pydict({"item": ["apple", "pear"], "qty": [5, 2]})

    # "orders" is resolved from the Python variable of the same name
    result = ctx.sql("SELECT item, qty FROM orders WHERE qty > 2")
    print(result.to_pandas())

When a query references a table that is not registered, DataFusion inspects the
call stack for a variable with the same name and registers it if the object
exposes Arrow data (this includes pandas and Polars DataFrames). Existing
contexts can enable or disable the behavior at runtime through
:py:meth:`SessionContext.set_python_table_lookup`; new sessions can enable it by
passing ``auto_register_python_objects=True`` to the constructor.
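
To make the call-stack lookup more concrete, the following pure-Python sketch
shows one way a missing table name could be resolved against the calling frames.
It is not DataFusion's implementation: the helper ``lookup_arrow_object`` and the
``FakeArrowTable`` stand-in are hypothetical, and treating the Arrow PyCapsule
methods ``__arrow_c_stream__`` / ``__arrow_c_array__`` as the test for "exposes
Arrow data" is an assumption.

.. code-block:: python

    import inspect
    from typing import Any, Optional


    def lookup_arrow_object(name: str) -> Optional[Any]:
        """Hypothetical helper: find a caller variable named `name` exposing Arrow data.

        Assumption: "exposes Arrow data" is approximated by the Arrow PyCapsule
        protocol methods, which pyarrow, recent pandas, and Polars objects provide.
        """
        frame = inspect.currentframe()
        try:
            # Start at the caller's frame and walk outward through the stack.
            current = frame.f_back if frame is not None else None
            while current is not None:
                # Within a frame, locals take precedence over module globals.
                for namespace in (current.f_locals, current.f_globals):
                    if name in namespace:
                        candidate = namespace[name]
                        if hasattr(candidate, "__arrow_c_stream__") or hasattr(
                            candidate, "__arrow_c_array__"
                        ):
                            return candidate
                        # Names bound to non-Arrow objects are skipped, not errors.
                current = current.f_back
            return None
        finally:
            # Drop the frame reference to avoid keeping a reference cycle alive.
            del frame


    class FakeArrowTable:
        """Hypothetical stand-in for a pyarrow/pandas/Polars object."""

        def __arrow_c_stream__(self, requested_schema=None):
            # Only the presence of the method matters for the lookup demo.
            raise NotImplementedError("not needed for the lookup demo")


    def run_query():
        orders = FakeArrowTable()
        not_a_table = [1, 2, 3]
        found = lookup_arrow_object("orders")
        missing = lookup_arrow_object("not_a_table")
        return orders, found, missing

Walking outward from the caller means a variable defined in the function that
issues the query wins over a module-level variable of the same name; skipping
non-Arrow bindings instead of raising is one of several reasonable choices.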