DataFusion in Python

This is a Python library that binds to DataFusion, Apache Arrow's in-memory query engine.

Like PySpark, it lets you build a query plan through SQL or a DataFrame API against in-memory data, Parquet files, or CSV files, run it in a multi-threaded environment, and get the result back in Python.
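
As a minimal sketch of the two entry points, the snippet below runs the same projection once through SQL and once through the DataFrame API. The table name ``t`` and the column alias ``total`` are illustrative choices, and method availability may vary between releases.

.. code-block:: python

    import pyarrow as pa
    from datafusion import SessionContext, col

    ctx = SessionContext()

    # Register a small in-memory batch under the (illustrative) name "t".
    batch = pa.RecordBatch.from_arrays(
        [pa.array([1, 2, 3]), pa.array([4, 5, 6])],
        names=["a", "b"],
    )
    ctx.register_record_batches("t", [[batch]])

    # The same projection, expressed as SQL ...
    sql_result = ctx.sql("SELECT a + b AS total FROM t").to_pydict()

    # ... and through the DataFrame API.
    df_result = (
        ctx.table("t")
        .select((col("a") + col("b")).alias("total"))
        .to_pydict()
    )

Both calls build a plan lazily; nothing executes until the result is requested, here via ``to_pydict()``.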

It also allows you to use UDFs and UDAFs for complex operations.
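
A scalar UDF might look like the sketch below. The function ``add_one`` and the alias ``a_plus_one`` are hypothetical names chosen for illustration, and the ``udf`` signature shown (function, input types, return type, volatility) is assumed to match the installed release. The function receives and returns whole Arrow arrays, not individual Python values.

.. code-block:: python

    import pyarrow as pa
    import pyarrow.compute as pc
    from datafusion import SessionContext, col, udf


    def add_one(values: pa.Array) -> pa.Array:
        # Operates on an entire Arrow array at once.
        return pc.add(values, 1)


    # Wrap the Python function; "immutable" declares that the output
    # depends only on the input.
    add_one_udf = udf(add_one, [pa.int64()], pa.int64(), "immutable")

    ctx = SessionContext()
    df = ctx.from_pydict({"a": [1, 2, 3]})
    result = df.select(add_one_udf(col("a")).alias("a_plus_one")).to_pydict()

UDAFs follow the same idea but are defined through an accumulator class; they are not shown here.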

The major advantage of this library over other execution engines is zero-copy data exchange between Python and the engine: using UDFs and UDAFs, or collecting results into Python, incurs no data-copy cost. The only overhead is acquiring the GIL while those operations run.

Its query engine, DataFusion, is written in Rust, which provides strong guarantees about thread safety and memory safety.

Technically, zero-copy is achieved via the Arrow C Data Interface.
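
From the Python side, this shows up as results arriving as ordinary ``pyarrow`` objects. The sketch below only illustrates the types that come back; it does not itself measure or prove that no copy took place.

.. code-block:: python

    import pyarrow as pa
    from datafusion import SessionContext

    ctx = SessionContext()
    df = ctx.from_pydict({"a": [1, 2, 3]})

    # collect() executes the plan and returns a list of pyarrow.RecordBatch.
    batches = df.collect()

    # The batches can be used directly with the rest of the Arrow ecosystem.
    table = pa.Table.from_batches(batches)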

Install

.. code-block:: shell

    pip install datafusion

Example

.. ipython:: python

    import datafusion
    from datafusion import col
    import pyarrow

    # create a context
    ctx = datafusion.SessionContext()

    # create a RecordBatch and a new DataFrame from it
    batch = pyarrow.RecordBatch.from_arrays(
        [pyarrow.array([1, 2, 3]), pyarrow.array([4, 5, 6])],
        names=["a", "b"],
    )
    df = ctx.create_dataframe([[batch]], name="batch_array")

    # build a projection with two derived columns
    df = df.select(
        col("a") + col("b"),
        col("a") - col("b"),
    )

    df


.. toctree::
   :hidden:
   :maxdepth: 1
   :caption: LINKS

   Github and Issue Tracker <https://github.com/apache/datafusion-python>
   Rust's API Docs <https://docs.rs/datafusion/latest/datafusion/>
   Code of conduct <https://github.com/apache/datafusion/blob/main/CODE_OF_CONDUCT.md>
   Examples <https://github.com/apache/datafusion-python/tree/main/examples>

.. toctree::
   :hidden:
   :maxdepth: 1
   :caption: USER GUIDE

   user-guide/introduction
   user-guide/basics
   user-guide/configuration
   user-guide/common-operations/index
   user-guide/io/index
   user-guide/sql


.. toctree::
   :hidden:
   :maxdepth: 1
   :caption: CONTRIBUTOR GUIDE

   contributor-guide/introduction

.. toctree::
   :hidden:
   :maxdepth: 1
   :caption: API