SedonaDB for Python can be installed from PyPI:
pip install "apache-sedona[db]"To get started, import the library and connect to a new session. You can run SQL queries directly on the session object.
import sedona.db
sd = sedona.db.connect()
sd.sql("SELECT ST_Point(0, 1) as geom").show()Output:
┌────────────┐
│ geom │
│ wkb │
╞════════════╡
│ POINT(0 1) │
└────────────┘A common use case is performing a spatial join. In this example, we'll find the country that each city belongs to by checking if the city's point geometry intersects with a country's polygon geometry.
First, load the cities and countries parquet files from their URLs into SedonaDB DataFrames.
cities_url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_cities_geo.parquet"
countries_url = "https://raw.githubusercontent.com/geoarrow/geoarrow-data/v0.2.0/natural-earth/files/natural-earth_countries_geo.parquet"
cities = sd.read_parquet(cities_url)
countries = sd.read_parquet(countries_url)To query these DataFrames using SQL, they must be registered as temporary views in the session.
cities.to_view("cities")
countries.to_view("countries")Now you can run a SQL query using ST_Intersects to join the two views.
# Join the cities and countries tables
sd.sql("""
SELECT
cities.name AS city,
countries.name AS country,
countries.continent
FROM cities
JOIN countries
WHERE ST_Intersects(cities.geometry, countries.geometry)
""").show()Output:
┌───────────────┬─────────────────────────────┬───────────────┐
│ city ┆ country ┆ continent │
│ utf8view ┆ utf8view ┆ utf8view │
╞═══════════════╪═════════════════════════════╪═══════════════╡
│ Suva ┆ Fiji ┆ Oceania │
├───────────────┼─────────────────────────────┼───────────────┤
│ Dodoma ┆ United Republic of Tanzania ┆ Africa │
├───────────────┼─────────────────────────────┼───────────────┤
│ Dar es Salaam ┆ United Republic of Tanzania ┆ Africa │
├───────────────┼─────────────────────────────┼───────────────┤
│ Bir Lehlou ┆ Western Sahara ┆ Africa │
...
└───────────────┴─────────────────────────────┴───────────────┘
You can also create a SedonaDB DataFrame from scratch using SQL VALUES clauses and geometry functions like ST_GeomFromWkt.
df = sd.sql("""
SELECT * FROM (VALUES
('one', ST_GeomFromWkt('POINT(1 2)')),
('two', ST_GeomFromWkt('POLYGON((-74.0 40.7, -74.0 40.8, -73.9 40.8, -73.9 40.7, -74.0 40.7))')),
('three', ST_GeomFromWkt('LINESTRING(-74.0060 40.7128, -73.9352 40.7306, -73.8561 40.8484)')))
AS t(val, point)
""")
# Verify the object type
type(df)Output:
sedonadb.dataframe.DataFrame
Once created, you can register it as a view and run further spatial operations on it.
df.to_view("fun_table")
sd.sql("SELECT *, ST_Centroid(point) AS centroid FROM fun_table").show()Output:
┌───────┬─────────────────────────────────────────────┬────────────────────────────────────────────┐
│ val ┆ point ┆ centroid │
│ utf8 ┆ wkb ┆ wkb │
╞═══════╪═════════════════════════════════════════════╪════════════════════════════════════════════╡
│ one ┆ POINT(1 2) ┆ POINT(1 2) │
├───────┼─────────────────────────────────────────────┼────────────────────────────────────────────┤
│ two ┆ POLYGON((-74 40.7,-74 40.8,-73.9 40.8,-73.… ┆ POINT(-73.95000000000002 40.75) │
├───────┼─────────────────────────────────────────────┼────────────────────────────────────────────┤
│ three ┆ LINESTRING(-74.006 40.7128,-73.9352 40.730… ┆ POINT(-73.92111155675562 40.7664673976246… │
└───────┴─────────────────────────────────────────────┴────────────────────────────────────────────┘
For notebooks or interactive sessions, you can enable interactive mode. This eagerly prints the results of queries without requiring an explicit .show() call, which is useful for data exploration.
sedona.db.options.interactive = True
sd.sql("SELECT ST_Point(0, 1) as geom")Output:
┌────────────┐
│ geom │
│ wkb │
╞════════════╡
│ POINT(0 1) │
└────────────┘
For non-interactive scripts or when working with very large datasets, it's best to leave this option False to avoid accidentally pulling large amounts of data.