DataFrames

A Chalk DataFrame is a 2-dimensional data structure similar to pandas.DataFrame, but with richer types and underlying optimizations.

https://docs.chalk.ai/docs/dataframe

1. Creating DataFrames

Create an empty DataFrame, or build one from a dictionary that maps each feature to a list of column values.

1_creating_dataframes.py

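# Create an empty DataFrame.
df = DataFrame()

# Build a DataFrame from a dict that maps each feature to its column values.
df = DataFrame.from_dict({
    User.id: [1, 2],
    User.email: ["elliot@chalk.ai", "samantha@chalk.ai"],
})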

https://docs.chalk.ai/docs/dataframe

2. Filters

Filter the rows of a DataFrame by supplying conditions to the __getitem__() method.

2_filtering.py

User.txns[
    Transaction.amount < 0,
    Transaction.merchant in {"uber", "lyft"} or Transaction.memo == "uberpmts",
    Transaction.canceled_at is None
]

https://docs.chalk.ai/docs/dataframe
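
To make the filter semantics concrete, here is a minimal plain-Python sketch of what the expression above selects. It does not use the Chalk API: `Txn` and the sample rows are hypothetical stand-ins for the `Transaction` feature class, and it assumes that comma-separated conditions are combined with a logical AND.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Txn:
    """Hypothetical stand-in for the Transaction feature class."""
    amount: float
    merchant: str
    memo: str
    canceled_at: Optional[datetime] = None


txns = [
    Txn(-12.5, "uber", "ride"),
    Txn(-30.0, "grocer", "uberpmts"),
    Txn(-8.0, "lyft", "ride", canceled_at=datetime(2023, 1, 5)),
    Txn(40.0, "uber", "refund"),
]

# Assumption: every comma-separated condition must hold for a row to be kept.
filtered = [
    t for t in txns
    if t.amount < 0
    and (t.merchant in {"uber", "lyft"} or t.memo == "uberpmts")
    and t.canceled_at is None
]
```

Only the first two sample rows survive: the third was canceled and the fourth has a positive amount.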

3. Projections

Scope down the set of columns available in a DataFrame.

3_projections.py

User.txns[
    Transaction.amount,
    Transaction.memo
]

https://docs.chalk.ai/docs/dataframe
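
As a rough illustration of what a projection does, the sketch below keeps two columns from a list of row dictionaries. It is plain Python rather than the Chalk API, and the row data and the `keep` tuple are made up for the example.

```python
# Hypothetical rows standing in for a user's transactions.
rows = [
    {"id": 1, "amount": -12.5, "memo": "ride", "merchant": "uber"},
    {"id": 2, "amount": 40.0, "memo": "refund", "merchant": "uber"},
]

# The projection lists the columns to keep; every row is retained.
keep = ("amount", "memo")
projected = [{col: row[col] for col in keep} for row in rows]
```

The number of rows is unchanged; only the set of columns shrinks.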

4. Projections with Filters

Compose projections and filters to create a new DataFrame.

4_filters_and_projections.py

User.transactions[Transaction.amount > 100, Transaction.memo]

https://docs.chalk.ai/docs/dataframe#composing-projections-and-filters
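
One way to picture the composition is a helper that applies the filters first and then the projection. The `select` function below is a hypothetical plain-Python sketch, not part of Chalk, and it assumes the result keeps only the projected columns.

```python
from typing import Any, Callable, Dict, List, Sequence

Row = Dict[str, Any]


def select(
    rows: Sequence[Row],
    predicates: Sequence[Callable[[Row], bool]],
    columns: Sequence[str],
) -> List[Row]:
    """Keep rows that satisfy every predicate, then keep only `columns`."""
    return [
        {col: row[col] for col in columns}
        for row in rows
        if all(pred(row) for pred in predicates)
    ]


# Hypothetical sample data.
transactions = [
    {"amount": 250.0, "memo": "laptop"},
    {"amount": 18.0, "memo": "lunch"},
    {"amount": 101.0, "memo": "flight"},
]

# Rough analogue of User.transactions[Transaction.amount > 100, Transaction.memo].
big_memos = select(transactions, [lambda r: r["amount"] > 100], ["memo"])
```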

5. Aggregations

Compute aggregates over a DataFrame.

5_aggregations.py

User.transactions[Transaction.amount].sum()
User.transactions[Transaction.amount].mean()
User.transactions[Transaction.amount].count()
User.transactions[Transaction.amount].max()

https://docs.chalk.ai/docs/dataframe#aggregations
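
For intuition, the four aggregates above correspond to the following plain-Python computations over a single column of values. The `amounts` list is hypothetical sample data, and the sketch says nothing about how Chalk treats empty DataFrames or missing values.

```python
from statistics import mean

# Hypothetical values of Transaction.amount for one user.
amounts = [120.0, 35.5, 250.0, 14.5]

total = sum(amounts)      # analogue of .sum()
average = mean(amounts)   # analogue of .mean()
count = len(amounts)      # analogue of .count()
largest = max(amounts)    # analogue of .max()
```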

6. Self Joins

Join a feature set back to itself.

6_self_joins.py

@features
class PrequelLink:
    id: int
    prequel_id: int
    book: "Book" = has_one(lambda: Book.id == PrequelLink.prequel_id)


@features
class Book:
    id: int
    title: str
    author_id: Author.id
    prequel_id: PrequelLink.id | None
    prequel: PrequelLink | None = has_one(lambda: Book.id == PrequelLink.prequel_id)
    series_id: SeriesLink.id | None
    series: SeriesLink = has_one(lambda: SeriesLink.id == Book.series_id)