-
Notifications
You must be signed in to change notification settings - Fork 16
feat: Add unique rule to dy.Column
#325
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -39,7 +39,14 @@ def _build_rules( | |
| # Add primary key validation to the list of rules if applicable | ||
| primary_key = _primary_key(columns) | ||
| if len(primary_key) > 0: | ||
| rules["primary_key"] = Rule(~pl.struct(primary_key).is_duplicated()) | ||
| rules["primary_key"] = Rule(pl.struct(primary_key).is_unique()) | ||
|
|
||
| # Add unique column validation rules | ||
| unique_columns = _unique_columns(columns) | ||
| for col_name in unique_columns: | ||
| # wrap the column in a struct to make `is_unique` work with list/arrays | ||
|
gab23r marked this conversation as resolved.
|
||
| # https://github.com/pola-rs/polars/issues/27286 | ||
| rules[f"{col_name}|unique"] = Rule(pl.struct(col_name).is_unique()) | ||
|
Comment on lines
+44
to
+49
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry I didn't review previously but I find this implementation suboptimal. Why is it on the schema if we do not check composite uniqueness but uniqueness of individual columns? This should be on the column which would also allow for much more efficient evaluation of
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Besides, this also breaks for nested types; for example setting |
||
|
|
||
| # Add column-specific rules | ||
| column_rules = { | ||
|
|
@@ -71,6 +78,10 @@ def _primary_key(columns: dict[str, Column]) -> list[str]: | |
| return list(k for k, col in columns.items() if col.primary_key) | ||
|
|
||
|
|
||
| def _unique_columns(columns: dict[str, Column]) -> list[str]: | ||
| return list(k for k, col in columns.items() if col.unique) | ||
|
|
||
|
|
||
| # ------------------------------------------------------------------------------------ # | ||
| # SCHEMA META # | ||
| # ------------------------------------------------------------------------------------ # | ||
|
|
@@ -300,6 +311,11 @@ def primary_key(cls) -> list[str]: | |
| """The primary key columns in this schema (possibly empty).""" | ||
| return _primary_key(cls.columns()) | ||
|
|
||
| @classmethod | ||
| def unique_columns(cls) -> list[str]: | ||
| """The columns with unique constraints in this schema (possibly empty).""" | ||
| return _unique_columns(cls.columns()) | ||
|
|
||
| @classmethod | ||
| def _validation_rules(cls, *, with_cast: bool) -> dict[str, Rule]: | ||
| return _build_rules( | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meta-comment because I just noticed:
pl.struct(primary_key).is_unique()is likely much more inefficient thanpl.col(primary_key).is_unique()if we only have a single primary key. We might want to introduce an optimization for this after benchmarking 😄