fix: Make `unique` work for nested types and improve performance by borchero · Pull Request #333 · Quantco/dataframely

Oliver Borchert (borchero) · 2026-04-21T23:03:24Z

Motivation

Changes

- Move `unique` to the column definitions, since it is a column-level rather than a schema-level concern
- Remove the `unique_columns` classmethod with the same argument
- Use `pl.col("...").is_unique()` by default instead of wrapping in a struct for improved performance
- Add tests for nested uniqueness and uniqueness of complex types

codecov · 2026-04-21T23:06:30Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (7126c1e) to head (518dab2).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff            @@
##              main      #333   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           56        56           
  Lines         3408      3406    -2     
=========================================
- Hits          3408      3406    -2


Copilot

Pull request overview

This PR refactors how unique constraints are expressed and validated by moving uniqueness checks into per-column validation rules, enabling unique to work for nested column types (e.g., List/Array inner types) and improving performance for scalar columns.

Changes:

- Implement `unique` as part of `Column.validation_rules()` (and remove schema-level `unique_columns()` plumbing).
- Use `expr.is_unique()` for most columns, with a Polars workaround for List/Array (`pl.struct(expr).is_unique()`).
- Add/adjust tests for `unique` on List/Array columns and their inner types; remove the test for the removed `unique_columns()` method.
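As a rough model of what a uniqueness mask means for a list-typed column, the following pure-Python sketch treats each list as a single value. It is not dataframely's or Polars' implementation; `is_unique_mask` and `freeze` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter


def is_unique_mask(values):
    """Return True for every value that occurs exactly once.

    Lists are converted to tuples (recursively) so they become hashable
    and can be counted as whole values.
    """
    def freeze(value):
        if isinstance(value, list):
            return tuple(freeze(item) for item in value)
        return value

    frozen = [freeze(value) for value in values]
    counts = Counter(frozen)
    return [counts[value] == 1 for value in frozen]


# Each row of a list column is compared as a whole: [1, 2] and [2, 1] differ.
rows = [[1, 2], [2, 1], [1, 2], [3]]
column_mask = is_unique_mask(rows)
print(column_mask)  # [False, True, False, True]
```

A `unique` rule on such a column would hold only if every entry of the mask is `True`.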

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tests/schema/test_validate.py` | Removes test coverage for deleted `unique_columns()` API. |
| `tests/column_types/test_list.py` | Adds `unique` and `inner_unique` tests for `List`. |
| `tests/column_types/test_array.py` | Adds `unique` and `inner_unique` tests for `Array`. |
| `dataframely/columns/_base.py` | Adds `unique` rule to base column validation rules. |
| `dataframely/_base_schema.py` | Removes schema-level unique column rule construction and `unique_columns()` method. |
| `dataframely/columns/list.py` | Implements `unique` for list columns using struct wrapper workaround. |
| `dataframely/columns/array.py` | Implements `unique` for array columns using struct wrapper workaround. |
| `dataframely/columns/struct.py` | Updates `unique` parameter documentation. |
| `dataframely/columns/string.py` | Updates `unique` parameter documentation. |
| `dataframely/columns/integer.py` | Updates `unique` parameter documentation. |
| `dataframely/columns/float.py` | Updates `unique` parameter documentation. |
| `dataframely/columns/enum.py` | Updates `unique` parameter documentation. |
| `dataframely/columns/decimal.py` | Updates `unique` parameter documentation. |
| `dataframely/columns/datetime.py` | Updates `unique` parameter documentation. |
| `dataframely/columns/categorical.py` | Updates `unique` parameter documentation. |

gab23r · 2026-04-22T08:33:43Z

Indeed much cleaner than my implementation!
Just for fun: wrapping is_unique in pl.struct is actually faster in a micro-benchmark 😄

import dataframely as dy
import polars as pl


class Schema(dy.Schema):
    a = dy.Int64()


# Sample one million rows that satisfy the schema
df = Schema.sample(1_000_000)

# Struct-wrapped uniqueness check (%timeit is an IPython magic)
%timeit _ = df.select(pl.struct("a").is_unique())
# 8.08 ms ± 495 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# Plain column expression
%timeit _ = df.select(pl.col("a").is_unique())
# 28.6 ms ± 1.04 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

This is because pl.struct("a").is_unique() dispatches to df.select("a").unnest().is_unique(), which is executed in parallel. That is not the case for Expr.is_unique.

Oliver Borchert (borchero) · 2026-04-22T11:11:50Z

Thanks for the benchmark gab23r, this is really surprising 👀 but good to know that this behavior changed since I last benchmarked this >1y ago 😅

fix: Make unique work for nested types and improve performance

51c0d9b

Oliver Borchert (borchero) self-assigned this Apr 21, 2026

Copilot AI review requested due to automatic review settings April 21, 2026 23:03

Oliver Borchert (borchero) requested review from Andreas Albert (AndreasAlbertQC) and Daniel Elsner (delsner) as code owners April 21, 2026 23:03

github-actions Bot added the fix label Apr 21, 2026

Copilot started reviewing on behalf of Oliver Borchert (borchero) April 21, 2026 23:03 View session

Fix docs

518dab2

Copilot AI reviewed Apr 21, 2026

Andreas Albert (AndreasAlbertQC) approved these changes Apr 22, 2026

Oliver Borchert (borchero) merged commit b993da6 into main Apr 22, 2026
35 of 36 checks passed

Oliver Borchert (borchero) deleted the refactor-unique branch April 22, 2026 08:02

Oliver Borchert (borchero) merged 2 commits into main from refactor-unique

4 participants
