Skip to content

refactor: parameterize uuid generation#6961

Open
everettVT wants to merge 7 commits into
mainfrom
everettVT/uuid-param-generator
Open

refactor: parameterize uuid generation#6961
everettVT wants to merge 7 commits into
mainfrom
everettVT/uuid-param-generator

Conversation

@everettVT
Copy link
Copy Markdown
Contributor

Summary

  • Route UUIDv4 and UUIDv7 generation through one registered built-in scalar function, uuid, parameterized by a version literal.
  • Replace the separate internal uuidv7 registration with a shared UuidVersion dispatch path backed by the uuid crate.
  • Add UUID byte-layout correctness coverage for v4 and v7, including v7 timestamp/version/variant/random component parsing.
  • Add a docstring usage pattern showing how to cast UUIDv7 values to binary and slice out the RFC fields for validation.

Validation

  • cargo fmt --all --check
  • .venv/bin/ruff check daft/functions/misc.py tests/dataframe/test_uuid.py
  • .venv/bin/ruff format --check daft/functions/misc.py tests/dataframe/test_uuid.py
  • cargo test -p daft-functions uuid --lib
  • DAFT_RUNNER=native make test EXTRA_ARGS="-v tests/dataframe/test_uuid.py"
  • make doctests
  • git diff --check origin/main...HEAD

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 19, 2026

Rust Dependency Diff

Head: b054271703fd64725ed63ad340d2977224254120 vs Base: b63d9f81079d03fcb78b5f6c0f8b2e7a592227f5.

OK: Within budget.

  • New Crates: 0
  • Removed Crates: 0

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

❌ Patch coverage is 95.04950% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.68%. Comparing base (b63d9f8) to head (d42823b).
⚠️ Report is 9 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-functions/src/uuid.rs 94.94% 5 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff            @@
##             main    #6961    +/-   ##
========================================
  Coverage   75.68%   75.68%            
========================================
  Files        1146     1146            
  Lines      161232   161391   +159     
========================================
+ Hits       122032   122154   +122     
- Misses      39200    39237    +37     
Files with missing lines Coverage Δ
daft/functions/misc.py 93.40% <100.00%> (ø)
src/daft-dsl/src/expr/mod.rs 76.15% <100.00%> (ø)
src/daft-functions/src/lib.rs 62.50% <ø> (-2.21%) ⬇️
src/daft-functions/src/uuid.rs 93.61% <94.94%> (+21.70%) ⬆️

... and 37 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@everettVT everettVT marked this pull request as ready for review May 20, 2026 20:38
@everettVT everettVT requested a review from colin-ho May 20, 2026 20:38
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 20, 2026

Greptile Summary

This PR consolidates UUIDv4 and UUIDv7 generation into a single uuid built-in scalar function parameterized by a UuidVersion enum, removing the separate UuidV7 registration and the \"uuidv7\" function name entirely.

  • Rust refactor: Replaces Uuid and UuidV7 structs with a unified Uuid ScalarUDF; a new UuidVersion enum drives dispatch to RustUuid::new_v4() or RustUuid::now_v7() inside a single uuid_kernel.
  • Python layer: uuid(version=\"v7\") now routes through the shared \"uuid\" builtin; v4 still omits the arg and relies on the Rust default.
  • Tests: New byte-layout assertions cover the v4 variant bit, UUIDv7 field decomposition, and uniqueness of (timestamp, rand_a, rand_b) tuples.

Confidence Score: 4/5

Safe to merge; all changes are additive or internal consolidation with no effect on public output types.

The refactor is clean and the new tests are thorough. The only loose ends are a dead import sys in series.py and the asymmetric v4/v7 call paths in misc.py — neither affects correctness or output. The uuidv7 function name and its uuid_v7 alias are no longer registered, so any serialized query plans referencing those names would fail to deserialize.

daft/series.py (unused import), daft/functions/misc.py (asymmetric dispatch paths)

Important Files Changed

Filename Overview
src/daft-functions/src/uuid.rs Core refactor: replaces separate Uuid/UuidV7 structs with a unified Uuid ScalarUDF parameterized by a UuidVersion enum. Logic is clean; test name for version/variant bit check is slightly misleading after rename.
daft/functions/misc.py Python dispatch layer updated; v4 omits the version arg while v7 passes it explicitly — asymmetric but functionally correct.
daft/series.py Only change is an added import sys that is never used anywhere in the file.
src/daft-dsl/src/expr/mod.rs Removes the "uuidv7" special-case from expression naming. Straightforward cleanup.
src/daft-functions/src/lib.rs Removes UuidV7 registration; only the unified Uuid is registered.
tests/dataframe/test_uuid.py Comprehensive new byte-layout tests for v4 and v7 with no issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["Python: uuid(version='v4'|'v7')"] --> B{version?}
    B -->|v4| C["_call_builtin_scalar_fn('uuid')"]
    B -->|v7| D["_call_builtin_scalar_fn('uuid', 'v7')"]
    C --> E["Rust: Uuid ScalarUDF"]
    D --> E
    E --> F["UuidArgs::try_into"]
    F --> G{UuidVersion}
    G -->|V4 default| H["RustUuid::new_v4()"]
    G -->|V7| I["RustUuid::now_v7()"]
    H --> J["uuid_kernel"]
    I --> J
    J --> K["UuidArray Series"]
Loading

Comments Outside Diff (1)

  1. src/daft-functions/src/uuid.rs, line 163-172 (link)

    P2 Test name implies broader coverage than it provides

    uuid_kernel_sets_version_and_variant_bits still only asserts v7 bits (0x7 / 0b10). The new uuid_kernel_generates_requested_version covers both versions, but the name of this test now suggests it validates both — a reader would expect to see a v4 assertion here as well. Consider renaming it uuid_v7_kernel_sets_version_and_variant_bits to stay accurate, or extend it to cover the v4 variant byte (bytes[6] >> 4 == 0x4).

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "Merge branch 'main' into everettVT/uuid-..." | Re-trigger Greptile

Comment thread daft/series.py Outdated
Comment on lines 3 to 4
import sys
from typing import TYPE_CHECKING, Any, Literal, TypeVar
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The sys module is imported here but never referenced anywhere in series.py. This appears to be an accidental leftover from development — no sys. call exists in the file.

Suggested change
import sys
from typing import TYPE_CHECKING, Any, Literal, TypeVar
from typing import TYPE_CHECKING, Any, Literal, TypeVar

Comment thread daft/functions/misc.py
Comment on lines 95 to 99
if version == "v4":
return Expression._call_builtin_scalar_fn("uuid")
if version == "v7":
return Expression._call_builtin_scalar_fn("uuidv7")
return Expression._call_builtin_scalar_fn("uuid", version)
raise ValueError("`version` must be 'v4' or 'v7'")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 The v4 and v7 dispatch paths are asymmetric: v4 omits the version argument (relying on the Rust default), while v7 passes it explicitly. This works correctly today, but the two paths can be unified to make the intent clearer and ensure both versions are handled identically at the call site. An invalid version string will still be caught by raise ValueError before either call is reached.

Suggested change
if version == "v4":
return Expression._call_builtin_scalar_fn("uuid")
if version == "v7":
return Expression._call_builtin_scalar_fn("uuidv7")
return Expression._call_builtin_scalar_fn("uuid", version)
raise ValueError("`version` must be 'v4' or 'v7'")
if version in ("v4", "v7"):
return Expression._call_builtin_scalar_fn("uuid", version)
raise ValueError("`version` must be 'v4' or 'v7'")

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce88822625

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@@ -74,6 +74,5 @@ impl FunctionModule for MiscFunctions {
parent.add_fn(ToStructFunction);
parent.add_fn(Slice);
parent.add_fn(Uuid);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Preserve uuidv7 function aliases in registry

Registering only Uuid here removes the previously registered uuidv7/uuid_v7 entry points, so existing SQL or planner expressions that call those names now fail with a function-not-found error after this refactor. The new parameterized uuid(version := 'v7') path is good, but backward-compatible aliases should remain registered (or be exposed via Uuid::aliases) to avoid breaking existing workloads.

Useful? React with 👍 / 👎.

@everettVT everettVT requested review from srilman and removed request for colin-ho May 21, 2026 23:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant