Contributor
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #204      +/-   ##
==========================================
+ Coverage   96.00%   96.08%   +0.07%
==========================================
  Files         140      142       +2
  Lines        9839    10192     +353
  Branches      568      582      +14
==========================================
+ Hits         9446     9793     +347
- Misses        275      280       +5
- Partials      118      119       +1
c3013e7 to
01562e7
Compare
timkpaine
reviewed
Apr 30, 2026
timkpaine
reviewed
Apr 30, 2026
timkpaine
reviewed
Apr 30, 2026
timkpaine
requested changes
Apr 30, 2026
Collaborator
Author
I'm going to rework this a bit after #205, as the current setup of the TPCH generator is propagating suboptimal patterns into this notebook. Will address the feedback above as well (and probably remove the notebook builder).
01562e7 to
56d2592
Compare
Introduce ccflow.models.narwhals providing:
- NarwhalsFrameTransform: pure LazyFrame -> LazyFrame transform base class. Framework-agnostic; usable standalone via lf.pipe(transform).
- SequenceTransform: bundles a strict list of transforms; itself a NarwhalsFrameTransform so it nests and JSON-roundtrips.
- NarwhalsPipelineModel: CallableModel that pipes a NarwhalsFrameResult source through a list of transforms. Delegates context_type to the source. Output is always a narwhals.LazyFrame (lazy contract enforced by re-coercing after every stage). Supports loose Callable transforms at runtime (strict NarwhalsFrameTransform required for serialization).
- JoinTransform: joins another CallableModel's frame onto the input. Other source invoked with NullContext.
- JoinBackTransform: runs an inner transform on the input and joins the result back -- for fork/rejoin patterns where window functions do not fit.

All classes are pydantic models, JSON-serializable, and integrate with ccflow's graph evaluator via explicit __deps__.

Includes 28 unit tests covering base contracts, JSON roundtrip, lazy enforcement, dependency injection, multi-source enrichment, and confluence (pipeline as source of another pipeline).

Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
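The composition pattern this commit describes (callable transforms, a sequence that is itself a transform, and a pipeline that feeds a source through them) can be sketched with the standard library alone. This is a rough illustration, not the ccflow implementation: a list of dicts stands in for a narwhals LazyFrame, and `AddRevenue`, `MinRevenue`, `Sequence`, and `run_pipeline` are hypothetical names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, float]
Frame = List[Row]  # assumption: stand-in for a narwhals LazyFrame
Transform = Callable[[Frame], Frame]


@dataclass
class AddRevenue:
    """Hypothetical transform: adds revenue = price * (1 - discount)."""

    def __call__(self, frame: Frame) -> Frame:
        # Build new rows so the input frame is left untouched (pure transform).
        return [{**r, "revenue": r["price"] * (1 - r["discount"])} for r in frame]


@dataclass
class MinRevenue:
    """Hypothetical transform: keeps rows at or above a revenue threshold."""

    threshold: float

    def __call__(self, frame: Frame) -> Frame:
        return [r for r in frame if r["revenue"] >= self.threshold]


@dataclass
class Sequence:
    """Bundles transforms; it is callable itself, so sequences can nest."""

    transforms: List[Transform] = field(default_factory=list)

    def __call__(self, frame: Frame) -> Frame:
        for transform in self.transforms:
            frame = transform(frame)
        return frame


def run_pipeline(source: Callable[[], Frame], transforms: List[Transform]) -> Frame:
    """Pipes the source's frame through each transform in order."""
    return Sequence(transforms)(source())


rows = [{"price": 100.0, "discount": 0.5}, {"price": 10.0, "discount": 0.0}]
# A nested Sequence is accepted anywhere a single transform is.
result = run_pipeline(lambda: rows, [Sequence([AddRevenue()]), MinRevenue(20.0)])
```

Because every stage has the same frame-in, frame-out shape, a sequence can be dropped in wherever a single transform is expected, which is the property the commit relies on for nesting.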
JoinTransform now accepts either same-named (on=) or cross-named (left_on=/right_on=) join keys, and supports how='cross'. A model validator enforces mutual exclusion and that cross joins specify no keys.

Adds ccflow/examples/narwhals_pipelines.ipynb -- an end-to-end walkthrough of the new abstractions on TPC-H data, refactoring Q1 into transforms, demonstrating dependency injection, JSON serialization, multi-source enrichment via JoinTransform, and the confluence-of-pipelines pattern on Q3. Notebook is generated by build_notebook.py.

Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
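The key-validation rules mentioned in this commit might look roughly like the sketch below. It uses a plain dataclass rather than a pydantic model validator, and `JoinKeys` is a hypothetical name, not the ccflow class. The commit only states mutual exclusion and key-free cross joins; the last two checks (paired `left_on`/`right_on`, and requiring some key for non-cross joins) are assumptions added for completeness.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class JoinKeys:
    """Hypothetical holder for join-key options (not the ccflow class)."""

    how: str = "inner"
    on: Optional[List[str]] = None
    left_on: Optional[List[str]] = None
    right_on: Optional[List[str]] = None

    def __post_init__(self) -> None:
        has_on = self.on is not None
        has_pair = self.left_on is not None or self.right_on is not None
        if self.how == "cross":
            # Cross joins pair every row with every row, so keys make no sense.
            if has_on or has_pair:
                raise ValueError("cross joins take no join keys")
            return
        if has_on and has_pair:
            raise ValueError("use either on= or left_on=/right_on=, not both")
        # Assumption: the two cross-named key lists must be supplied together.
        if has_pair and (self.left_on is None or self.right_on is None):
            raise ValueError("left_on= and right_on= must be given together")
        # Assumption: a non-cross join needs at least one key specification.
        if not has_on and not has_pair:
            raise ValueError("non-cross joins need on= or left_on=/right_on=")


def is_rejected(**kwargs) -> bool:
    """Returns True when the given options fail validation."""
    try:
        JoinKeys(**kwargs)
    except ValueError:
        return True
    return False
```

Validating at construction means a misconfigured join fails when the pipeline is defined rather than when it is first evaluated.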
Notebook now relies on a properly-configured environment (`pip install -e .` from this worktree) rather than a worktree-detection shim in cell 1. Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
…rializeAsAny
- NarwhalsPipelineModel.__call__ now forwards the caller's context to source(context) rather than passing NullContext, so context-keyed sources (e.g. TPCHDataGenerator with TPCHTableContext) can be used directly without per-table adapter wrappers.
- JoinTransform gains an other_context field so a single context-keyed source can be reused across multiple joins (e.g. one TPCHDataGenerator instance serving customer/orders/lineitem). Validator now checks isinstance(other_context, other.context_type) at construction.
- Drop SerializeAsAny from source and other fields. ccflow's BaseModel metaclass already wraps bare BaseModel-typed fields, so the explicit annotation was redundant. Verified subclass info still survives JSON round-trip in tests.
- Loosen SequenceTransform.transforms to accept the same Union[NarwhalsFrameTransform, Callable] as NarwhalsPipelineModel.transforms, for consistency. Beef up the comment on NarwhalsFrameTransformOrCallable to spell out why both branches matter (BaseModel branch enables type_-based round-trip; Callable branch is a runtime escape hatch that doesn't serialize).
- Notebook polish: drop hardcoded TOC, drop graph-awareness subsection, drop trailing Pointers section, fix ascii alignment in section 7 diagram, and remove the TPCHTableProvider adapter (and per-table providers) by leaning on context passthrough -- pipelines now use the generator directly. Combine the AggregateByReturnStatus and SortByReturnStatus transforms into a single SummarizeByReturnStatus (group keys = sort keys, naturally one operation). Add a new section 4 'Aside' that frames NarwhalsFrameTransform as an opt-in convention rather than a requirement, with a tradeoffs table for plain function vs plain BaseModel vs NarwhalsFrameTransform.
- Tests: add coverage for plain callable + plain ccflow.BaseModel inside SequenceTransform (round-trips via type_), context forwarding to source, other_context flow-through, and other_context type-mismatch rejection (38 tests total, full suite 687 passed).

Signed-off-by: Pascal Tomecek <pascal.tomecek@cubistsystematic.com>
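The context forwarding and the `other_context` check described in this commit can be illustrated with a small standard-library sketch. All names here (`TableContext`, `TableSource`, `Pipeline`, `JoinOther`) are hypothetical stand-ins, and a list of dicts replaces the narwhals frame; the real classes are pydantic models and may differ in detail.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Frame = List[Dict[str, Any]]  # assumption: stand-in for a narwhals LazyFrame


@dataclass(frozen=True)
class TableContext:
    """Hypothetical context naming which table a source should return."""

    table: str


@dataclass
class TableSource:
    """Hypothetical context-keyed source, standing in for a data generator."""

    tables: Dict[str, Frame]
    context_type: type = TableContext

    def __call__(self, context: TableContext) -> Frame:
        return self.tables[context.table]


@dataclass
class Pipeline:
    """Forwards the caller's context to the source, then applies transforms."""

    source: TableSource
    transforms: List[Callable[[Frame], Frame]]

    def __call__(self, context: TableContext) -> Frame:
        frame = self.source(context)  # caller's context, not a null context
        for transform in self.transforms:
            frame = transform(frame)
        return frame


@dataclass
class JoinOther:
    """Holds a second source plus the fixed context used to invoke it."""

    other: TableSource
    other_context: Any

    def __post_init__(self) -> None:
        # Reject a mismatched context at construction, not at run time.
        if not isinstance(self.other_context, self.other.context_type):
            raise TypeError("other_context does not match other.context_type")

    def other_frame(self) -> Frame:
        return self.other(self.other_context)


# One source instance serves several tables, selected purely by context.
gen = TableSource({"orders": [{"o": 1}], "customer": [{"c": 7}]})
orders = Pipeline(gen, [])(TableContext("orders"))
customers = JoinOther(gen, TableContext("customer")).other_frame()
```

With the context passed through, no per-table adapter is needed: the same `gen` object backs both the pipeline source and the joined side.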
56d2592 to
ada154d
Compare
Adds reusable narwhals pipeline abstractions in `ccflow.models.narwhals`, plus an end-to-end TPC-H notebook demonstrating them.

New classes
- `NarwhalsFrameTransform`: pure `LazyFrame -> LazyFrame` transform base class. Framework-agnostic; usable standalone via `lf.pipe(transform)`.
- `SequenceTransform`: bundles a list of transforms; itself a `NarwhalsFrameTransform` so it nests and JSON-roundtrips.
- `NarwhalsPipelineModel`: `CallableModel` that pipes a `NarwhalsFrameResult` source through a list of transforms. Delegates `context_type` to the source.
- `JoinTransform`: joins another `CallableModel`'s frame onto the input. Supports same-named (`on=`), cross-named (`left_on=`/`right_on=`), and `how="cross"` joins.
- `JoinBackTransform`: runs an inner transform on the input and joins the result back, for fork/rejoin patterns where window functions don't fit.

Notebook
- `ccflow/examples/narwhals_pipelines.ipynb`: TPC-H Q1 + Q3 walkthrough covering refactoring a canonical query into reusable transforms, dependency injection, JSON serialization of full pipelines, multi-source enrichment via `JoinTransform`, and the confluence pattern (pipelines composed as inputs to other pipelines).
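As a rough illustration of the fork/rejoin idea behind `JoinBackTransform` described above, the sketch below runs an inner aggregation on a frame and left-joins its output back onto the original rows. It is a standard-library approximation only: a list of dicts stands in for a narwhals frame, `total_by_customer` and `join_back` are hypothetical names, and the inner transform is assumed to yield at most one row per key.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Frame = List[Dict[str, Any]]  # assumption: stand-in for a narwhals LazyFrame


def total_by_customer(frame: Frame) -> Frame:
    """Hypothetical inner transform: one row per customer with summed amount."""
    totals: Dict[Any, float] = defaultdict(float)
    for row in frame:
        totals[row["customer"]] += row["amount"]
    return [{"customer": c, "total": t} for c, t in totals.items()]


def join_back(frame: Frame, inner: Callable[[Frame], Frame], on: str) -> Frame:
    """Runs inner on frame, then left-joins its output back onto frame by key."""
    # Assumes inner yields at most one row per key value.
    lookup = {row[on]: row for row in inner(frame)}
    return [{**row, **lookup.get(row[on], {})} for row in frame]


orders = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 5.0},
    {"customer": "b", "amount": 2.0},
]
enriched = join_back(orders, total_by_customer, on="customer")
```

Each original row keeps its own columns and gains the aggregate computed on the forked branch, which is the shape of result the fork/rejoin pattern is after.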