feat: Support nested collection types (Array/Set of Array/Set) (#5947) (#6132)
* feat: Support nested collection types (Array/Set of Array/Set) (#5947)
Add support for 2-level nested collection types: Array(Array(T)),
Array(Set(T)), Set(Array(T)), and Set(Set(T)).
- Add 4 generic ValueType enums (LIST_LIST, LIST_SET, SET_LIST, SET_SET)
backed by RepeatedValue proto messages
- Persist inner type info in Field tags (feast:nested_inner_type),
following the existing Struct schema tag pattern
- Handle edge cases: empty inner collections, Set dedup at inner level,
depth limit enforcement (2 levels max)
- Add proto/JSON/remote transport serialization support
- Add 25 unit tests covering all combinations and edge cases
Signed-off-by: Soojin Lee <lsjin0602@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
* fix: Fix remote online read for nested collection types and add docs
- Fix remote online store read path to use declared feature types from
FeatureView instead of ValueType.UNKNOWN, which fails for nested
collection types (LIST_LIST, LIST_SET, SET_LIST, SET_SET)
- Add Nested Collection Types section to type-system.md with type table,
usage examples, and empty-inner-collection→None limitation docs
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
* fix: Fix JSON deserialization, schema inference, and silent fallback for nested collection types
- Add nested list handling in proto_json from_json_object (list of lists
was raising ParseError since no branch matched list-typed elements)
- Fix pa_to_feast_value_type to recognize nested list PyArrow types
(list<item: list<item: T>>) instead of crashing with KeyError
- Replace silent String fallback in _str_to_feast_type with ValueError
to surface corrupted tag values instead of silently losing type info
- Strengthen test coverage: type str roundtrip, inner value verification,
multi-value batch, proto JSON roundtrip, PyArrow nested type inference
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
* fix: Fix mypy type error in nested collection proto construction
Use getattr/CopyFrom instead of **dict unpacking for ProtoValue
construction to satisfy mypy's strict type checking.
Signed-off-by: soojin <soojin@dable.io>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
* fix: Fix equality comparison for nested types and JSON deserialization edge case
- Add __eq__/__hash__ to Array and Set so inner element types are compared
(previously Array(Array(String)) == Array(Array(Int32)) was True)
- Fix nested collection detection in proto_json when first element is None
by using any() fallback instead of only checking value[0]
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
* feat: Remove depth limit for nested collection types and improve test coverage
- Remove 2-level depth restriction from Array and Set constructors
to support unbounded nesting per maintainer request
- Make _convert_nested_collection_to_proto() recursive for 3+ levels
- Update error message for nested type inference to guide users
toward explicit Field dtype declaration
- Add 3+ level tests for Field roundtrip, str roundtrip, and PyArrow conversion
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
* refactor: Replace combinatorial nested collection enums with recursive VALUE_LIST/VALUE_SET
Replace 4 combinatorial enum values (LIST_LIST=36, LIST_SET=37, SET_LIST=38,
SET_SET=39) with 2 recursive enum values (VALUE_LIST=40, VALUE_SET=41) that
use RepeatedValue to enable unlimited nesting depth. This is a breaking change
for an unreleased feature, as suggested in PR #6132 review.
Key changes:
- Proto: Remove 4 enum/oneof fields, add VALUE_LIST/VALUE_SET with reserved 36-39
- Python: Update ValueType enum, type system, serialization, field persistence
- JSON: Update proto_json encode/decode for new field names
- Tests: Rewrite all nested collection tests (204 tests passing)
- Docs: Update type-system.md for recursive design
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
* fix: Preserve inner element types in PyArrow schema inference and optimize JSON nested list detection
- Add _parse_pa_type_str() to reconstruct PyArrow types from type strings
for VALUE_LIST/VALUE_SET, avoiding lossy round-trip through placeholder
- Optimize proto_json nested list detection: only scan with any() when
first element is None, avoiding O(n) scan for flat lists
- Add warning log for unrecognized PyArrow type strings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
* fix: Add np.ndarray support in nested collection proto conversion and clarify placeholder pyarrow type
- Add np.ndarray to isinstance check in _convert_nested_collection_to_proto
to fix KeyError for 3+ level nesting during materialization (PyArrow produces
np.ndarray, not Python list)
- Add comment clarifying VALUE_LIST/VALUE_SET placeholder in feast_value_type_to_pa
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: soojin <soojin@dable.io>
---------
Signed-off-by: Soojin Lee <lsjin0602@gmail.com>
Signed-off-by: soojin <soojin@dable.io>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Francisco Javier Arceo <arceofrancisco@gmail.com>
docs/reference/type-system.md (47 additions, 0 deletions)
@@ -86,6 +86,25 @@ All primitive types (except `Map` and `Json`) have corresponding set types for s
- Set types are best suited for **online serving** use cases where feature values are written as Python sets and retrieved via `get_online_features`.
{% endhint %}
### Nested Collection Types
Feast supports arbitrarily nested collections using a recursive `VALUE_LIST` / `VALUE_SET` design. The outer container determines the proto enum (`VALUE_LIST` for `Array(…)`, `VALUE_SET` for `Set(…)`), while the full inner type structure is persisted via a mandatory `feast:nested_inner_type` Field tag.

| Feast Type | Python Type | ValueType | Description |
Where `T` is any supported primitive type (Int32, Int64, Float32, Float64, String, Bytes, Bool, UnixTimestamp) or another nested collection type.
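
As a rough illustration of the recursive design described above, the sketch below models how the outermost container could select the enum while the rest of the type is kept as a tag string. It is a minimal sketch and not Feast's implementation: `Collection`, `type_str`, `outer_value_type`, and `nested_inner_tag` are hypothetical names, and the string format stored under `feast:nested_inner_type` is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)  # frozen dataclasses compare and hash by field values
class Collection:
    """Hypothetical stand-in for a collection type; NOT a Feast class."""
    kind: str  # "Array" or "Set"
    inner: Union[str, "Collection"]  # primitive name or another Collection


def type_str(t):
    """Render a type recursively, e.g. Set(Array(Int32))."""
    if isinstance(t, str):
        return t
    return f"{t.kind}({type_str(t.inner)})"


def outer_value_type(t):
    """Only the outermost container decides the enum of a nested collection."""
    if not (isinstance(t, Collection) and isinstance(t.inner, Collection)):
        raise ValueError("not a nested collection type")
    return "VALUE_LIST" if t.kind == "Array" else "VALUE_SET"


def nested_inner_tag(t):
    """Assumed tag layout: the inner type rendered as a string."""
    return {"feast:nested_inner_type": type_str(t.inner)}


dtype = Collection("Set", Collection("Array", Collection("Set", "Int32")))
print(outer_value_type(dtype))  # VALUE_SET
print(nested_inner_tag(dtype))  # {'feast:nested_inner_type': 'Array(Set(Int32))'}
```

Because the enum only records the outer container, two types such as `Array(Array(String))` and `Array(Array(Int32))` share `VALUE_LIST` and differ only in the tag, which is why the tag has to be present.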

**Notes:**
- Nesting depth is **unlimited**. `Array(Array(Array(T)))`, `Set(Array(Set(T)))`, etc. are all supported.
- Inner type information is preserved via Field tags (`feast:nested_inner_type`) and restored during deserialization. This tag is mandatory for nested collection types.
- Empty inner collections (`[]`) are stored as empty proto values and round-trip as `None`. For example, `[[1, 2], [], [3]]` becomes `[[1, 2], None, [3]]` after a write-read cycle.
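
The empty-inner-collection note can be pictured with a small pure-Python model. This is only a sketch of the observable behavior described in the notes, not the serialization code itself: `write_then_read` is a hypothetical helper, and applying the same rule at deeper nesting levels is an assumption.

```python
def write_then_read(value):
    """Model a write-read cycle for a nested list feature value.

    Hypothetical helper, not a Feast API: an empty inner collection is
    returned as None, mirroring the limitation described in the notes.
    """
    def read_back(item):
        if isinstance(item, list):
            if not item:
                return None  # empty inner collection comes back as None
            return [read_back(child) for child in item]
        return item

    return [read_back(item) for item in value]


print(write_then_read([[1, 2], [], [3]]))  # [[1, 2], None, [3]]
```

Code that reads nested features back may therefore want to treat `None` and an empty inner collection as equivalent.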
### Map Types
Map types allow storing dictionary-like data structures: