[polish] dataset_test: expand grouped-sequence test to cover multiple sub-feature types
`test_launch_sampler_cluster_grouped_sequence_strip_and_rewrite` used
a single LookupFeature sub-feature with `sequence_fields=["cat_key"]` to
construct one `list<list<int64>>` candidate input. That synthetic
shape (multi-value layered under multi-positive) does not reflect
HSTUMatch's actual usage, where the candidate sequence carries
multiple sub-features of varied types (id, raw, ...).
Replace it with three sub-features under a single `sequence_feature`:
* `id_feature item_id` -> `click_seq__item_id: list<int64>`
* `id_feature cat_id` -> `click_seq__cat_id: list<int64>`
* `raw_feature watch_time` -> `click_seq__watch_time: list<float32>`
`attr_fields=["item_id", "cat_id", "watch_time"]` exercises the
prefix-prepend rewrite across all three; the strip filter scopes
correctly per type (int64 / int64 / float32).
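A minimal sketch of the prefix-prepend naming described above, assuming the
`<sequence_name>__<field>` convention visible in the column names listed here.
`prepend_prefix` and `EXPECTED_COLUMNS` are hypothetical names used only for
illustration; they are not the project's API, and the strip filter itself is
not modeled.

```python
# Illustrative only: the helper and constant names are assumptions,
# not identifiers from the test or the library.
EXPECTED_COLUMNS = {
    "click_seq__item_id": "list<int64>",
    "click_seq__cat_id": "list<int64>",
    "click_seq__watch_time": "list<float32>",
}


def prepend_prefix(sequence_name, attr_fields, sep="__"):
    """Map bare sub-feature names to grouped-sequence column names."""
    return [f"{sequence_name}{sep}{name}" for name in attr_fields]


rewritten = prepend_prefix("click_seq", ["item_id", "cat_id", "watch_time"])

# Every rewritten name should correspond to one of the expected columns.
missing = [name for name in rewritten if name not in EXPECTED_COLUMNS]
```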
Also drop the `sequence_fields` complexity (a LookupFeature-only knob) and
the `cat_map` parquet column: the `assertNotIn("cat_map", ...)`
assertion was checking sampler-side state unrelated to the strip
filter's behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>