Skip to content

[refactor] sampler: derive sequence state from feature configs + accept bare sub-feature attr_fields#520

Merged
tiankongdeguiji merged 8 commits into
alibaba:masterfrom
tiankongdeguiji:refactor/sampler-bare-attr-fields
May 21, 2026
Merged

[refactor] sampler: derive sequence state from feature configs + accept bare sub-feature attr_fields#520
tiankongdeguiji merged 8 commits into
alibaba:masterfrom
tiankongdeguiji:refactor/sampler-bare-attr-fields

Conversation

@tiankongdeguiji

@tiankongdeguiji tiankongdeguiji commented May 20, 2026

Copy link
Copy Markdown
Collaborator

Summary

Refactor BaseDataset so negative_sampler.attr_fields uses bare sub-feature names (e.g. "video_id") when item_id_field is a grouped sequence sub-feature, matching the convention every other sampler config (DSSM/MIND/TDM) already uses. HSTUMatch was the outlier writing the qualified flattened form ("cand_seq__video_id"), leaking DataParser's flatten convention into user-facing config.

Design

Two BaseDataset fields, both empty unless item_id_field is a grouped sequence sub-feature:

self._sampler_seq_delim: str   # parent sequence's `sequence_delim`
self._sampler_seq_prefix: str  # f"{sequence_name}{feature._underline}"

_underline (the in-feature separator, _ under RTP / __ otherwise) is read off the matching feature — never hardcoded — so the rewrite is RTP-safe.

launch_sampler_cluster prepends the prefix to every attr_fields entry (deep-copying the sampler sub-message so data_config isn't mutated):

if self._sampler_seq_prefix:
    sampler_config = copy.deepcopy(sampler_config)
    sampler_config.attr_fields[:] = [
        self._sampler_seq_prefix + a for a in sampler_config.attr_fields
    ]

Unconditional prepend — no startswith guard. Bare is the single canonical form; legacy qualified configs ("cand_seq__video_id" from #506) fail loud at sampler init.

The outer-list strip filter scopes to set(sampler_config.attr_fields) (post-rewrite). _merge_sampled_features gates the block-suffix combine on _sampler_seq_delim alone.

Domain invariant the design relies on

When item_id_field is a grouped sequence sub-feature, attr_fields are uniformly sequence-positive — they're all item-side per-positive. There's no "mixed" case where some entries are sequence and others are top-level scalars; the per-row multi-positive layout makes any non-sequence item-side attr structurally incoherent. Confirmed by grepping every config in tzrec/tests/configs/ + examples/. Top-level sequence_id_feature as item_id_field is rejected via an assert feature.is_grouped_sequence (unused by any current config; the two _TestReader tests that posited it are migrated to grouped style).

Config + doc update

 negative_sampler {
   input_path: "data/test/kuairand-1k-match-item-gl.txt"
-  attr_fields: "cand_seq__video_id"
+  attr_fields: "video_id"
   item_id_field: "cand_seq__video_id"
 }

docs/source/models/hstu_match.md gains a one-paragraph note: item_id_field is qualified (it doubles as the sequence-prefix source), attr_fields is bare.

Test plan

  • python -m unittest tzrec.datasets.dataset_test tzrec.datasets.utils_test tzrec.datasets.sampler_test — green on local A10.
  • python -m unittest tzrec.tests.match_integration_test.MatchIntegrationTest.test_hstu_with_fg_train_eval — green on local A10 (~193s).
  • pre-commit run — clean on all touched files.

Source

Slice of #518 (feat/hstu-match-scalar-item-export), reimplemented from scratch with simplification driven by the domain invariant: a uniform prefix string captures the boundary, no per-attr branching, no alias dict, no defensive set probes. The HSTUMatchItemTower scalar export view from #518 is orthogonal and deferred to a follow-up PR.

🤖 Generated with Claude Code

tiankongdeguiji and others added 7 commits May 20, 2026 20:25
Replace the global `_seq_field_delims: Dict[str, str]` (every sequence
input -> delim) with two candidate-side fields derived from the
matching feature config:

  _sampler_seq_delim   -- "" if `item_id_field` isn't a sequence input;
                          the parent feature's `sequence_delim`
                          otherwise (covers top-level
                          `sequence_id_feature` AND grouped sequence
                          sub-features).
  _sampler_seq_inputs  -- candidate-side sequence input names:
                          {feature.name} for top-level; union of
                          `sequence_input_names` across all sub-features
                          sharing `sequence_name` for grouped.

Lookup is `item_id_field in feature.sequence_input_names`, which
returns `[feature.name]` for top-level FG_NONE/FG_BUCKETIZE sequences
and the flattened input names for grouped FG_DAG/FG_NORMAL sub-features
-- single check covers both cases the old `_seq_field_delims` dict did.

Three downstream simplifications in `tzrec/datasets/dataset.py`:

* `launch_sampler_cluster`: outer-list-strip filter switches from
  `f.name in self._seq_field_delims` to
  `f.name in self._sampler_seq_inputs`, narrowing strip to the
  candidate-sequence inputs (excludes unrelated grouped sequences and
  non-sequence item-side attrs from the same lookup feature).

* `_apply_negative_sampler`: passes the single `self._sampler_seq_delim`
  to `build_sampler_input` instead of the whole dict.

* `_merge_sampled_features`: gates the per-key block-suffix combine on
  `k in self._sampler_seq_inputs` (a single set membership check
  replacing the per-key dict lookup). Correct for both top-level and
  grouped, and for mixed-attrs configs where a lookup feature exposes
  both candidate-sequence sub-features and unrelated item-side attrs.

`tzrec/datasets/utils.py`:

* `build_sampler_input(...)` signature change from
  `seq_field_delims: Dict[str, str]` to `seq_delim: str`. Docstring
  updated to name both top-level `sequence_id_feature` and grouped
  sub-feature cases (the master version mentioned only top-level).

DSSM/MIND/TDM behaviour unchanged: when `item_id_field` is a top-level
scalar, `_sampler_seq_delim` stays empty, `_sampler_seq_inputs` stays
empty, and every branch degrades to the pre-refactor scalar path.

Tests:

* `dataset_test.py::test_launch_sampler_cluster_multi_attr_strip_decision_matrix`
  -- replace `_seq_field_delims` membership assertions with
  `_sampler_seq_delim` / `_sampler_seq_inputs` checks on the same
  mixed-attrs lookup_feature case (`cat_map` non-sequence item-side
  attr stays excluded; `click_seq__cat_key` grouped sub-feature stays
  included).

* `utils_test.py::test_build_sampler_input` -- update
  `@parameterized.expand` rows to pass `seq_delim=` (str); rename the
  "no entry" case to "empty_seq_delim_passthrough".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…equence candidate

HSTUMatch is the only model whose sampler config carried qualified
`attr_fields` (e.g. `attr_fields: "cand_seq__video_id"`), leaking
DataParser's `{sequence_name}__{sub_feature}` flatten convention into
user-facing config. Every other sampler config (DSSM/MIND/TDM) uses
the bare sub-feature name because their candidate is a top-level
feature where bare-name == flattened-name.

Translate at the dataset boundary via an explicit alias map:

  _sampler_bare_attr_to_sequence_input: Dict[str, str]

For grouped-sequence candidates (when item_id_field's matching feature
is `is_grouped_sequence`), the map is built from the candidate-
sequence input names by stripping `feature.sequence_name + feature._underline`:

    {"video_id": "cand_seq__video_id", ...}

`_resolve_sampler_attr_field(name) -> str` does
`map.get(name, name)`, and `launch_sampler_cluster` rewrites
`sampler_config.attr_fields` in place over a deep-copy of the
sampler sub-message (so `self._data_config` is not mutated). Three
cases handled by the alias-map's `.get(a, a)` fallthrough:

  1. Bare grouped-sequence sub-feature name -> rewritten to qualified.
  2. Already-qualified name -> no map entry -> passed through.
  3. Top-level item-side attr from the same lookup feature
     (e.g. `cat_map` excluded from `sequence_fields`) -> no map entry
     -> passed through.

Flatten prefix uses `feature._underline` directly so RTP-safe ("_" vs
"__" never enumerated). Deep-copy guard is `if alias map is non-empty`,
so DSSM/MIND/TDM (alias map empty) skip the copy entirely.

Test config + doc switch to the new convention:

    attr_fields: "video_id"                 (bare sub-feature name)
    item_id_field: "cand_seq__video_id"     (qualified; doubles as the
                                             sequence_name source)

Add `dataset_test.py::test_launch_sampler_cluster_bare_attr_resolves_against_seq_prefix`
exercising the rewrite path end-to-end: asserts the alias map content,
that `self._data_config` is not mutated, and that the sampler sees the
qualified column name after `launch_sampler_cluster`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-path, flatten loop

Addresses /simplify review on top of the prior two commits:

* Trim the 17-line block comment above `_sampler_seq_*` init to 4 lines.
  Original block mostly paraphrased the code (WHAT); keep the load-
  bearing WHY (HSTUMatch bare-name convention + RTP-safe prefix).
* Restore the scalar-config (DSSM/MIND/TDM) fast-path in
  `launch_sampler_cluster`: skip the `sampler_fields` list-comp when
  `_sampler_seq_inputs` is empty -- pass through `self.input_fields`
  unchanged like master did via the old guard.
* Flatten the three-level init loop with `continue` guards.
* Drop inaccurate `pyre-ignore [16]` on `feature._underline` -- the
  attribute is publicly defined on `BaseFeature.__init__` and other
  in-class call sites (e.g. `feature.py:473,749,787`) read it without
  any ignore.
* Annotate `_sampler_seq_inputs` as `Set[str]` (was bare `set`),
  matching the `Dict[str, str]` style on the next line.

No behavior change. Tests stay green
(`tzrec.datasets.dataset_test`, `tzrec.datasets.utils_test`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… focused test

The pre-existing `test_launch_sampler_cluster_multi_attr_strip_decision_matrix`
still wrote `attr_fields=["cat_map", "click_seq__cat_key"]` -- the OLD
qualified form for the sequence sub-feature, the very leak this PR is
fixing. Update to `["cat_map", "cat_key"]` (bare for the sub-feature,
top-level item-side `cat_map` unchanged), which both:

* Demonstrates the canonical user-facing form under the new convention.
* Exercises the alias-map rewrite (`cat_key` -> `click_seq__cat_key`)
  end-to-end alongside the fallthrough (`cat_map` has no alias entry).

Now-redundant assertions from the focused
`test_launch_sampler_cluster_bare_attr_resolves_against_seq_prefix`
test fold into the multi_attr test:

* `_sampler_bare_attr_to_sequence_input == {"cat_key": "click_seq__cat_key"}`
  (alias-map structural check).
* `list(data_config.negative_sampler.attr_fields) == ["cat_map", "cat_key"]`
  after launch (deep-copy guard: `self._data_config` not mutated).
* `assertNotIn("cat_key", _sampler._attr_names)` (rewrite replaces,
  doesn't append).

Drop the focused test; the multi_attr test is now the comprehensive
end-to-end check (rewrite + fallthrough + outer-list strip in one).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ly grouped item_id_field supported

Domain invariant the prior rounds over-engineered against: when
`item_id_field` is sequence-positive, `attr_fields` are uniformly
sequence too -- they're all item-side per-positive, and the per-row
positive layout makes any non-sequence item-side attr structurally
incoherent. Real configs confirm: every NegativeSampler /
HardNegativeSampler config in `tzrec/tests/configs/` and `examples/`
is either all-scalar (DSSM/MIND/TDM) or uniformly grouped-sequence
(HSTUMatch).

Collapse to two fields driven by that invariant:

  _sampler_seq_delim   -- "" if not in sequence mode; the matching
                          grouped sequence's `sequence_delim` otherwise.
  _sampler_seq_prefix  -- "" if not in sequence mode; the flatten
                          prefix `f"{sequence_name}{_underline}"`
                          otherwise. Replaces both `_sampler_seq_inputs`
                          (set; dropped) and
                          `_sampler_bare_attr_to_sequence_input` (dict;
                          dropped) from the prior rounds.

Rewrite rule shrinks to one line -- unconditional prepend:

    sampler_config.attr_fields[:] = [
        self._sampler_seq_prefix + a
        for a in sampler_config.attr_fields
    ]

No `startswith` guard for the legacy qualified form -- this PR
establishes bare names as the single canonical convention, and the
migration burden is small (HSTUMatch is the only model that ever wrote
the qualified form; that config is flipped in commit ffae57b). Legacy
qualified attr_fields now fail loud at sampler init (the double-
prefixed column doesn't exist).

`launch_sampler_cluster`'s outer-list strip narrows its filter from
`_sampler_seq_inputs` to `set(sampler_config.attr_fields)` -- in every
real config `item_id_field` is one of the `attr_fields` entries
(post-rewrite), so no separate union is needed.

`_merge_sampled_features` collapses to a single delim gate
(`if not self._sampler_seq_delim`); the Round 1 per-key
`_sampler_seq_inputs` membership check was defensive complexity for
the would-be mixed-attrs case that doesn't exist in real configs.

Top-level `sequence_id_feature` as `item_id_field` is now rejected via
an `assert feature.is_grouped_sequence`: this shape is unused by any
current config, and migrating the two `_TestReader` tests that posited
it (`test_dataset_with_sampler_list_item_id`,
`test_dataset_with_hard_negative_sampler_list_item_id`) to the grouped
style matches how multi-positive sampling actually works in
production.

`_resolve_sampler_attr_field` helper dropped (one-line `.get` on a
now-removed dict). `Set` import dropped (no other uses).

`test_launch_sampler_cluster_multi_attr_strip_decision_matrix` renamed
to `test_launch_sampler_cluster_grouped_sequence_strip_and_rewrite`
and reframed: drops the synthetic `cat_map`-in-attr_fields scenario
(mixed-attrs configs don't exist in production), keeps `cat_map` as a
parquet column NOT in attr_fields to verify strip filter precision,
adds the prefix assertion `_sampler_seq_prefix == "click_seq__"`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comments + docstrings I added in earlier rounds had paragraph-style
explanations. Trim to one-liners per `feedback_concise_inline_comments`:

* `BaseDataset.__init__` sampler-seq state init: 7-line block -> 1 line.
* `launch_sampler_cluster` rewrite block: 5-line comment -> 1 line.
* `launch_sampler_cluster` strip block: 7-line comment -> 1 line.
* `_merge_sampled_features` docstring: 7 lines -> 4 lines.
* `test_launch_sampler_cluster_grouped_sequence_strip_and_rewrite`
  docstring: 12 lines -> 5; trim three WHAT-style assertion comments.

No code-behavior change; tests stay green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… sub-feature types

`test_launch_sampler_cluster_grouped_sequence_strip_and_rewrite` used
a single LookupFeature sub with `sequence_fields=["cat_key"]` to
construct one `list<list<int64>>` candidate input. The synthetic
shape (multi-value layered under multi-positive) doesn't reflect
HSTUMatch's actual usage where the candidate sequence carries
multiple sub-features of varied types (id, raw, ...).

Replace with three sub-features under a single `sequence_feature`:

* `id_feature item_id`     -> `click_seq__item_id: list<int64>`
* `id_feature cat_id`      -> `click_seq__cat_id: list<int64>`
* `raw_feature watch_time` -> `click_seq__watch_time: list<float32>`

`attr_fields=["item_id", "cat_id", "watch_time"]` exercises the
prefix-prepend rewrite across all three; the strip filter scopes
correctly per type (int64 / int64 / float32).

Drops the `sequence_fields` complexity (LookupFeature-only knob) and
the `cat_map` parquet column (the `assertNotIn("cat_map", ...)`
assertion was checking sampler-side state unrelated to the strip
filter's behavior).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tiankongdeguiji tiankongdeguiji force-pushed the refactor/sampler-bare-attr-fields branch from f57e06d to b0213f8 Compare May 21, 2026 03:16
@tiankongdeguiji tiankongdeguiji added the claude-review Let Claude Review label May 21, 2026
@github-actions github-actions Bot removed the claude-review Let Claude Review label May 21, 2026
Comment thread tzrec/datasets/dataset.py Outdated
Comment on lines +176 to +181
assert feature.is_grouped_sequence, (
f"item_id_field '{self._sampler_item_id_field}' is a "
f"sequence input but its matching feature is not a "
f"grouped sequence; only grouped sequence sub-features "
f"are supported as item_id_field."
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is user-facing config validation, not an internal invariant — python -O strips the assert, after which feature.sequence_name (defaulted to None at feature.py:427 for non-grouped features) flows into feature.sequence_name + feature._underline two lines below and raises TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'. A confusing failure mode for what's a clear config error. Convert to raise ValueError(...) to match the explicit-raise pattern used elsewhere in this file (e.g. dataset.py:260).

Suggested change
assert feature.is_grouped_sequence, (
f"item_id_field '{self._sampler_item_id_field}' is a "
f"sequence input but its matching feature is not a "
f"grouped sequence; only grouped sequence sub-features "
f"are supported as item_id_field."
)
if not feature.is_grouped_sequence:
raise ValueError(
f"item_id_field '{self._sampler_item_id_field}' is a "
"sequence input but its matching feature is not a "
"grouped sequence; only grouped sequence sub-features "
"are supported as item_id_field."
)

Worth pairing with a small negative test (assertRaisesRegex) so the contract is locked.

Comment thread tzrec/datasets/dataset.py
Comment thread tzrec/datasets/dataset.py Outdated
Comment on lines 695 to +785
@@ -708,23 +715,35 @@ def test_launch_sampler_cluster_multi_attr_strip_decision_matrix(self):
sequence_delim=";",
features=[
feature_pb2.SeqFeatureConfig(
lookup_feature=feature_pb2.LookupFeature(
feature_name="lookup_c",
map="item:cat_map",
key="item:cat_key",
sequence_fields=["cat_key"],
id_feature=feature_pb2.IdFeature(
feature_name="item_id",
expression="item:item_id",
num_buckets=200,
embedding_dim=8,
)
),
feature_pb2.SeqFeatureConfig(
id_feature=feature_pb2.IdFeature(
feature_name="cat_id",
expression="item:cat_id",
num_buckets=10,
embedding_dim=8,
)
),
feature_pb2.SeqFeatureConfig(
raw_feature=feature_pb2.RawFeature(
feature_name="duration",
expression="item:duration",
)
),
],
)
),
]
features = create_features(
feature_cfgs,
fg_mode=data_pb2.FgMode.FG_NORMAL,
neg_fields=["cat_map", "cat_key"],
neg_fields=["item_id", "cat_id", "duration"],
force_base_data_group=True,
)
dataset = _TestDataset(
@@ -736,8 +755,9 @@ def test_launch_sampler_cluster_multi_attr_strip_decision_matrix(self):
negative_sampler=sampler_pb2.NegativeSampler(
input_path=f.name,
num_sample=4,
attr_fields=["cat_map", "click_seq__cat_key"],
item_id_field="click_seq__cat_key",
# bare names; prefix-rewritten at launch
attr_fields=["item_id", "cat_id", "duration"],
item_id_field="click_seq__item_id",
),
force_base_data_group=True,
),
@@ -746,22 +766,23 @@ def test_launch_sampler_cluster_multi_attr_strip_decision_matrix(self):
input_fields=input_fields,
mode=Mode.TRAIN,
)
# Narrowed _seq_field_delims excludes the non-sequence item-side input.
self.assertIn("click_seq__cat_key", dataset._seq_field_delims)
self.assertNotIn("cat_map", dataset._seq_field_delims)
self.assertEqual(dataset._sampler_seq_delim, ";")
self.assertEqual(dataset._sampler_seq_prefix, "click_seq__")

dataset.launch_sampler_cluster(2)
# outer guard True (item_id_field is sequence-positive):
# - cat_key: list<list<int64>> -> list<int64> (one strip).
# - cat_map: list<string>, not in _seq_field_delims -> unstripped.
cat_key_idx = dataset._sampler._attr_names.index("click_seq__cat_key")
cat_map_idx = dataset._sampler._attr_names.index("cat_map")
self.assertEqual(
dataset._sampler._attr_types[cat_key_idx], pa.list_(pa.int64())
)
# Deep-copy guard: data_config not mutated by the rewrite.
self.assertEqual(
dataset._sampler._attr_types[cat_map_idx], pa.list_(pa.string())
list(dataset._data_config.negative_sampler.attr_fields),
["item_id", "cat_id", "duration"],
)
# Each bare entry prefix-rewritten and outer-list stripped.
for qualified, expected_type in [
("click_seq__item_id", pa.int64()),
("click_seq__cat_id", pa.int64()),
("click_seq__duration", pa.float32()),
]:
idx = dataset._sampler._attr_names.index(qualified)
self.assertEqual(dataset._sampler._attr_types[idx], expected_type)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two coverage gaps worth a follow-up commit:

  1. No negative test for the assert feature.is_grouped_sequence path at dataset.py:176. A 5-line assertRaisesRegex construction with a top-level sequence_id_feature + matching item_id_field would lock the rejection — without it, the assertion can be silently weakened in a future refactor (and would be invisible under python -O; see the related comment on the assert itself).

  2. No large_list coverage in this test. The strip predicate at dataset.py:227 handles both is_list and is_large_list, but every input field here is pa.list_(...). Adding a single pa.large_list(pa.int64()) sub-feature row would cover the second arm.

@github-actions

Copy link
Copy Markdown

Review summary

Net read: clean simplification. Collapsing the _seq_field_delims dict into two correlated strings (_sampler_seq_delim + _sampler_seq_prefix) removes a class of latent bugs from per-key ambiguity and lets the hot path (_merge_sampled_features, build_sampler_input) drop a per-key dict lookup. Deep-copy-on-rewrite is the right pattern for keeping data_config immutable while the sampler sees flattened names. Performance: no regressions; the run-once deepcopy is gated to HSTUMatch and immaterial.

Filed 4 inline comments. Highest-value ones:

  1. assert feature.is_grouped_sequenceraise ValueError (dataset.py:176). User-facing config validation; python -O strips the assert and the next line then TypeErrors on None + str for sequence_name. Add a negative test.
  2. Legacy qualified attr_fields doesn't actually "fail loud" (dataset.py:212-216). The PR description's claim is contradicted by sampler.py:265-277 (silently appended to _ignore_attr_names with only a logger.warning). Worth an explicit pre-rewrite guard so the migration is genuinely loud.
  3. feature._underline cross-module reach (dataset.py:183). One-line property on BaseFeature cleans this up.
  4. Missing negative test for the assert path + large_list coverage in the new strip-and-rewrite test.

Doc note at hstu_match.md:214 is fine in isolation but the surrounding cross-reference to the DSSM 负采样配置 section now diverges from the HSTUMatch convention — folding finding #2 into a hard error makes this less load-bearing.

🤖 Generated with Claude Code

… -> raise ValueError

Two PR-review fixes on top of the Round 3 prefix collapse:

1. BaseDataset's `assert feature.is_grouped_sequence, ...` is user-
   facing config validation, not an internal invariant. Under
   `python -O`/`-OO` the assert is stripped and the next line --
   `feature.sequence_name + feature._underline` -- runs unconditionally;
   for the rejected top-level sequence_id_feature shape `sequence_name`
   is None (feature.py:427), producing `TypeError: unsupported operand
   type(s) for +: 'NoneType' and 'str'`. Convert to `raise ValueError`
   matching the explicit-raise pattern at dataset.py:260.

2. `feature._underline` (feature.py:426) is a private attribute; its
   prior call sites at feature.py:473, 749, 787 were all in-class.
   The Round 3 cross-module read from dataset.py is the first one
   crossing the underscore boundary. Add a public `grouped_sequence_prefix`
   property on BaseFeature next to `is_grouped_sequence`:

       @Property
       def grouped_sequence_prefix(self) -> str:
           return f"{self.sequence_name}{self._underline}" if self._is_grouped_seq else ""

   Empty-string for non-grouped features so callers can use it
   unconditionally. Migrate the dataset.py read AND the three in-class
   call sites:

   * feature.py:469-475 -- `name` property: the if/else for grouped vs
     not collapses to one line via the property's "" fallback.
   * feature.py:749 -- _is_sequence_input: prefix construction.
   * feature.py:787 -- side_inputs: seq_prefix construction.

The explicit raise from (1) stays as the source of truth for config
validation (the property's "" fallback only protects against the
None+str TypeError; without the explicit raise a non-grouped item_id
would silently flow through with empty prefix + set delim, producing
a half-configured sequence path that fails further downstream).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tiankongdeguiji tiankongdeguiji merged commit a21d26f into alibaba:master May 21, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants