Skip to content

[refactor] expose features/feature_groups as @property across models and wrappers#519

Merged
tiankongdeguiji merged 5 commits into
alibaba:masterfrom
tiankongdeguiji:refactor/features-feature-groups-as-properties
May 21, 2026
Merged

[refactor] expose features/feature_groups as @property across models and wrappers#519
tiankongdeguiji merged 5 commits into
alibaba:masterfrom
tiankongdeguiji:refactor/features-feature-groups-as-properties

Conversation

@tiankongdeguiji
Copy link
Copy Markdown
Collaborator

Summary

Add features / feature_groups @property accessors to the classes that own the underlying _features / _feature_groups lists (BaseModel, MatchTower, MatchTowerWoEG, TDMEmbedding); migrate wrappers (ScriptWrapper, TowerWrapper, TowerWoEGWrapper) and downstream export callers to read through that property surface so the wrappers no longer have to snapshot the inner module's metadata at construction time.

Today every wrapper mirrors the inner module's _features / _feature_groups onto itself. That snapshot is fine for the static-view models we have, but it pins each wrapper to a single training-shape view. Exposing the lists as @propertys gives subclasses a seam to override the surface per view (training vs. export) without mutating the underscore fields the snapshot was reading from.

Extracted from #518 to land independently. The HSTUMatch scalar export view in #518 will sit on top of this — HSTUMatchItemTower overrides the property to return a lazily-built scalar view when _is_inference is set.

Design

  • _features / _feature_groups underscore fields stay as the source of truth on BaseModel, MatchTower, MatchTowerWoEG, TDMEmbedding. The new properties are default reads of those fields; existing subclasses pick them up unchanged.
  • Wrappers drop their construction-time snapshots and forward live to the inner module's property (getattr(self, self._tower_name).features for the per-tower wrappers; self.model.features for ScriptWrapper).
  • TowerWoEGWrapper.__init__ switches its EmbeddingGroup build to module.features / module.feature_groups. The EmbeddingGroup owns nn.Parameters so it must remain a construction-time snapshot — but it now reads through the same property surface as everything else, so a subclass that overrides the property to a different view (e.g. HSTUMatch's scalar export view in [feat] HSTUMatch: scalar item-tower export view #518) gets the matching EmbeddingGroup.
  • Downstream consumers in tzrec/utils/export_util.py and tzrec/acc/aot_utils.py migrate from model._features / model._feature_groups to the property API. Five sites total, all of which receive a wrapper (InferWrapper(...) or InferWrapper(TowerWrapper/TowerWoEGWrapper(...))).

MatchTower property — completeness fix vs #518

#518 adds the property to MatchTowerWoEG only. Once wrapper snapshots are dropped, TowerWrapper.features reads module.features — and MatchTower inherits from BaseModule, which has no such property. That would silently break DSSMTower (dssm.py), DATTower (dat.py), MINDUserTower / MINDItemTower (mind.py) at export. This PR adds the missing property to MatchTower so DSSM/DAT/MIND continue to work.

Non-functional for DSSM/MIND/DAT/TDM/HSTU

The properties return the same _features / _feature_groups lists the wrappers used to snapshot. No view changes, no model changes — purely a name-routing refactor that unlocks per-view overrides for future PRs (notably #518's HSTUMatch scalar export).

Test plan

  • python -m unittest tzrec.models.match_model_test — green (4 tests, MatchModel + TrainWrapper)
  • python -m unittest tzrec.models.rank_model_test — green (10 tests, RankModel/BaseModel surface)
  • python -m unittest tzrec.models.tdm_test — green (3 tests, TDM + TDMEmbedding)
  • python -m unittest tzrec.models.dssm_test — green (6 tests, MatchTower path)
  • python -m unittest tzrec.models.dssm_v2_test — green (2 tests, MatchTowerWoEG path)
  • python -m unittest tzrec.models.hstu_test — green (1 test, MatchTowerWoEG via HSTUMatchItemTower)
  • python -m unittest tzrec.models.mind_test — green (6 tests, MatchTower path)
  • python -m unittest tzrec.models.dat_test — green (6 tests, MatchTower path)
  • pre-commit run --files <touched> — clean
  • CI: default lane (includes match-tower integration tests that exercise the export pipeline end-to-end through TowerWrapper / TowerWoEGWrapper / ScriptWrapper).

🤖 Generated with Claude Code

tiankongdeguiji and others added 3 commits May 20, 2026 19:39
…classes

Add `features` / `feature_groups` `@property` accessors to the four
classes that own the underlying `_features` / `_feature_groups`
fields. Each property is a default read of the corresponding
underscore field -- non-functional today, but gives subclasses a seam
to override the surface per view (training vs export) without
mutating the underscore fields.

Touched:
* `BaseModel` (tzrec/models/model.py): covers every ranking /
  matching / multi-task model and its `MatchModel` subclass.
* `MatchTower` (tzrec/models/match_model.py): the base for
  `DSSMTower`, `DATTower`, `MINDUserTower`, `MINDItemTower`. Inherits
  from `BaseModule`, which does not own the underscore fields, so
  the property must be defined here for `TowerWrapper`'s
  property-based read (next commits) to succeed.
* `MatchTowerWoEG` (tzrec/models/match_model.py): the base for
  `HSTUUserTower` / `HSTUMatchItemTower`. Inherits from `nn.Module`
  for the same reason.
* `TDMEmbedding` (tzrec/models/tdm.py): an `nn.Module` exported
  separately by the TDM pipeline; owns its own `_features` /
  `_feature_groups` from the parent `EmbeddingGroup`.

Pure additions -- no caller is rewired yet, so the existing
underscore-field reads on these classes and their wrappers continue
to work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… property API

Switch the five call sites that read `model._features` /
`model._feature_groups` (where `model` is a wrapper:
`ScriptWrapper(...)` for non-match, `InferWrapper(TowerWrapper(...))`
or `InferWrapper(TowerWoEGWrapper(...))` for match) to the
property API introduced in the previous commit.

Sites migrated:

* `tzrec/utils/export_util.py::export_model_normal` — `features`
  argument to `create_dataloader` (line 164).
* `tzrec/utils/export_util.py::export_model_aot` setup — second
  dataloader site (line 714).
* `tzrec/utils/export_util.py::split_model` —
  `_compute_seq_share_groups` call site reads both
  `features` and `feature_groups` (lines 1163-1164).
* `tzrec/acc/aot_utils.py::export_unified_model_aot` — sequence
  feature name set (line 470) and `_build_dynamic_shapes` call
  (lines 476-477).
* `tzrec/acc/aot_utils.py::_build_dynamic_shapes` docstring —
  `model._features` / `model._feature_groups` references updated to
  `model.features` / `model.feature_groups` to match the new API.

Today the wrappers still expose the underscore-field snapshot, so
both forms work; this commit threads the property reads through the
export pipeline so the next commit (which drops the underscore-field
snapshots from the wrappers) is a no-op for these call sites.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…se properties

`ScriptWrapper`, `TowerWrapper`, and `TowerWoEGWrapper` previously
mirrored their inner module's `_features` / `_feature_groups` onto
themselves at construction time. With every wrapped class now
exposing those metadata fields via `@property` (previous commits),
the wrappers can drop the snapshot and forward live reads instead.

Concretely:

* `ScriptWrapper.__init__`: no longer copies `model._features` /
  `model._feature_groups`. `DataParser` is constructed from
  `self.model.features`. The class exposes `features` /
  `feature_groups` as `@property` reads of `self.model.features` /
  `self.model.feature_groups`.

* `TowerWrapper.__init__` / `TowerWoEGWrapper.__init__`: same
  treatment, reading the wrapped tower via `getattr(self,
  self._tower_name)`.

* `TowerWoEGWrapper.__init__` additionally switches its
  `EmbeddingGroup` build from `module._features, module._feature_groups`
  to `module.features, module.feature_groups`, so the construction-time
  snapshot that EmbeddingGroup needs (it owns nn.Parameters) goes
  through the same property surface as everything else.

The downstream call sites in `tzrec/utils/export_util.py` and
`tzrec/acc/aot_utils.py` were migrated to the property API in the
previous commit, so this is a no-op for them. No other consumer reads
the wrappers' underscore fields directly.

Non-functional for DSSM/MIND/DAT/TDM/HSTU: properties return the same
`_features` / `_feature_groups` lists the wrappers used to snapshot,
identical to the pre-refactor behaviour.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tiankongdeguiji tiankongdeguiji added the claude-review Let Claude Review label May 20, 2026
@github-actions github-actions Bot removed the claude-review Let Claude Review label May 20, 2026
self._features = module._features
self._feature_groups = module._feature_groups
self._tower_name = tower_name
self._group_name = module._group_name
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inconsistent with the rest of the migration: this wrapper now reads features / feature_groups through the property surface but still reaches into module._group_name directly. The same "let subclasses override per-view" rationale applies to group_name — a MatchTowerWoEG subclass that wanted to expose a different group_name at export would have no seam.

Consider either (a) adding a group_name property on MatchTower / MatchTowerWoEG and reading through it here, or (b) at minimum a one-line comment noting _group_name is deliberately kept as a private contract between the tower and its wrapper. Symmetry with the rest of the PR would argue for (a).

@github-actions
Copy link
Copy Markdown

Review summary

Tightly-scoped, mechanical refactor. The shape is right: read-only @property getters forwarding to _features / _feature_groups, with wrappers dropping construction-time snapshots in favor of live reads. The five migrated call sites in export_util.py / aot_utils.py all happen at export/trace time, not per-batch, so this is performance-neutral.

Coverage: match_integration_test.py exercises TowerWrapper / TowerWoEGWrapper / ScriptWrapper construction + export end-to-end (DSSM/DSSMv2/DAT/MIND tests hit the non-AOT path; the AOT variants at :233,260 reach _compute_seq_share_groups and _build_dynamic_shapes). TDM property path is covered by test_tdm_train_eval_export. The subclass-override assertion (override returns different list than _features) is reasonably deferred to #518 where the override actually lands — though a one-line unit test there asserting _build_dynamic_shapes sees the overridden list would be cheap insurance.

TorchScript: confirmed neither TowerWrapper.predict nor TowerWoEGWrapper.predict nor ScriptWrapper.forward reads self.features / self.feature_groups. The protobuf-list return type never enters JIT type inference — safe.

Security: no concerns. _tower_name is constructor-sourced (trusted), properties are read-only, no new shared mutable state.

One noteworthy finding (inline at match_model.py:512): TowerWoEGWrapper.__init__ still reads module._group_name directly while the rest of the wrapper now goes through .features / .feature_groups. Either lift group_name to a property too (symmetric with the per-view override seam this PR introduces), or note explicitly that _group_name is a deliberate private contract.

Nothing else blocking.

tiankongdeguiji and others added 2 commits May 20, 2026 20:20
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tiankongdeguiji tiankongdeguiji merged commit dc236b3 into alibaba:master May 21, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants