Skip to content

add emb_dim in modelConfig#811

Open
yinjuncheng wants to merge 1 commit into
alibaba:mainfrom
yinjuncheng:hotfix/fix_biencoders
Open

add emb_dim in modelConfig#811
yinjuncheng wants to merge 1 commit into
alibaba:mainfrom
yinjuncheng:hotfix/fix_biencoders

Conversation

@yinjuncheng
Copy link
Copy Markdown

No description provided.

@yinjuncheng yinjuncheng requested a review from LLLLKKKK as a code owner March 20, 2026 12:08
@LLLLKKKK
Copy link
Copy Markdown
Collaborator

Code Review: PR #811 — add emb_dim in modelConfig

Reviewer: Automated Code Review (Claude)


Summary

This PR adds "emb_dim" to ModelConfig._python_fields, allowing model_config.emb_dim to be set without triggering the AttributeError guard in __setattr__. The change is a single-line addition in rtp_llm/config/model_config.py.


Findings

1. Missing default initialization in __init__ (Medium)

Every other field in _python_fields is initialized with a default value in ModelConfig.__init__() (e.g., self.inter_size: int = 0, self.model_name: str = ""). emb_dim is not initialized, which means reading model_config.emb_dim before it is explicitly set will raise AttributeError. This is inconsistent with the existing pattern.

Suggested fix — add to __init__:

self.emb_dim: int = 0  # or another appropriate default

2. No current consumer on ModelConfig (Low)

The only existing usage of emb_dim in the codebase is in BiEncoderTbstars.__init__ (internal_source/rtp_llm/models/flot.py:226), where it reads from config_json and stores it as self.emb_dim on the model instance — not on ModelConfig. It would be helpful to clarify in the PR description which code path will set/read model_config.emb_dim.

3. No tests added (Low)

No tests were added or modified. Consider adding a simple test verifying that ModelConfig can get/set emb_dim with the expected default.


Verdict

Needs minor revision. The change is safe and backward-compatible, but emb_dim should be initialized in __init__ with a default value to match the pattern of all other _python_fields and avoid potential AttributeError on read.

@yinjuncheng yinjuncheng force-pushed the hotfix/fix_biencoders branch from 6e219f9 to c79acc9 Compare March 21, 2026 02:04
@LLLLKKKK
Copy link
Copy Markdown
Collaborator

🤖 AI Code Review — PR #811 add emb_dim in modelConfig

概述

ModelConfig 中注册 emb_dim 为 Python-only 字段,并在 __init__ 中初始化为 0。改动极小(1 文件,+2 行),目的是允许在 ModelConfig 实例上设置 emb_dim 属性。

建议改进

P1: 默认值不一致

ModelConfig.__init__emb_dim 默认值为 0,但现有 BiEncoderTbstars.__init__internal_source/rtp_llm/models/flot.py:226)中从 config_json 读取时默认值为 256

# ModelConfig (本 PR)
self.emb_dim: int = 0

# BiEncoderTbstars (现有代码)
self.emb_dim = config_json.get("emb_dim", 256)

如果后续代码迁移为从 config.emb_dim 读取,默认值 0 可能导致 embedding 截断维度为 0,产生静默错误。建议将默认值改为 256 以与现有行为保持一致,或在使用处添加 emb_dim == 0 的校验。

P1: 缺少消费者代码

当前 codebase 中没有任何代码读取 config.emb_dim(ModelConfig 实例的属性)。唯一使用 emb_dim 的地方是 BiEncoderTbstars 中设置在模型实例上的 self.emb_dim

分支名 hotfix/fix_biencoders 暗示这是一个 bugfix,但仅添加字段定义而没有消费侧改动,无法判断是否真正修复了问题。如果是为后续 PR 做准备,建议在 PR 描述中说明。

P1: 缺少 PR 描述和测试

  • PR body 为空,没有说明修改动机和关联的 bug。
  • 没有添加测试来验证 emb_dim 字段的行为。

总结

改动本身不会引入 regression,但默认值与现有代码不一致(0 vs 256)是潜在风险点。建议补充 PR 描述说明使用场景,确认默认值选择,并考虑同时提交消费侧代码。

@LLLLKKKK
Copy link
Copy Markdown
Collaborator

LLLLKKKK commented Apr 9, 2026

🤖 AI Code Review — PR #811
Head SHA: c79acc9a7439 | Verdict: LGTM

Summary

Adds emb_dim field to ModelConfig as a Python-only field with default 0. Follows existing patterns correctly. No issues found.

@LLLLKKKK
Copy link
Copy Markdown
Collaborator

🤖 AI Code Review — PR #811

PR 概述

Title: add emb_dim in modelConfig
Author: yinjuncheng
规模: 1 file, +2/-0

ModelConfig 中新增 emb_dim: int = 0 字段,用于存储 embedding 截断维度。


Review 意见

问题

  1. emb_dim 缺少赋值路径和消费者 [P2]

    当前 model_config.emb_dim 只有声明和初始化,整个代码库中没有任何地方从 config.json 读取并赋值给 model_config.emb_dim,也没有消费者读取它。现有使用 emb_dim 的模型(BiEncoderTbstars)直接从 config_json 读取存储在模型实例上,未使用 ModelConfig.emb_dim

    建议:补充完整的赋值链路和消费链路,否则该字段是死代码。

  2. 默认值与现有逻辑不一致 [P2]

    ModelConfig.emb_dim 默认值为 0,但 BiEncoderTbstars 中 config_json.get("emb_dim", 256) 默认值为 256。后续迁移时可能导致行为变化。

    建议:确认预期默认值,保持一致。

  3. 缺少测试 [P2]

    建议至少验证字段可正常赋值/读取(__setattr__ 白名单生效)及默认值正确。

  4. PR description 为空 [P3]

    未说明添加 emb_dim 的动机和使用场景。

整体评价

极小的配置字段新增,代码无正确性风险。主要问题是字段当前无赋值路径和消费者,属于不完整改动。建议补充完整使用链路或说明后续计划。

LGTM ready to ci — 无 P0/P1 阻塞问题,P2/P3 建议后续改进。

@LLLLKKKK
Copy link
Copy Markdown
Collaborator

AI Code Review — PR #811

Summary: P0/0 · P1/0 · P2/0 · P3/0

Review status: LGTM

lgtm ready to ci

Strengths

  • Change follows the established pattern exactly: registers emb_dim in _python_fields and initializes it with a typed default in __init__, consistent with all other Python-only fields.
  • Minimal, focused diff — two lines, no unrelated changes.

@github-actions
Copy link
Copy Markdown

CI dispatcher could not find a native build run for HEAD SHA c79acc9a.

This can happen if the PR was opened before the CI architecture change, or if the original run was deleted.

To fix: push any commit (even empty: git commit --allow-empty -m "trigger CI" && git push) to create a native build run, then re-approve or post lgtm ready to ci.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants