Support InternLM2.5 and model alignment #4131
Thanks for your contribution!
Codecov Report
❌ Your patch status has failed because the patch coverage (51.09%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@            Coverage Diff            @@
##           develop    #4131   +/-   ##
==========================================
  Coverage         ?   47.11%
==========================================
  Files            ?      482
  Lines            ?    91611
  Branches         ?        0
==========================================
  Hits             ?    43165
  Misses           ?    48446
  Partials         ?        0
☔ View full report in Codecov by Sentry.
/re-run all-failed
module.weight[module._padding_idx].zero_()

@classmethod
def _gen_aoa_config(cls, config: InternLM25Config):
Disabling this is not recommended. Please refer to the following implementation:
def _gen_aoa_config(cls, config: InternLM25Config):
    model_prefix = cls.base_model_prefix + "." if cls != cls.base_model_class else ""
    aoa_statements = [
        f"model.tok_embeddings.weight -> {model_prefix}tok_embeddings.weight",
        f"model.norm.weight -> {model_prefix}norm.weight",
        f"model.layers.$LAYER_ID.attention_norm.weight -> {model_prefix}layers.$LAYER_ID.attention_norm.weight",
        f"model.layers.$LAYER_ID.ffn_norm.weight -> {model_prefix}layers.$LAYER_ID.ffn_norm.weight",
    ]
    aoa_statements.extend([
        f"model.layers.$LAYER_ID.attention.{w}.weight^T -> {model_prefix}layers.$LAYER_ID.attention.{w}.weight"
        for w in ["wqkv", "wo"]
    ])
    aoa_statements.extend([
        f"model.layers.$LAYER_ID.feed_forward.{w}.weight^T -> {model_prefix}layers.$LAYER_ID.feed_forward.{w}.weight"
        for w in ["w1", "w2", "w3"]
    ])
    if cls != cls.base_model_class:
        if getattr(config, "tie_word_embeddings", False):
            aoa_statements.append("model.tok_embeddings.weight -> output.weight")
        else:
            aoa_statements.append("output.weight^T -> output.weight")
    return {"aoa_statements": aoa_statements}
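The statements in the suggestion above all follow a `source -> target` pattern, where a `^T` suffix on the source appears to mark a weight that is transposed on load and `$LAYER_ID` stands in for each layer index. As a rough illustration of how such a string can be read (this is not PaddleFormers' actual AOA parser; `AoaMapping`, `parse_aoa_statement`, and `expand_layers` are hypothetical helpers introduced here), consider:

```python
from typing import List, NamedTuple


class AoaMapping(NamedTuple):
    source: str      # checkpoint-side parameter name
    target: str      # model-side parameter name
    transpose: bool  # True when the source carried a "^T" suffix


def parse_aoa_statement(statement: str) -> AoaMapping:
    """Split 'src[^T] -> dst' into its parts (illustrative only)."""
    source, target = (part.strip() for part in statement.split("->"))
    transpose = source.endswith("^T")
    if transpose:
        source = source[: -len("^T")]
    return AoaMapping(source, target, transpose)


def expand_layers(statement: str, num_layers: int) -> List[str]:
    """Replace the $LAYER_ID placeholder with each concrete layer index."""
    if "$LAYER_ID" not in statement:
        return [statement]
    return [statement.replace("$LAYER_ID", str(i)) for i in range(num_layers)]


stmt = "model.layers.$LAYER_ID.attention.wo.weight^T -> layers.$LAYER_ID.attention.wo.weight"
mappings = [parse_aoa_statement(s) for s in expand_layers(stmt, num_layers=2)]
for m in mappings:
    print(m)
```

Reading the statements this way makes it easier to check by eye that every checkpoint key has exactly one target and that only the linear weights are transposed.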
)

if attention_mask is not None and attention_mask.ndim == 4:
    if attention_mask.max() != 0:
Training raises an error here. I suggest removing this check and using the 4D mask directly.
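For context on the comment above, a 4D attention mask commonly has shape `[batch, 1, query_len, key_len]`. The sketch below builds an additive causal mask in pure Python, assuming the convention that visible positions hold 0 and masked positions hold a large negative value. The helper name, the `MASKED` constant, and the convention itself are assumptions for illustration and are not taken from this PR. Under that convention the maximum of the mask is 0, whereas a 0/1 mask has a maximum of 1.

```python
from typing import List

MASKED = -1e9  # large negative value standing in for -inf (assumption)


def build_causal_mask_4d(batch_size: int, seq_len: int) -> List[List[List[List[float]]]]:
    """Return an additive causal mask of shape [batch, 1, seq_len, seq_len]."""
    # Query position q may attend to key position k only when k <= q.
    plane = [
        [0.0 if k <= q else MASKED for k in range(seq_len)]
        for q in range(seq_len)
    ]
    # Copy the plane for every batch entry; the singleton axis broadcasts over heads.
    return [[[row[:] for row in plane]] for _ in range(batch_size)]


mask = build_causal_mask_4d(batch_size=2, seq_len=3)
mask_max = max(v for b in mask for h in b for row in h for v in row)
print(len(mask), len(mask[0]), len(mask[0][0]), len(mask[0][0][0]))
print(mask_max)
```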
("Gemma3", "gemma3_text"),
("Glm4vMoe", "glm4v_moe"),
("GlmOcr", "glm_ocr"),
("InternLM2", "intern_lm2_5"),
logging_steps: 1
gradient_accumulation_steps: 4
logging_dir: ./vdl_log
output_dir: ./checkpoints/qwen3-sft-full
# limitations under the License.

# TODO: not enabled in .github/workflows/fleet-model-test.yml for now, to avoid blocking the whole pipeline
# TODO: loss comparison materials will be submitted together with the PR
1. The copyright year is wrong.
PaddleFormers Log Analysis
Log analysis report
Failed test cases:
Root cause analysis: the main change in this PR (#4131) is adding
Suggested fix:
🔄 Automatically updated after each re-run
PR types
New features
PR changes
Add Models
Description
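Migrate the InternLM2.5 model to PaddleFormers:
1. Model functionality and alignment tests (tests/transformers/intern_lm2_5/test_modeling.py): functional test classes such as InternLM25CompatibilityTest
2. Location of the converted model
Comparison with ms-swift
1. ms-swift configuration
2. paddleformers-cli configuration
3. Loss output diff script
4. Loss comparison results
common: 102, show: 100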
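The loss diff script itself is not reproduced in the text above. As a rough sketch of what such a comparison might look like, the following assumes each framework's log has lines containing a step number and a `loss` value. The regex, the `global_step` log format, and the function names are assumptions for illustration, not the script used in this PR. The `show` parameter mirrors the "show: 100" in the result line, on the assumption that it caps how many common steps are printed.

```python
import re
from typing import Dict, List, Tuple

# Assumed log shape: "... global_step: 12, loss: 1.2345 ..." (hypothetical format).
STEP_LOSS = re.compile(r"global_step\D*(\d+).*?loss\D*([0-9]+(?:\.[0-9]+)?)")


def parse_losses(log_text: str) -> Dict[int, float]:
    """Map step number to loss for every line that matches the assumed format."""
    losses: Dict[int, float] = {}
    for line in log_text.splitlines():
        match = STEP_LOSS.search(line)
        if match:
            losses[int(match.group(1))] = float(match.group(2))
    return losses


def diff_losses(
    a: Dict[int, float], b: Dict[int, float], show: int = 100
) -> List[Tuple[int, float, float, float]]:
    """Return (step, loss_a, loss_b, abs_diff) for the first `show` common steps."""
    common = sorted(a.keys() & b.keys())
    return [(s, a[s], b[s], abs(a[s] - b[s])) for s in common[:show]]


swift_log = "global_step: 1, loss: 2.50\nglobal_step: 2, loss: 2.25\n"
paddle_log = (
    "global_step: 1, loss: 2.50\n"
    "global_step: 2, loss: 2.00\n"
    "global_step: 3, loss: 1.75\n"
)
rows = diff_losses(parse_losses(swift_log), parse_losses(paddle_log))
for step, lhs, rhs, delta in rows:
    print(f"step {step}: ms-swift={lhs:.4f} paddleformers={rhs:.4f} |diff|={delta:.4f}")
```

Comparing only the steps present in both logs avoids misaligned rows when one run logs more steps than the other.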