ModelConfig has 11 moe_* fields today (moe_num_experts, moe_top_k, moe_frequency, moe_router, moe_shared_experts, moe_aux_loss_weight, moe_capacity_factor, moe_sequence_aux_loss_weight, moe_gradient_scale, moe_bias_schedule, moe_packed_experts). The prefix is poor man's namespacing. Phase 14 from moe_eng_production_plan.md adds 2 more. At ~15+ fields it starts hurting ergonomics — cross-field validation in __post_init__ gets awkward, and it's harder to see what's MoE-specific vs model-general.
Eventual shape:

```python
from dataclasses import dataclass, field


@dataclass
class MoEConfig:
    num_experts: int = 0
    top_k: int = 2
    # ... rest of the current moe_* fields, unprefixed


@dataclass
class ModelConfig:
    dim: int = ...
    moe: MoEConfig = field(default_factory=MoEConfig)
```
TOML grows a [model.moe] subtable. is_moe becomes config.model.moe.num_experts > 0. CLI overrides become --model.moe.top_k=4 — loader.py already handles arbitrary dot depth.
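A rough sketch of how the nested table would land, assuming the MoEConfig/ModelConfig shape above. The apply() helper here is a hypothetical stand-in for illustration, not the real _apply_dict_to_dataclass:

```python
# Sketch only: apply() is a hypothetical stand-in showing strict-key, nested
# application of a [model.moe] subtable; it is not the loader's actual implementation.
import tomllib
from dataclasses import fields, is_dataclass


def apply(obj, table: dict) -> None:
    known = {f.name for f in fields(obj)}
    for key, value in table.items():
        if key not in known:
            raise KeyError(f"unknown config key: {key}")
        current = getattr(obj, key)
        if is_dataclass(current) and isinstance(value, dict):
            apply(current, value)  # descend into [model.moe]
        else:
            setattr(obj, key, value)


toml_text = """
[model]
dim = 1024

[model.moe]
num_experts = 8
top_k = 2
"""

model = ModelConfig(dim=1024)
apply(model, tomllib.loads(toml_text)["model"])
assert model.moe.num_experts > 0  # the new is_moe condition
```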
Not doing this now. 11 fields isn't sprawl. And _apply_dict_to_dataclass in kempnerforge/config/loader.py rejects unknown TOML keys, so flat-to-nested is a hard cutover — either rewrite every existing MoE config in the same PR or add a compat shim that rewrites moe_* under [model] into [model.moe] with a deprecation warning. Not worth paying for a cosmetic win.
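For sizing, the shim would be roughly this shape; the helper name and wording are hypothetical, not existing loader code:

```python
# Hypothetical shim, sketched only to size the migration cost; not existing code.
# Lifts legacy flat moe_* keys under [model] into a [model.moe] subtable, with a
# deprecation warning, before the strict unknown-key check would reject them.
import warnings


def lift_flat_moe_keys(model_table: dict) -> dict:
    out = dict(model_table)
    moe = dict(out.pop("moe", {}))
    for key in [k for k in out if k.startswith("moe_")]:
        new_key = key.removeprefix("moe_")
        warnings.warn(
            f"[model] {key} is deprecated; use [model.moe] {new_key}",
            DeprecationWarning,
            stacklevel=2,
        )
        moe[new_key] = out.pop(key)
    if moe:
        out["moe"] = moe
    return out
```

Small, but it's extra surface to test and later delete, which is part of why the cutover isn't worth it yet.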
Revisit when:

- moe_* field count on ModelConfig hits ≥15, or
- Cross-field validation needs to live in __post_init__ (e.g. top_k ≤ num_experts, capacity_factor > 0 requires specific routers; see the sketch after this list), or
- Someone wants moe_layers: list[MoEConfig] for multi-variant composition.
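For reference, that validation trigger would look roughly like the following; the router names and exact constraints are placeholders, not rules the codebase enforces today:

```python
# Illustrative only: router names and constraints are placeholders, not current behavior.
from dataclasses import dataclass


@dataclass
class MoEConfig:
    num_experts: int = 0
    top_k: int = 2
    router: str = "topk"
    capacity_factor: float = 0.0

    def __post_init__(self) -> None:
        if self.num_experts and self.top_k > self.num_experts:
            raise ValueError("moe: top_k must be <= num_experts")
        if self.capacity_factor > 0 and self.router not in ("topk", "expert_choice"):
            raise ValueError("moe: capacity_factor > 0 requires a capacity-aware router")
```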
Out of scope: mtp_* fields from Phase 15 (not MoE), DistributedConfig.ep (orthogonal to model arch).