Defer: nest moe_* fields into ModelConfig.moe when field count grows #14

@mmshad

ModelConfig has 11 moe_* fields today (num_experts, moe_top_k, moe_frequency, moe_router, moe_shared_experts, moe_aux_loss_weight, moe_capacity_factor, moe_sequence_aux_loss_weight, moe_gradient_scale, moe_bias_schedule, moe_packed_experts). The prefix does poor-man's namespacing. Phase 14 from moe_eng_production_plan.md adds 2 more. At ~15+ fields it starts hurting ergonomics — cross-field validation in __post_init__ gets awkward, and it's harder to see what's MoE-specific vs model-general.

Eventual shape:

from dataclasses import dataclass, field

@dataclass
class MoEConfig:
    num_experts: int = 0
    top_k: int = 2
    # ... rest of the current moe_* fields, unprefixed

@dataclass
class ModelConfig:
    dim: int = ...
    moe: MoEConfig = field(default_factory=MoEConfig)

TOML grows a [model.moe] subtable. is_moe becomes config.model.moe.num_experts > 0. CLI overrides become --model.moe.top_k=4; loader.py already handles arbitrary dot depth.
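For illustration, a minimal sketch of how the nested shape would behave under dotted overrides. set_by_path is a hypothetical stand-in, not the actual code in loader.py; the is_moe property on ModelConfig and the dim default are also assumptions made only to keep the example runnable.

```python
from dataclasses import dataclass, field, fields


@dataclass
class MoEConfig:
    num_experts: int = 0
    top_k: int = 2


@dataclass
class ModelConfig:
    dim: int = 512  # placeholder value, only for this sketch
    moe: MoEConfig = field(default_factory=MoEConfig)

    @property
    def is_moe(self) -> bool:
        # Assumed location for the check; the issue only states the expression.
        return self.moe.num_experts > 0


def set_by_path(cfg, dotted: str, value) -> None:
    """Hypothetical helper: walk 'moe.top_k' through nested dataclasses and set the leaf."""
    *parents, leaf = dotted.split(".")
    for name in parents:
        cfg = getattr(cfg, name)
    if leaf not in {f.name for f in fields(cfg)}:
        raise KeyError(f"unknown config key: {dotted}")
    setattr(cfg, leaf, value)


# Equivalent TOML would be:
#   [model.moe]
#   num_experts = 8
#   top_k = 4
model = ModelConfig()
set_by_path(model, "moe.num_experts", 8)
set_by_path(model, "moe.top_k", 4)
```

Unknown leaves raise instead of being silently added, which mirrors the strict-key behavior described for the loader.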

Not doing this now. 11 fields isn't sprawl. And _apply_dict_to_dataclass in kempnerforge/config/loader.py rejects unknown TOML keys, so flat-to-nested is a hard cutover — either rewrite every existing MoE config in the same PR or add a compat shim that rewrites moe_* under [model] into [model.moe] with a deprecation warning. Not worth paying for a cosmetic win.
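If the compat shim route is ever taken, it could look roughly like the sketch below. It operates on the already-parsed [model] table as a plain dict. nest_legacy_moe_keys and _UNPREFIXED_MOE_KEYS are hypothetical names, and where this would hook into loader.py is left open.

```python
import warnings

# Assumption: legacy MoE keys that never carried the moe_ prefix.
_UNPREFIXED_MOE_KEYS = {"num_experts"}


def nest_legacy_moe_keys(model_section: dict) -> dict:
    """Return a copy of the parsed [model] table with flat MoE keys moved under 'moe'."""
    out = {}
    moe = dict(model_section.get("moe", {}))
    moved = []
    for key, value in model_section.items():
        if key == "moe":
            continue  # already nested, merged above
        if key.startswith("moe_"):
            new_key = key[len("moe_"):]
        elif key in _UNPREFIXED_MOE_KEYS:
            new_key = key
        else:
            out[key] = value  # model-general key, passes through untouched
            continue
        if new_key in moe:
            raise ValueError(f"{key!r} conflicts with [model.moe] key {new_key!r}")
        moe[new_key] = value
        moved.append(key)
    if moved:
        warnings.warn(
            f"flat MoE keys under [model] are deprecated; move {sorted(moved)} to [model.moe]",
            DeprecationWarning,
            stacklevel=2,
        )
    if moe:
        out["moe"] = moe
    return out
```

Running this before _apply_dict_to_dataclass would let old configs keep loading during a deprecation window, at the cost of one more code path to maintain.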

Revisit when:

  • moe_* field count on ModelConfig reaches 15, or
  • Cross-field validation needs to live in __post_init__ (e.g. top_k ≤ num_experts, capacity_factor > 0 requires specific routers), or
  • Someone wants moe_layers: list[MoEConfig] for multi-variant composition.
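A rough sketch of what the second trigger would look like once the fields live on their own dataclass. The router names and the _CAPACITY_ROUTERS set are placeholders, since the issue does not say which routers support a capacity factor.

```python
from dataclasses import dataclass

# Placeholder set: which routers honor capacity_factor is an assumption here.
_CAPACITY_ROUTERS = {"capacity_topk"}


@dataclass
class MoEConfig:
    num_experts: int = 0
    top_k: int = 2
    capacity_factor: float = 0.0
    router: str = "softmax_topk"  # placeholder router name

    def __post_init__(self) -> None:
        if self.num_experts == 0:
            return  # dense model, nothing MoE-specific to validate
        if not 1 <= self.top_k <= self.num_experts:
            raise ValueError(
                f"top_k={self.top_k} must be in [1, num_experts={self.num_experts}]"
            )
        if self.capacity_factor < 0:
            raise ValueError("capacity_factor must be non-negative")
        if self.capacity_factor > 0 and self.router not in _CAPACITY_ROUTERS:
            raise ValueError(
                f"capacity_factor > 0 requires one of {sorted(_CAPACITY_ROUTERS)}, "
                f"got router={self.router!r}"
            )
```

With the checks on MoEConfig, ModelConfig.__post_init__ stays free of MoE-specific branches.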

Out of scope: mtp_* fields from Phase 15 (not MoE), DistributedConfig.ep (orthogonal to model arch).
