ModelConfig has 11 moe_* fields today (moe_num_experts, moe_top_k, moe_frequency, moe_router, moe_shared_experts, moe_aux_loss_weight, moe_capacity_factor, moe_sequence_aux_loss_weight, moe_gradient_scale, moe_bias_schedule, moe_packed_experts). The prefix is poor man's namespacing. Phase 14 from moe_eng_production_plan.md adds 2 more. At ~15+ fields it starts hurting ergonomics — cross-field validation in __post_init__ gets awkward, and it's harder to see what's MoE-specific vs model-general.
Eventual shape:

```python
from dataclasses import dataclass, field


@dataclass
class MoEConfig:
    num_experts: int = 0
    top_k: int = 2
    # ... rest of the current moe_* fields, unprefixed


@dataclass
class ModelConfig:
    dim: int = ...
    moe: MoEConfig = field(default_factory=MoEConfig)
```
TOML grows a [model.moe] subtable. is_moe becomes config.model.moe.num_experts > 0. CLI overrides become --model.moe.top_k=4 — loader.py already handles arbitrary dot depth.
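A rough sketch of how the nested table would land, assuming the MoEConfig/ModelConfig shape above. The apply() helper here is a hypothetical stand-in for illustration, not the real _apply_dict_to_dataclass:

```python
# Sketch only: apply() is a hypothetical stand-in showing strict-key, nested
# application of a [model.moe] subtable; it is not the loader's actual implementation.
import tomllib
from dataclasses import fields, is_dataclass


def apply(obj, table: dict) -> None:
    known = {f.name for f in fields(obj)}
    for key, value in table.items():
        if key not in known:
            raise KeyError(f"unknown config key: {key}")
        current = getattr(obj, key)
        if is_dataclass(current) and isinstance(value, dict):
            apply(current, value)  # descend into [model.moe]
        else:
            setattr(obj, key, value)


toml_text = """
[model]
dim = 1024

[model.moe]
num_experts = 8
top_k = 2
"""

model = ModelConfig(dim=1024)
apply(model, tomllib.loads(toml_text)["model"])
assert model.moe.num_experts > 0  # the new is_moe condition
```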
Not doing this now. 11 fields isn't sprawl. And _apply_dict_to_dataclass in kempnerforge/config/loader.py rejects unknown TOML keys, so flat-to-nested is a hard cutover — either rewrite every existing MoE config in the same PR or add a compat shim that rewrites moe_* under [model] into [model.moe] with a deprecation warning. Not worth paying for a cosmetic win.
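For sizing, the shim would be roughly this shape; the helper name and wording are hypothetical, not existing loader code:

```python
# Hypothetical shim, sketched only to size the migration cost; not existing code.
# Lifts legacy flat moe_* keys under [model] into a [model.moe] subtable, with a
# deprecation warning, before the strict unknown-key check would reject them.
import warnings


def lift_flat_moe_keys(model_table: dict) -> dict:
    out = dict(model_table)
    moe = dict(out.pop("moe", {}))
    for key in [k for k in out if k.startswith("moe_")]:
        new_key = key.removeprefix("moe_")
        warnings.warn(
            f"[model] {key} is deprecated; use [model.moe] {new_key}",
            DeprecationWarning,
            stacklevel=2,
        )
        moe[new_key] = out.pop(key)
    if moe:
        out["moe"] = moe
    return out
```

Small, but it's extra surface to test and later delete, which is part of why the cutover isn't worth it yet.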
Revisit when:

- moe_* field count on ModelConfig hits ≥15, or
- Cross-field validation needs to live in __post_init__ (e.g. top_k ≤ num_experts, capacity_factor > 0 requires specific routers; see the sketch after this list), or
- Someone wants moe_layers: list[MoEConfig] for multi-variant composition.
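For reference, that validation trigger would look roughly like the following; the router names and exact constraints are placeholders, not rules the codebase enforces today:

```python
# Illustrative only: router names and constraints are placeholders, not current behavior.
from dataclasses import dataclass


@dataclass
class MoEConfig:
    num_experts: int = 0
    top_k: int = 2
    router: str = "topk"
    capacity_factor: float = 0.0

    def __post_init__(self) -> None:
        if self.num_experts and self.top_k > self.num_experts:
            raise ValueError("moe: top_k must be <= num_experts")
        if self.capacity_factor > 0 and self.router not in ("topk", "expert_choice"):
            raise ValueError("moe: capacity_factor > 0 requires a capacity-aware router")
```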
Out of scope: mtp_* fields from Phase 15 (not MoE), DistributedConfig.ep (orthogonal to model arch).