Pass full_param_layout into DDP (Megatron-LM #3812)

## Summary

Megatron-LM PR [NVIDIA/Megatron-LM#3812](https://github.com/NVIDIA/Megatron-LM/pull/3812) refactors `DistributedDataParallel` to accept a `full_param_layout` argument describing how parameters and gradients are mapped in `_ParamAndGradBuffer`. Distributed optimizers compute this mapping via a static `compute_full_param_layout` method.

MBridge will need to pass `full_param_layout` into DDP to fully support this change.

## Urgency

**Not pressing.** When `full_param_layout` is not passed, DDP currently falls back to the existing behavior (`_compute_default_per_buffer_param_layout`). However, Deepak plans to remove that fallback in a future cleanup pass. Once it is gone, MBridge must supply `full_param_layout` or DDP initialization will break.

## What needs to happen

1. Understand the `full_param_layout` / `PerBufferParamLayout` / `BufferKey` dataclasses introduced in `param_layout.py`.
2. Wire up `DistributedOptimizer.compute_full_param_layout()` in MBridge's training initialization and pass the result to DDP.
3. Update any MBridge code that constructs `DistributedDataParallel` to forward the layout.
4. Test with distributed optimizer enabled to verify gradient buffer layout matches expectations.
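
The steps above can be sketched roughly as follows. This is a self-contained mock, not Megatron-LM code: the dataclass fields, the `compute_full_param_layout` signature, `FakeDDP`, and `check_layout` are all hypothetical stand-ins chosen for illustration, and the real definitions in `param_layout.py` and PR #3812 should be checked before writing the actual integration. The contiguity check in particular assumes no padding between parameters, which may not hold upstream.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


# Hypothetical stand-ins; the real dataclasses live in param_layout.py upstream
# and their fields may differ.
@dataclass(frozen=True)
class BufferKey:
    param_dtype: str
    grad_dtype: str


@dataclass
class PerBufferParamLayout:
    # Parameter name -> (start, end) element offsets within one buffer (assumed shape).
    param_index_map: Dict[str, Tuple[int, int]] = field(default_factory=dict)


FullParamLayout = Dict[BufferKey, PerBufferParamLayout]
ParamSpec = Tuple[str, int, str]  # (name, number of elements, dtype)


def compute_full_param_layout(params: List[ParamSpec]) -> FullParamLayout:
    """Stand-in for the optimizer's static method: pack params contiguously per buffer."""
    layout: FullParamLayout = {}
    cursor: Dict[BufferKey, int] = {}
    for name, numel, dtype in params:
        key = BufferKey(param_dtype=dtype, grad_dtype="float32")
        start = cursor.get(key, 0)
        per_buffer = layout.setdefault(key, PerBufferParamLayout())
        per_buffer.param_index_map[name] = (start, start + numel)
        cursor[key] = start + numel
    return layout


class FakeDDP:
    """Stand-in for a DDP wrapper that takes an optional layout (step 3)."""

    def __init__(self, params: List[ParamSpec],
                 full_param_layout: Optional[FullParamLayout] = None):
        # Models the current fallback: compute a default layout when none is passed.
        self.used_fallback = full_param_layout is None
        self.layout = (full_param_layout if full_param_layout is not None
                       else compute_full_param_layout(params))


def check_layout(layout: FullParamLayout) -> None:
    """Step 4: raise if a buffer has gaps, overlaps, or empty ranges (assumes no padding)."""
    for key, per_buffer in layout.items():
        expected_start = 0
        ranges = sorted(per_buffer.param_index_map.items(), key=lambda kv: kv[1])
        for name, (start, end) in ranges:
            if start != expected_start or end <= start:
                raise ValueError(f"bad range for {name} in {key}: ({start}, {end})")
            expected_start = end


# Step 2: compute the layout once during training initialization, then forward it.
params = [("embed.weight", 12, "bfloat16"),
          ("proj.weight", 8, "bfloat16"),
          ("norm.weight", 4, "float32")]
layout = compute_full_param_layout(params)
check_layout(layout)
ddp = FakeDDP(params, full_param_layout=layout)
```

Passing the layout explicitly keeps the wrapper off the fallback path, which is the property MBridge needs to hold before the fallback is removed upstream.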

## Context

From a Slack discussion with @deepakn94 (2026-04-22). This should be addressed before the fallback path is removed upstream.

cc @deepakn94
