1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -16,6 +16,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added `OverrideDecay`, a late-stage decay override usable on both `ComposableScheduler` and `SequentialScheduler` via an `override_decay` field. When `current >= override_decay.start`, the main schedule is interrupted mid-flight and the LR decays from the value the main schedule would have produced at `start` to a target LR over `duration` (linear or cosine). `SequentialScheduler` additionally warns that `t_max` is ignored once the override becomes active.
- `OLMO_RICH_LOGGING` can now explicitly enable *or* disable rich console logging (`0`/`false`/`no`/`off` disables it); previously setting it to any value only force-enabled rich logging.
- `init_distributed()` now bootstraps a minimal single-process environment (`RANK=0`, `WORLD_SIZE=1`, `MASTER_ADDR`/`MASTER_PORT`) when launch env vars are absent, so scripts can be run directly (without `torchrun`) for single-process debugging.
- Added `MultiGroupDistributedDataParallel` (`olmo_core.nn.parallel`), a data-parallel wrapper that accumulates gradients into flat bucket views and supports per-parameter process groups (`param_process_group_fn`), overlapped bucketed all-reduce (finalized via `finalize_grad_reduce()`), and optional fp32 gradient accumulation/reduction.
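
To make the `OverrideDecay` entry above concrete, here is a minimal sketch of the described behavior, assuming a plain step-indexed schedule. The function `lr_with_override_decay` and its parameters `main_schedule`, `target_lr`, and `mode` are hypothetical names chosen for illustration, not the `olmo_core` API; only `start` and `duration` come from the entry itself.

```python
import math
from typing import Callable


def lr_with_override_decay(
    current: int,
    main_schedule: Callable[[int], float],
    start: int,
    duration: int,
    target_lr: float,
    mode: str = "linear",
) -> float:
    """Hypothetical helper (not the olmo_core API) illustrating a late-stage decay override."""
    if duration <= 0:
        raise ValueError("duration must be positive")

    # Before the override starts, the main schedule is used unchanged.
    if current < start:
        return main_schedule(current)

    # The decay begins from whatever the main schedule would have produced at `start`.
    lr_at_start = main_schedule(start)

    # Fraction of the decay window that has elapsed, clamped to 1.0 once it ends.
    progress = min((current - start) / duration, 1.0)

    if mode == "linear":
        return lr_at_start + (target_lr - lr_at_start) * progress
    if mode == "cosine":
        return target_lr + 0.5 * (lr_at_start - target_lr) * (1.0 + math.cos(math.pi * progress))
    raise ValueError(f"unknown mode: {mode!r}")
```

In this sketch the LR is held at `target_lr` after `start + duration`; whether the actual implementation does the same is an assumption.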


### Fixed
1 change: 1 addition & 0 deletions docs/source/nn/index.rst
@@ -15,5 +15,6 @@
   layer_norm
   lm_head
   moe
   parallel
   rope
   transformer
6 changes: 6 additions & 0 deletions docs/source/nn/parallel.rst
@@ -0,0 +1,6 @@
``nn.parallel``
===============

.. automodule:: olmo_core.nn.parallel
   :members:
   :member-order: bysource
7 changes: 7 additions & 0 deletions src/olmo_core/nn/parallel/__init__.py
@@ -0,0 +1,7 @@
"""
Data-parallel wrappers.
"""

from .distributed import MultiGroupDistributedDataParallel

__all__ = ["MultiGroupDistributedDataParallel"]