Commit 34786d8

Update feature statuses and add reproducibility section
Updated the status of several features from 'prototype' to 'supported' in the README, and added a new section for reproducibility and extensibility features.
1 parent 9253ad1 · commit 34786d8

1 file changed: README.md (17 additions & 6 deletions)

@@ -275,11 +275,12 @@ In the following, we list the most important features of Modalities.
 | Memmap for efficient data loading | supported | Optimizes the data pipeline to reduce I/O bottlenecks. |
 | Activation Checkpointing | supported | Saves intermediate activations to memory only at certain points during the forward pass and recomputes them during the backward pass, reducing memory usage at the cost of additional computation. |
 | Flash Attention | supported | A highly optimized attention mechanism that significantly reduces the computational burden and memory footprint of attention calculations, enabling faster training and inference on large models. |
-| Tensor Parallelism | prototype | Implements vertical model sharding as an efficient model parallelism technique. |
-| Sequence Parallelism | prototype | Variant of Tensor Parallelism that shards along the sequence dimension. |
-| FSDP 2 | prototype | Improved version of the original FSDP. |
-| Torch Compile | prototype | Speeds up tensor operations by JIT-compiling them into optimized kernels. |
-| Deferred Initialisation | prototype | Instead of instantiating the model in CPU RAM, the modules are instantiated as fake tensors and their operations are recorded. Once sharded (e.g., via FSDP), each rank instantiates only its local tensors by replaying the recorded operations. |
+| Tensor Parallelism | supported | Implements vertical model sharding as an efficient model parallelism technique. |
+| Sequence Parallelism | supported | Variant of Tensor Parallelism that shards along the sequence dimension. |
+| Pipeline Parallelism | supported | Support for GPipe. Alternative schedules such as (interleaved) 1F1B are being implemented. |
+| FSDP 2 | supported | Improved version of the original FSDP. |
+| Torch Compile | supported | Speeds up tensor operations by JIT-compiling them into optimized kernels. |
+| Deferred Initialisation | supported | Instead of instantiating the model in CPU RAM, the modules are instantiated as fake tensors and their operations are recorded. Once sharded (e.g., via FSDP), each rank instantiates only its local tensors by replaying the recorded operations. |
 | Adaptive Batch Size Exploration | planned | Dynamically increases the training batch size during the training process to identify the maximum batch size that can be accommodated by a given GPU setup without causing memory overflow or performance degradation. |
 | Node Failure Recovery | planned | Implements mechanisms to automatically detect and recover from failures (e.g., node or GPU failures) in distributed training environments, ensuring that training can continue with minimal interruption even if one or more nodes / GPUs in the cluster fail. |
 | Loss Parallelism | planned | Reduces memory footprint and communication overhead by computing the loss locally on each rank. |
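
The parallelism, compilation, and initialisation rows above correspond to composable PyTorch primitives. As a rough, hedged illustration (this is not Modalities code; Modalities enables these features through its experiment configuration), the sketch below combines deferred meta-device initialisation, FSDP2 sharding via `fully_shard`, and `torch.compile`. It assumes PyTorch 2.6+ launched with `torchrun`; the toy model, sizes, and init scheme are invented for the example.

```python
# Illustrative sketch only -- not Modalities' implementation or API.
# Assumes PyTorch >= 2.6 with a NCCL process group launched via torchrun.
import torch
import torch.nn as nn
from torch.distributed.fsdp import fully_shard  # FSDP2 entry point


class Block(nn.Module):
    """Toy residual MLP block standing in for a transformer layer."""

    def __init__(self, d_model: int = 1024) -> None:
        super().__init__()
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ff(x)


torch.distributed.init_process_group(backend="nccl")
torch.cuda.set_device(torch.distributed.get_rank() % torch.cuda.device_count())

# Deferred initialisation: build the model on the meta device so that no
# parameter memory is allocated on CPU or GPU yet.
with torch.device("meta"):
    model = nn.Sequential(*[Block() for _ in range(12)])

# FSDP2: shard each block, then the root module, across the default world.
for block in model:
    fully_shard(block)
fully_shard(model)

# Materialize only this rank's local shards and (re)initialise them.
model.to_empty(device="cuda")
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, std=0.02)
        nn.init.zeros_(module.bias)

# torch.compile JIT-compiles forward/backward into optimized kernels.
model = torch.compile(model)
```

FSDP2 keeps parameters as sharded DTensors, so each rank only ever materializes its local shard; that is why pairing it with deferred initialisation matters for models that do not fit in CPU RAM.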
@@ -302,6 +303,16 @@ In the following, we list the most important features of Modalities.
 | Knowledge Distillation | planned | Transfers knowledge from a larger, complex model to a smaller, more efficient model, improving the smaller model's performance without the computational cost of the larger model. |
 | Hyperparameter Optimization | planned | Grid search over various hyperparameters such as the learning rate, optimizer arguments, etc. The integration of µP might also be interesting. |
 
+### Reproducibility & Extensibility Features
+
+| Name | Status | Description |
+|--------------------------------|-----------|-------------|
+| Self-contained Configurations | supported | Every experiment configuration fully specifies all components, hyperparameters, and seeds, ensuring that experiments are reproducible by design without requiring external context or hidden state. |
+| Registry for Custom Components | supported | Modalities uses a registry-based architecture where all components implement generic interfaces, enabling seamless replacement or extension with (custom) modules at runtime. |
+| Generic Benchmarking | supported | Supports systematic grid searches over arbitrary parameters to benchmark throughput, memory footprint, and downstream performance across model, data, and system configurations. |
+
 
 ## Scaling Experiments
 
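The "Registry for Custom Components" and "Self-contained Configurations" rows added above describe a registry-plus-config pattern. The following is a minimal, hypothetical sketch of that pattern; the class, method names, and keys are illustrative and not the actual Modalities API.

```python
# Hypothetical sketch (names are illustrative, not the actual Modalities API):
# a minimal component registry that maps (component_key, variant_key) to a
# class, so a config file can select or swap implementations at runtime.
from typing import Callable, Dict, Tuple, Type


class Registry:
    def __init__(self) -> None:
        self._components: Dict[Tuple[str, str], Type] = {}

    def register(self, component_key: str, variant_key: str) -> Callable[[Type], Type]:
        # Decorator that records a component class under its two keys.
        def decorator(cls: Type) -> Type:
            self._components[(component_key, variant_key)] = cls
            return cls
        return decorator

    def build(self, component_key: str, variant_key: str, **kwargs):
        # Instantiate the registered class with the arguments from the config.
        cls = self._components[(component_key, variant_key)]
        return cls(**kwargs)


registry = Registry()


@registry.register("loss", "cross_entropy")
class CrossEntropyLossComponent:
    def __init__(self, ignore_index: int = -100) -> None:
        self.ignore_index = ignore_index


# A config entry such as {"component_key": "loss", "variant_key": "cross_entropy",
# "config": {"ignore_index": -100}} can then be resolved to an instance:
loss = registry.build("loss", "cross_entropy", ignore_index=-100)
```

Because a self-contained configuration names both the variant and every constructor argument (plus seeds), resolving the same file again rebuilds the same component graph, which is what makes experiments reproducible by design.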
@@ -394,4 +405,4 @@ Thanks so much to all of our amazing contributors!
 howpublished={\url{https://github.com/Modalities/modalities}},
 url="https://github.com/Modalities/modalities",
 }
-
+
