feat: PyTorch Extras Container #26
Merged
Conversation
Based on coreweave/ml-containers PR coreweave#21, with the application-specific parts removed, and with more precompiled DeepSpeed ops and flash-attn components included.
torch-extras Container

This PR adds a new container named `ml-containers/torch-extras`, which is `ml-containers/torch` with the supplementary libraries DeepSpeed and flash-attention. The code is originally based on #21, but significantly more generalized, with the finetuner's application-specific parts removed.
Rationale
DeepSpeed and flash-attention both require the CUDA development tools to install properly, which complicates using them with anything but an `nvidia/cuda:...-devel`-based image. Optionally including them with our `ml-containers/torch` containers allows for still-lightweight images that can use these powerful libraries without shipping the full CUDA development toolkit. It also reduces compile time for downstream Dockerfiles, since flash-attention takes a long time to compile at whichever step it is included.

Structure
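The layering can be sketched as a multi-stage Dockerfile: a builder stage with the CUDA toolkit compiles the extensions into wheels, and the final stage installs only the prebuilt wheels on the plain torch image. The image tags, the apt package name, and the version-unpinned packages below are illustrative assumptions, not the repository's actual build files:

```dockerfile
# Illustrative sketch only; tags and package names are assumptions.
ARG BASE_IMAGE=ghcr.io/coreweave/ml-containers/torch:base

# Builder stage: start from the torch image but add the CUDA toolkit,
# which DeepSpeed and flash-attn need at compile time.
FROM ${BASE_IMAGE} AS builder
RUN apt-get update && apt-get install -y --no-install-recommends cuda-toolkit \
    && pip install --no-cache-dir ninja packaging \
    && pip wheel --no-build-isolation --wheel-dir /wheels deepspeed flash-attn

# Final stage: the plain torch image plus the prebuilt wheels, so the
# result carries no CUDA development toolkit.
FROM ${BASE_IMAGE}
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*.whl && rm -rf /wheels
```

Because the heavy compilation happens once in the builder stage, downstream images that `FROM` the final stage never pay the flash-attention compile cost themselves.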
`ml-containers/torch-extras` is maintained as a separate container, unlike the tag-differentiated `torch:base` and `torch:nccl` flavours of the baseline torch image. Its images are simply layers on top of the `torch:base` and `torch:nccl` images, and are built as a second CI step immediately after either of those two is built.

Since DeepSpeed and flash-attention compatibility may lag behind PyTorch releases themselves, the secondary step to build these images can be temporarily disabled via flags in `torch-base.yml` and `torch-nccl.yml` until the libraries become compatible.

I welcome comments and suggestions on this build process and structure, because it requires tradeoffs: it guarantees that the `torch-extras` containers are always built, whenever possible, on new `torch` image updates, but it makes it more difficult to build the `torch-extras` containers standalone, if desired.
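The chained build with a disable flag could look roughly like the following GitHub Actions fragment in `torch-base.yml`. The input name, job names, and reusable-workflow paths here are hypothetical placeholders for illustration, not the repository's actual workflow:

```yaml
# Hypothetical sketch of the gating flag; names are assumptions.
env:
  BUILD_TORCH_EXTRAS: 'true'  # flip to 'false' while torch-extras lags a new PyTorch release

jobs:
  build-torch-base:
    uses: ./.github/workflows/build.yml
    with:
      image-name: torch

  build-torch-extras:
    needs: build-torch-base
    if: ${{ env.BUILD_TORCH_EXTRAS == 'true' }}
    uses: ./.github/workflows/torch-extras.yml
    with:
      base-image: torch
```

With `needs:` chaining, the `torch-extras` job runs automatically after every successful base build, which is what makes a standalone rebuild of only `torch-extras` less convenient.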