feat: PyTorch Extras Container #26
Merged
Conversation
Based on coreweave/ml-containers PR coreweave#21, with the application-specific parts removed, and with more precompiled DeepSpeed ops and flash-attn components included.
torch-extras Container

This PR adds a new container named `ml-containers/torch-extras`, which is `ml-containers/torch` with the supplementary libraries DeepSpeed and flash-attention. The code is originally based on #21, but significantly more generalized, with the finetuner's application-specific parts removed.
Rationale
DeepSpeed and flash-attention both require the CUDA development tools to install properly, which complicates using them with anything but an `nvidia/cuda:...-devel`-based image. Optionally including them with our `ml-containers/torch` containers allows for still-lightweight images that can use these powerful libraries without shipping the full CUDA development toolkit. It also reduces compile time for downstream Dockerfiles, since flash-attention takes a long time to compile at whichever step it is included.

Structure
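The layering can be sketched as a multi-stage Dockerfile: a builder stage with the CUDA toolkit compiles the extensions into wheels, and the final stage installs only the prebuilt wheels on the plain torch image. The image tags, the apt package name, and the version-unpinned packages below are illustrative assumptions, not the repository's actual build files:

```dockerfile
# Illustrative sketch only; tags and package names are assumptions.
ARG BASE_IMAGE=ghcr.io/coreweave/ml-containers/torch:base

# Builder stage: start from the torch image but add the CUDA toolkit,
# which DeepSpeed and flash-attn need at compile time.
FROM ${BASE_IMAGE} AS builder
RUN apt-get update && apt-get install -y --no-install-recommends cuda-toolkit \
    && pip install --no-cache-dir ninja packaging \
    && pip wheel --no-build-isolation --wheel-dir /wheels deepspeed flash-attn

# Final stage: the plain torch image plus the prebuilt wheels, so the
# result carries no CUDA development toolkit.
FROM ${BASE_IMAGE}
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir /wheels/*.whl && rm -rf /wheels
```

Because the heavy compilation happens once in the builder stage, downstream images that `FROM` the final stage never pay the flash-attention compile cost themselves.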
`ml-containers/torch-extras` is maintained as a separate container, unlike the tag-differentiated `torch:base` and `torch:nccl` flavours of the baseline torch image. Its images are simply layers on top of the `torch:base` and `torch:nccl` images, and are built as a second CI step immediately after either of those two is built.

Since DeepSpeed and flash-attention compatibility may lag behind PyTorch releases themselves, the secondary step to build these images can be temporarily disabled via flags in `torch-base.yml` and `torch-nccl.yml` until the libraries become compatible.

I welcome comments and suggestions on this build process and structure, because it requires tradeoffs: it guarantees that the `torch-extras` containers are always built, whenever possible, on new `torch` image updates, but it makes it more difficult to build the `torch-extras` containers standalone, if desired.
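The chained build with a disable flag could look roughly like the following GitHub Actions fragment in `torch-base.yml`. The input name, job names, and reusable-workflow paths here are hypothetical placeholders for illustration, not the repository's actual workflow:

```yaml
# Hypothetical sketch of the gating flag; names are assumptions.
env:
  BUILD_TORCH_EXTRAS: 'true'  # flip to 'false' while torch-extras lags a new PyTorch release

jobs:
  build-torch-base:
    uses: ./.github/workflows/build.yml
    with:
      image-name: torch

  build-torch-extras:
    needs: build-torch-base
    if: ${{ env.BUILD_TORCH_EXTRAS == 'true' }}
    uses: ./.github/workflows/torch-extras.yml
    with:
      base-image: torch
```

With `needs:` chaining, the `torch-extras` job runs automatically after every successful base build, which is what makes a standalone rebuild of only `torch-extras` less convenient.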