Repo: CoreWeave (published Oct 31, 2022)

coreweave/ml-containers

Dockerfile

Language: Dockerfile

License: MIT

Stars: 50

Forks: 7

Open issues: 8

Created: 2022-10-31T23:24:44Z

Pushed: 2026-06-10T14:49:02Z

Default branch: main

Fork: no

Archived: no

README:

ml-containers

Repository for building ML images at CoreWeave

Index

See the list of all published images.

Special PyTorch Images:

  • [PyTorch Base Images](#pytorch-base-images)
  • [PyTorch Extras](#pytorch-extras)
  • [PyTorch Nightly](#pytorch-nightly)

PyTorch Base Images

CoreWeave provides custom builds of PyTorch, `torchvision` and `torchaudio` tuned for our platform in a single container image, `ml-containers/torch`.

Versions compiled against CUDA 11.8.0, 12.0.1, 12.1.1, and 12.2.2 are available in this repository, with two variants:

1. base: Tagged as ml-containers/torch:a1b2c3d-base-....
   1. Built from `nvidia/cuda:...-base-ubuntu22.04` as a base.
   2. Only includes essentials (CUDA, torch, torchvision, torchaudio), so it has a small image size, making it fast to launch.
2. nccl: Tagged as ml-containers/torch:a1b2c3d-nccl-....
   1. Built from `ghcr.io/coreweave/nccl-tests` as a base.
   2. Ultimately inherits from `nvidia/cuda:...-cudnn8-devel-ubuntu22.04`.
   3. Larger, but includes development libraries and build tools such as nvcc necessary for compiling other PyTorch extensions.
   4. These PyTorch builds are built on component libraries optimized for the CoreWeave cloud; see `coreweave/nccl-tests`.

> [!NOTE]
> Most torch images have both a variant built on Ubuntu 22.04 and a variant built on Ubuntu 20.04.
> - CUDA 11.8.0 is an exception, and is only available on Ubuntu 20.04.
> - Ubuntu 22.04 images use Python 3.10.
> - Ubuntu 20.04 images use Python 3.8.
> - The base distribution is indicated in the container image tag.
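As a rough way to tell which variant a running container came from, the properties described above can be checked from Python. This is a minimal sketch and not part of the repository: the helper names `guess_ubuntu_release` and `describe_environment` are hypothetical, the Python-to-Ubuntu mapping only encodes the two pairings stated in the note, and treating the presence of `nvcc` as a hint for the nccl variant is an assumption drawn from the variant descriptions.

```python
import shutil
import sys

# Pairings taken from the note above; any other version is reported as unknown.
PYTHON_TO_UBUNTU = {(3, 10): "22.04", (3, 8): "20.04"}


def guess_ubuntu_release(python_version):
    """Map a (major, minor, ...) Python version to the Ubuntu release named in the note."""
    return PYTHON_TO_UBUNTU.get(tuple(python_version[:2]))


def describe_environment():
    """Collect hints about the image variant (hypothetical helper)."""
    return {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "ubuntu_guess": guess_ubuntu_release(sys.version_info),
        # The nccl variant includes build tools such as nvcc;
        # the base variant only includes essentials.
        "has_nvcc": shutil.which("nvcc") is not None,
    }


if __name__ == "__main__":
    print(describe_environment())
```

The image tag remains the authoritative indicator of the base distribution; a check like this is only a sanity test from inside the container.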

PyTorch Extras

`ml-containers/torch-extras` extends the `ml-containers/torch` images with a set of common PyTorch extensions:

1. DeepSpeed
2. FlashAttention
3. NVIDIA Apex

Each one is compiled specially against the custom PyTorch builds in `ml-containers/torch`.

Both base and nccl editions are available for `ml-containers/torch-extras` matching those for `ml-containers/torch`. The base edition retains a small size, as a multi-stage build is used to avoid including CUDA development libraries in it, despite those libraries being required to build the extensions themselves.

PyTorch Nightly

`ml-containers/nightly-torch` is an experimental, nightly release channel of the [PyTorch Base Images](#pytorch-base-images) in the style of PyTorch's own nightly preview builds, featuring the latest development versions of torch, torchvision, and torchaudio pulled daily from GitHub and compiled from source.

`ml-containers/nightly-torch-extras` is a version of [PyTorch Extras](#pytorch-extras) built on top of the `ml-containers/nightly-torch` container images. These are not nightly versions of the extensions themselves, but rather match the extension versions in the regular [PyTorch Extras](#pytorch-extras) containers.

> ⚠ The *PyTorch Nightly* containers are based on unstable, experimental preview builds of PyTorch, and should be expected to contain bugs and other issues.
> For more stable containers use the [PyTorch Base Images](#pytorch-base-images)
> and [PyTorch Extras](#pytorch-extras) containers.

Organization

This repository contains multiple container image Dockerfiles. Each is expected to be in its own folder, along with any other files needed for the build.
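The one-folder-per-image layout makes it straightforward to enumerate the images in a checkout. The following is a minimal sketch under two assumptions not stated in this README: that image folders sit at the top level of the repository, and that each build file is named exactly `Dockerfile`. The function name `find_image_folders` is hypothetical.

```python
from pathlib import Path


def find_image_folders(repo_root):
    """Return the sorted names of top-level folders that contain a Dockerfile."""
    root = Path(repo_root)
    return sorted(
        child.name
        for child in root.iterdir()
        # Assumption: the build file is named exactly "Dockerfile".
        if child.is_dir() and (child / "Dockerfile").is_file()
    )
```

Calling `find_image_folders(".")` from the repository root would list every folder that looks like an image definition under these assumptions.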

CI Builds (Actions)

The CI builds run when changes are detected to files in an image's folder, so only the changed container images are rebuilt. There is one action per image, each calling the reusable base action [build.yml](.github/workflows/build.yml). The reusable action accepts several inputs:

  • folder - the folder containing the Dockerfile for the image
  • image-name - the name to use for the image
  • build-args - arguments to pass to the docker build
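To make the role of the three inputs concrete, the sketch below assembles a plain `docker build` invocation from them. It is illustrative only: how `build.yml` actually invokes the build is not described here, and both the function name `docker_build_command` and the choice to pass build arguments as a mapping are assumptions.

```python
def docker_build_command(folder, image_name, build_args=None):
    """Assemble a `docker build` argument list from the three inputs (illustrative)."""
    command = ["docker", "build", "--tag", image_name]
    # Each build argument becomes its own --build-arg KEY=VALUE pair.
    for key, value in (build_args or {}).items():
        command += ["--build-arg", f"{key}={value}"]
    # The folder is the build context; docker looks for a Dockerfile inside it by default.
    command.append(folder)
    return command
```

The resulting list could be passed to `subprocess.run` on a machine with Docker installed; the real workflow may add further options such as registry prefixes or caching.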

Images built from the same source can share a single action, since the main reason for having multiple actions is to build only the images that changed. A build matrix can be helpful in these cases; see https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs.
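A build matrix expands a set of axes into one job per combination. GitHub Actions performs this expansion itself; the sketch below only illustrates the idea in Python, using the CUDA versions and variants listed earlier in this README. The axis names `cuda` and `variant` and the function `expand_matrix` are hypothetical, and whether every combination is actually built is not stated here.

```python
from itertools import product


def expand_matrix(axes):
    """Return one dict per combination of axis values, as a build matrix would."""
    names = list(axes)
    return [dict(zip(names, values)) for values in product(*axes.values())]


# CUDA versions and variants as listed in this README; the axis names are illustrative.
jobs = expand_matrix({
    "cuda": ["11.8.0", "12.0.1", "12.1.1", "12.2.2"],
    "variant": ["base", "nccl"],
})
print(len(jobs))  # 8 combinations (4 CUDA versions x 2 variants)
```

Each resulting dict corresponds to one set of build arguments that a single shared action could pass to the same Dockerfile.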