Skip to content

meta-pytorch/torchcodec

Installation | Simple Example | Detailed Example | Documentation | Contributing | License

TorchCodec

TorchCodec is a Python library for decoding video and audio data into PyTorch tensors, on CPU and CUDA GPU. It also supports video and audio encoding on CPU! It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML models on videos and audio, or run inference, TorchCodec is how you turn these into data.

We achieve these capabilities through:

  • Pythonic APIs that mirror Python and PyTorch conventions.
  • Relying on FFmpeg to do the decoding and encoding. TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a mature library with broad coverage available on most systems. It is, however, not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used correctly and efficiently.
  • Returning data as PyTorch tensors, ready to be fed into PyTorch transforms or used directly to train models.

Usage Examples

Below are some examples of what you can do with TorchCodec. For more detailed examples and more use-cases, check out our documentation!

Video Decoding

from torchcodec.decoders import VideoDecoder

device = "cpu"  # or e.g. "cuda" !
decoder = VideoDecoder("path/to/video.mp4", device=device)

decoder.metadata
# VideoStreamMetadata:
#   num_frames: 250
#   duration_seconds: 10.0
#   bit_rate: 31315.0
#   codec: h264
#   average_fps: 25.0
#   ... (truncated output)

# Simple Indexing API
decoder[0]  # uint8 tensor of shape [C, H, W]
decoder[0 : -1 : 20]  # uint8 stacked tensor of shape [N, C, H, W]

# Indexing, with PTS and duration info:
decoder.get_frames_at(indices=[2, 100])
# FrameBatch:
#   data (shape): torch.Size([2, 3, 270, 480])
#   pts_seconds: tensor([0.0667, 3.3367], dtype=torch.float64)
#   duration_seconds: tensor([0.0334, 0.0334], dtype=torch.float64)

# Time-based indexing with PTS and duration info
decoder.get_frames_played_at(seconds=[0.5, 10.4])
# FrameBatch:
#   data (shape): torch.Size([2, 3, 270, 480])
#   pts_seconds: tensor([ 0.4671, 10.3770], dtype=torch.float64)
#   duration_seconds: tensor([0.0334, 0.0334], dtype=torch.float64)

You can use the following snippet to generate a video with FFmpeg and try out the VideoDecoder:

ffmpeg -f lavfi -i testsrc2=size=640x400:duration=10:rate=25 /tmp/output_video.mp4

Encoding

from torchcodec.encoders import Encoder

encoder = Encoder()
video_stream = encoder.add_video(
    height=height, width=width, frame_rate=frame_rate,
)
audio_stream = encoder.add_audio(
    sample_rate=sample_rate, num_channels=num_channels,
)
with encoder.open_file("output.mp4"):
    video_stream.add_frames(frames_batch_0)
    audio_stream.add_samples(samples_batch_0)
    video_stream.add_frames(frames_batch_1)
    audio_stream.add_samples(samples_batch_1)
    # ...

Installing TorchCodec

  1. Install FFmpeg, if it's not already installed. TorchCodec supports all major FFmpeg versions in [4, 8]. Linux distributions usually come with FFmpeg pre-installed. You'll need FFmpeg that comes with separate shared libraries. This is especially relevant for Windows users: these are usually called the "shared" releases.

    If FFmpeg is not already installed, or you need a more recent version, an easy way to install it is to use conda:

    conda install "ffmpeg"
    # or
    conda install "ffmpeg" -c conda-forge
  2. Install PyTorch and TorchCodec:

    pip install torch torchcodec

    That's it! On Linux x86 and aarch64, this will install CUDA-enabled wheels by default (matching the default behavior of pip install torch). These wheels should still work even if you do not have a GPU on your machine. On macOS and Windows this will install CPU-only wheels. CPU wheels are available for Linux (x86_64 and aarch64), macOS, and Windows.

    For other versions of PyTorch, refer to the compatibility table below.

CUDA support

CUDA-enabled wheels are installed by default on Linux. For Windows, you'll need to pass --index-url as described below.

Make sure you have a GPU with NVDEC hardware that can decode the format you want. Refer to Nvidia's GPU support matrix here.

You will need the libnpp and libnvrtc CUDA libraries, which are usually part of the CUDA Toolkit.

To select a specific CUDA Toolkit version, use --index-url. Make sure to install the corresponding PyTorch version as well (refer to the official instructions):

# This corresponds to CUDA Toolkit version 13.0.
pip install torch torchcodec --index-url=https://download.pytorch.org/whl/cu130

Make sure your FFmpeg has NVDEC support:

ffmpeg -decoders | grep -i nvidia
# This should show a line like this:
# V..... h264_cuvid           Nvidia CUVID H264 decoder (codec h264)

To check that FFmpeg libraries work with NVDEC correctly you can decode a generated test video:

ffmpeg -hwaccel cuda -hwaccel_output_format cuda -f lavfi -i testsrc2=duration=1 -f null -

CPU-only installation

To install CPU-only wheels explicitly (e.g. on Linux where CUDA wheels are the default):

pip install torchcodec --index-url=https://download.pytorch.org/whl/cpu

Compatibility

The following table indicates the compatibility between versions of torchcodec, torch and Python.

torchcodec torch Python
main / nightly main / nightly >=3.10, <=3.14
0.13 >=2.11 >=3.10, <=3.14
0.12 >=2.11 >=3.10, <=3.14
0.11 2.11 >=3.10, <=3.14
0.10 2.10 >=3.10, <=3.14
older versions
torchcodec torch Python
0.9 2.9 >=3.10, <=3.14
0.8 2.9 >=3.10, <=3.13
0.7 2.8 >=3.9, <=3.13
0.6 2.8 >=3.9, <=3.13
0.5 2.7 >=3.9, <=3.13
0.4 2.7 >=3.9, <=3.13
0.3 2.7 >=3.9, <=3.13
0.2 2.6 >=3.9, <=3.13
0.1 2.5 >=3.9, <=3.12
0.0.3 2.4 >=3.8, <=3.12

Contributing

We welcome contributions to TorchCodec! Please see our contributing guide for more details.

License

TorchCodec is released under the BSD 3 license.

However, TorchCodec may be used with code not written by Meta which may be distributed under different licenses.

For example, if you build TorchCodec with ENABLE_CUDA=1 or use the CUDA-enabled release of torchcodec, please review CUDA's license here: Nvidia licenses.