TorchCodec is a Python library for decoding video and audio data into PyTorch tensors, on CPU and CUDA GPU. It also supports video and audio encoding on CPU! It aims to be fast, easy to use, and well integrated into the PyTorch ecosystem. If you want to use PyTorch to train ML models on videos and audio, or run inference, TorchCodec is how you turn these into data.
We achieve these capabilities through:
- Pythonic APIs that mirror Python and PyTorch conventions.
- Relying on FFmpeg to do the decoding and encoding. TorchCodec uses the version of FFmpeg you already have installed. FFmpeg is a mature library with broad coverage available on most systems. It is, however, not easy to use. TorchCodec abstracts FFmpeg's complexity to ensure it is used correctly and efficiently.
- Returning data as PyTorch tensors, ready to be fed into PyTorch transforms or used directly to train models.
Below are some examples of what you can do with TorchCodec. For more detailed examples and more use-cases, check out our documentation!
from torchcodec.decoders import VideoDecoder
device = "cpu" # or e.g. "cuda" !
decoder = VideoDecoder("path/to/video.mp4", device=device)
decoder.metadata
# VideoStreamMetadata:
# num_frames: 250
# duration_seconds: 10.0
# bit_rate: 31315.0
# codec: h264
# average_fps: 25.0
# ... (truncated output)
# Simple Indexing API
decoder[0] # uint8 tensor of shape [C, H, W]
decoder[0 : -1 : 20] # uint8 stacked tensor of shape [N, C, H, W]
# Indexing, with PTS and duration info:
decoder.get_frames_at(indices=[2, 100])
# FrameBatch:
# data (shape): torch.Size([2, 3, 270, 480])
# pts_seconds: tensor([0.0667, 3.3367], dtype=torch.float64)
# duration_seconds: tensor([0.0334, 0.0334], dtype=torch.float64)
# Time-based indexing with PTS and duration info
decoder.get_frames_played_at(seconds=[0.5, 10.4])
# FrameBatch:
# data (shape): torch.Size([2, 3, 270, 480])
# pts_seconds: tensor([ 0.4671, 10.3770], dtype=torch.float64)
# duration_seconds: tensor([0.0334, 0.0334], dtype=torch.float64)

You can use the following snippet to generate a video with FFmpeg and try out the VideoDecoder:
ffmpeg -f lavfi -i testsrc2=size=640x400:duration=10:rate=25 /tmp/output_video.mp4

from torchcodec.encoders import Encoder

encoder = Encoder()
video_stream = encoder.add_video(
    height=height, width=width, frame_rate=frame_rate,
)
audio_stream = encoder.add_audio(
    sample_rate=sample_rate, num_channels=num_channels,
)
with encoder.open_file("output.mp4"):
    video_stream.add_frames(frames_batch_0)
    audio_stream.add_samples(samples_batch_0)
    video_stream.add_frames(frames_batch_1)
    audio_stream.add_samples(samples_batch_1)
    # ...
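To make the time-based lookup in the decoding example concrete, here is a minimal sketch of the arithmetic behind "the frame played at time t": the frame whose `pts <= t < pts + duration`. It assumes a constant frame rate and a first frame at 0 seconds, which real files do not guarantee, and it is not TorchCodec's implementation. `frame_index_played_at` and `pts_of_frame` are hypothetical helpers, and the 30000/1001 frame rate is an assumed value picked because it is consistent with the `pts_seconds` and `duration_seconds` shown in the sample output above.

```python
from fractions import Fraction
from math import floor

# Assumption: constant frame rate, first frame at pts 0. 30000/1001 (about
# 29.97 fps) is an illustrative value, not something read from a file.
FPS = Fraction(30000, 1001)


def frame_index_played_at(seconds: str, fps: Fraction = FPS) -> int:
    """Index of the frame on screen at `seconds` (pts <= seconds < pts + duration).

    `seconds` is passed as a decimal string so the arithmetic stays exact.
    """
    return floor(Fraction(seconds) * fps)


def pts_of_frame(index: int, fps: Fraction = FPS) -> float:
    """Presentation timestamp, in seconds, of frame `index`."""
    return float(index / fps)


for t in ("0.5", "10.4"):
    i = frame_index_played_at(t)
    print(f"t={t}s -> frame {i}, pts={round(pts_of_frame(i), 4)}")
```

Under these assumptions, 0.5 s maps to frame 14 (pts 0.4671) and 10.4 s maps to frame 311 (pts 10.377). The returned timestamp is at or before the requested time, never after it.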
- Install FFmpeg, if it's not already installed. TorchCodec supports all major FFmpeg versions in [4, 8]. Linux distributions usually come with FFmpeg pre-installed. You'll need an FFmpeg build that ships separate shared libraries. This is especially relevant for Windows users: these builds are usually called the "shared" releases.

  If FFmpeg is not already installed, or you need a more recent version, an easy way to install it is to use conda:

  conda install "ffmpeg" # or conda install "ffmpeg" -c conda-forge
- Install PyTorch and TorchCodec:
pip install torch torchcodec
That's it! On Linux x86_64 and aarch64, this installs CUDA-enabled wheels by default (matching the default behavior of pip install torch). These wheels should still work even if you do not have a GPU on your machine. On macOS and Windows, this installs CPU-only wheels. CPU wheels are available for Linux (x86_64 and aarch64), macOS, and Windows.

For other versions of PyTorch, refer to the compatibility table below.
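As a quick sanity check for the FFmpeg step, the sketch below reads the major version reported by the `ffmpeg` binary on your PATH and compares it with the supported range [4, 8]. It is only a proxy: TorchCodec loads FFmpeg's shared libraries, which may not be the same build as the command-line tool. `parse_ffmpeg_major` and `installed_ffmpeg_major` are hypothetical helpers, and the regular expression assumes the usual `ffmpeg version X.Y...` banner. Snapshot builds with other version strings are reported as unrecognized.

```python
import re
import shutil
import subprocess
from typing import Optional

SUPPORTED_MAJOR = range(4, 9)  # FFmpeg major versions 4 through 8


def parse_ffmpeg_major(version_output: str) -> Optional[int]:
    """Extract the major version from the banner printed by `ffmpeg -version`."""
    match = re.search(r"ffmpeg version \D*?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def installed_ffmpeg_major() -> Optional[int]:
    """Major version of the `ffmpeg` binary on PATH, or None if unavailable."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_major(result.stdout)


if __name__ == "__main__":
    major = installed_ffmpeg_major()
    if major is None:
        print("ffmpeg not found on PATH, or its version string was not recognized")
    elif major in SUPPORTED_MAJOR:
        print(f"FFmpeg {major} is within the supported range [4, 8]")
    else:
        print(f"FFmpeg {major} is outside the supported range [4, 8]")
```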
CUDA-enabled wheels are installed by default on Linux. For Windows, you'll need
to pass --index-url as described below.
Make sure you have a GPU with NVDEC hardware that can decode the format you want. Refer to Nvidia's Video Encode and Decode GPU Support Matrix.
You will need the libnpp and libnvrtc CUDA libraries, which are usually
part of the CUDA Toolkit.
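If you are unsure whether those libraries are visible on your system, a rough check is to ask Python's standard library to locate them. This is a sketch under assumptions: the short names `nppc`, `nppicc`, and `nvrtc` are guesses at the relevant library names on Linux, `locate_cuda_libs` is a hypothetical helper, and `ctypes.util.find_library` only searches the system's default locations, so a "not found" result does not prove the libraries are missing (for example when CUDA comes from pip packages or a non-standard prefix).

```python
from ctypes.util import find_library

# Assumption: these short names map to libnppc / libnppicc / libnvrtc on Linux.
# Other platforms and pip-installed CUDA packages may use different names or
# locations that find_library does not search.
CANDIDATE_LIBS = ("nppc", "nppicc", "nvrtc")


def locate_cuda_libs(names=CANDIDATE_LIBS):
    """Map each library name to what the system lookup reports, or None."""
    return {name: find_library(name) for name in names}


for name, found in locate_cuda_libs().items():
    print(f"{name}: {found or 'not found by the system lookup'}")
```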
To select a specific CUDA Toolkit version, use --index-url. Make sure to
install the corresponding PyTorch version as well (refer to the
official instructions):
# This corresponds to CUDA Toolkit version 13.0.
pip install torch torchcodec --index-url=https://download.pytorch.org/whl/cu130

Make sure your FFmpeg has NVDEC support:
ffmpeg -decoders | grep -i nvidia
# This should show a line like this:
# V..... h264_cuvid Nvidia CUVID H264 decoder (codec h264)

To check that FFmpeg libraries work with NVDEC correctly, you can decode a generated test video:
ffmpeg -hwaccel cuda -hwaccel_output_format cuda -f lavfi -i testsrc2=duration=1 -f null -

To install CPU-only wheels explicitly (e.g. on Linux where CUDA wheels are the default):
pip install torchcodec --index-url=https://download.pytorch.org/whl/cpu

The following table indicates the compatibility between versions of torchcodec, torch and Python.
| torchcodec | torch | Python |
|---|---|---|
| main / nightly | main / nightly | >=3.10, <=3.14 |
| 0.13 | >=2.11 | >=3.10, <=3.14 |
| 0.12 | >=2.11 | >=3.10, <=3.14 |
| 0.11 | 2.11 | >=3.10, <=3.14 |
| 0.10 | 2.10 | >=3.10, <=3.14 |
Older versions:

| torchcodec | torch | Python |
|---|---|---|
| 0.9 | 2.9 | >=3.10, <=3.14 |
| 0.8 | 2.9 | >=3.10, <=3.13 |
| 0.7 | 2.8 | >=3.9, <=3.13 |
| 0.6 | 2.8 | >=3.9, <=3.13 |
| 0.5 | 2.7 | >=3.9, <=3.13 |
| 0.4 | 2.7 | >=3.9, <=3.13 |
| 0.3 | 2.7 | >=3.9, <=3.13 |
| 0.2 | 2.6 | >=3.9, <=3.13 |
| 0.1 | 2.5 | >=3.9, <=3.12 |
| 0.0.3 | 2.4 | >=3.8, <=3.12 |
We welcome contributions to TorchCodec! Please see our contributing guide for more details.
TorchCodec is released under the BSD 3-Clause license.
However, TorchCodec may be used with code not written by Meta which may be distributed under different licenses.
For example, if you build TorchCodec with ENABLE_CUDA=1 or use the CUDA-enabled release of torchcodec, please review CUDA's license here: Nvidia licenses.