Contributing to TorchJD

This document explains how to contribute to TorchJD.

Getting Started

  • Minor changes (bug fixes, documentation, small improvements): Open a pull request directly following the guidelines in this document.
  • Significant or major changes (new features, API changes, architectural decisions): Join the SimplexLab Discord server, introduce yourself and your idea, and discuss it with the community to determine if and how it fits within the project's goals before implementing.

Code Ownership

This project uses a CODEOWNERS file to automatically assign reviewers to pull requests based on which files are changed. The code owners are the people or groups who created or maintain specific parts of the codebase.

When you open a pull request, GitHub will automatically request reviews from the relevant code owners for the files you've modified. This ensures that changes are reviewed by the people most familiar with the affected code.

Installation

TorchJD provides a devcontainer configuration that gives you a ready-to-use development environment with all dependencies pre-installed. This is the recommended way to set up your environment.

Using the devcontainer

  1. Install Docker and VS Code with the Dev Containers extension.
  2. Clone the repository and open it in VS Code:
    git clone https://github.com/SimplexLab/TorchJD.git
    code TorchJD
  3. When prompted, click "Reopen in Container" (or press Ctrl+Shift+P and run "Dev Containers: Reopen in Container"). The container will build on first launch and install all dependencies automatically.

Alternatively, you can use GitHub Codespaces directly from the repository page — no local setup required.

The devcontainer includes:

  • Python 3.14 with all TorchJD dependencies (including optional ones)
  • All development tools: uv, ruff, ty, pre-commit, pytest
  • Pre-configured VS Code extensions and settings
  • PYTHONPATH set up for the tests/ folder

GPU support

If you have an NVIDIA GPU and want to run tests on CUDA, add the following to .devcontainer/devcontainer.json inside the top-level object:

"runArgs": ["--gpus", "all"]

CUDA-specific PyTorch installation

By default, the devcontainer installs the CPU-only version of PyTorch. If you need a CUDA version, run the following inside the container:

uv pip install --python-version=3.14 -e '.[full]' --group check --group doc --group test --group plot --index-strategy unsafe-best-match --extra-index-url https://download.pytorch.org/whl/cu126

(replace cu126 with your CUDA version).
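After reinstalling, it can help to confirm which PyTorch build the environment actually picked up. The following is a minimal sketch, not part of TorchJD: the helper name `cuda_status` is hypothetical, and it only relies on `torch.cuda.is_available()`, `torch.cuda.device_count()` and `torch.version.cuda`.

```python
import importlib.util


def cuda_status() -> str:
    """Describe the installed PyTorch build (hypothetical helper, not part of TorchJD)."""
    # Avoid an ImportError if the snippet is run outside the development environment.
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"

    import torch

    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__}: CUDA is not available"

    # torch.version.cuda is the CUDA version that this PyTorch build was compiled against.
    count = torch.cuda.device_count()
    return f"PyTorch {torch.__version__}: CUDA {torch.version.cuda}, {count} device(s)"


print(cuda_status())
```

If CUDA is reported as unavailable inside the devcontainer, the GPU arguments described in the previous section are the first thing to check.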

Adding LaTeX (for trajectory plotting)

The trajectory plotting scripts require LaTeX. Install it inside the container with:

sudo apt-get update && sudo apt-get install -y texlive-latex-extra texlive-fonts-recommended dvipng cm-super

Manual setup (without devcontainer)

If you prefer a local setup, you'll need:

  • uv
  • Python 3.14 (install via uv python install 3.14.0)
  • gcc (for compiling ecos)

Then, from the repo root:

uv venv
CC=gcc uv pip install --python-version=3.14 -e '.[full]' --group check --group doc --group test --group plot
export UV_NO_SYNC=1
export PYTHONPATH="$PYTHONPATH:$PWD/tests"
uv run pre-commit install

Tip

In the following commands, you can get rid of the uv run prefix if you activate the venv created by uv, using source .venv/bin/activate from the root of TorchJD. This will, however, only work in the current terminal until it is closed.

Clean reinstallation

If you want to update all dependencies or just reinstall from scratch, run the following inside the container:

rm -rf /opt/venv
uv venv /opt/venv
uv pip install --python-version=3.14 -e '.[full]' --group check --group doc --group test --group plot
uv run pre-commit install

Checks

Running tests

  • To verify that your installation was successful, and that unit tests pass, run:

    uv run pytest tests/unit
  • To also run the unit tests that are marked as slow, add the --runslow flag:

    uv run pytest tests/unit --runslow
  • If you have access to a CUDA-enabled GPU, you should also check that the unit tests pass on it:

    PYTEST_TORCH_DEVICE=cuda:0 uv run pytest tests/unit
  • To check that the usage examples from docstrings and .rst files are correct, run:

    uv run make doctest -C docs
  • To compute the code coverage locally, you should run the unit tests with the --cov flag:

    uv run pytest tests/unit --cov=src

    Tip

    The code coverage value reported locally is lower than the value that our CI obtains, because the CI runs the tests in several different environments.

Building the documentation locally

  • Run:
    uv run make clean -C docs
    uv run make html -C docs
  • You can then open docs/build/html/index.html with a web browser.

Type checking

We use ty for type-checking. If you're on VS Code, we recommend using the ty extension. You can also run it from the root of the repo with:

uv run ty check

Development guidelines

The following guidelines should help preserve good code quality in TorchJD. Contributions that do not respect these guidelines will still be greatly appreciated, but they will require more work from maintainers before they can be merged.

Documentation

Most Python source files in TorchJD have a corresponding .rst file in docs/source. Please make sure to add such a documentation entry whenever you add a new public module. In most cases, public classes should contain a usage example in their docstring. We also ask contributors to add an entry in the [Unreleased] section of the changelog whenever they make a change that may affect users (we do not report internal changes). If this section does not exist yet (right after a release), you should create it.

Testing

We ask contributors to implement the unit tests necessary to check the correctness of their implementations. We aim for 100% code coverage, but we greatly appreciate any PR, even with insufficient code coverage. To ensure that the tensors generated during the tests are on the right device and have the right dtype, you have to use the partial functions defined in tests/utils/tensors.py to instantiate tensors. For instance, instead of

import torch
a = torch.ones(3, 4)

use

from utils.tensors import ones_
a = ones_(3, 4)

This will automatically call torch.ones with device=DEVICE. This way, your test will automatically run on CUDA when you set the PYTEST_TORCH_DEVICE=cuda:0 environment variable, and will automatically run in float64 when you set PYTEST_TORCH_DTYPE=float64. If the function you need does not exist yet as a partial function in tensors.py, add it. Lastly, when you create a model or a random generator, you have to move it manually to the right device (the DEVICE defined in settings.py).

import torch
from torch.nn import Linear
from settings import DEVICE

model = Linear(3, 4).to(device=DEVICE)
rng = torch.Generator(device=DEVICE)

You may also use a ModuleFactory to create modules on DEVICE automatically.
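The partial-function pattern described in this section can be sketched with functools.partial. This is an illustration under stated assumptions rather than the content of tests/utils/tensors.py: the helper names `tensor_kwargs`, `bind_defaults` and `describe` are hypothetical, the "cpu" and "float32" defaults are assumed, and a stand-in factory replaces torch.ones so that the sketch runs without PyTorch.

```python
import os
from functools import partial
from typing import Any, Callable, Mapping


def tensor_kwargs(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Read the device and dtype requested for the test run (defaults are assumptions)."""
    return {
        "device": env.get("PYTEST_TORCH_DEVICE", "cpu"),
        "dtype": env.get("PYTEST_TORCH_DTYPE", "float32"),
    }


def bind_defaults(
    factory: Callable[..., Any], env: Mapping[str, str] = os.environ
) -> Callable[..., Any]:
    """Return `factory` with device and dtype pre-filled, in the spirit of ones_."""
    return partial(factory, **tensor_kwargs(env))


def describe(*shape: int, device: str, dtype: str) -> str:
    """Stand-in for a tensor factory such as torch.ones (hypothetical)."""
    return f"shape={shape} device={device} dtype={dtype}"


# Simulate running the tests with PYTEST_TORCH_DEVICE=cuda:0.
describe_ = bind_defaults(describe, {"PYTEST_TORCH_DEVICE": "cuda:0"})
print(describe_(3, 4))  # prints "shape=(3, 4) device=cuda:0 dtype=float32"
```

Because the device and dtype are bound once, test code only passes the shape, and the same test runs unchanged on every device and dtype combination.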

Coding style

We try to keep the quality of the codebase as high as possible. Even if this slows down development in the short term, it helps a lot in the long term. To make the code easy to understand and to maintain, we try to keep it simple and to stick as much as possible to the SOLID principles. Try to preserve the existing coding style of the library when adding new sources. Also, please make sure that new modules are imported by the __init__.py file of the package they are located in. This makes them easier for users to import.

Adding a new aggregator

Mathematically, an aggregator is a mapping $\mathcal A: \mathbb R^{m \times n} \to \mathbb R^n$. In the context of Jacobian descent, it is used to reduce a Jacobian matrix into a vector that can be used to update the parameters. In TorchJD, an Aggregator subclass should be a faithful implementation of a mathematical aggregator.
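As a concrete instance of such a mapping, averaging the rows of the Jacobian sends an $m \times n$ matrix to a vector of length $n$. The sketch below uses nested Python lists to stay dependency-free; the function name `mean_aggregate` is hypothetical and this is not TorchJD's Aggregator API.

```python
def mean_aggregate(jacobian: list[list[float]]) -> list[float]:
    """Map an m x n Jacobian (one row per objective) to the mean of its rows."""
    m = len(jacobian)
    if m == 0:
        raise ValueError("The Jacobian must have at least one row.")
    n = len(jacobian[0])
    # Entry j of the result is the average of column j over the m rows.
    return [sum(row[j] for row in jacobian) / m for j in range(n)]


jacobian = [
    [1.0, 2.0, 3.0],  # gradient of the first objective
    [3.0, 4.0, 5.0],  # gradient of the second objective
]
print(mean_aggregate(jacobian))  # prints [2.0, 3.0, 4.0]
```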

Note

We also accept stateful aggregators, whose output depends both on the Jacobian and on some internal state (which can be affected for example by previous Jacobians). Such aggregators should inherit from the Stateful mixin and implement a reset method.
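To illustrate what state and reset mean here, the following sketch averages the row-means of all Jacobians seen since the last reset. It is a plain-Python illustration: the class `RunningMeanAggregator` is hypothetical and does not inherit from TorchJD's Stateful mixin, whose exact interface is not shown in this document.

```python
class RunningMeanAggregator:
    """Hypothetical stateful aggregator: averages row-means over all Jacobians seen so far."""

    def __init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Forget all previously seen Jacobians."""
        self._sum = None  # running sum of the per-call row-means
        self._count = 0  # number of Jacobians seen since the last reset

    def __call__(self, jacobian: list[list[float]]) -> list[float]:
        m = len(jacobian)
        # Row-mean of the current Jacobian (one entry per column).
        current = [sum(column) / m for column in zip(*jacobian)]
        if self._sum is None:
            self._sum = current
        else:
            self._sum = [s + c for s, c in zip(self._sum, current)]
        self._count += 1
        return [s / self._count for s in self._sum]


aggregator = RunningMeanAggregator()
print(aggregator([[2.0, 4.0]]))  # prints [2.0, 4.0]
print(aggregator([[4.0, 8.0]]))  # prints [3.0, 6.0], which depends on the first call
aggregator.reset()
print(aggregator([[4.0, 8.0]]))  # prints [4.0, 8.0], since the state was cleared
```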

Note

Some aggregators may depend on something other than the Jacobian. To implement them, please add setters so that the user can give any extra information to the aggregator before it is actually used. For example, if your aggregator needs the loss values, they can be given at every iteration through a set_losses setter.
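The setter pattern could look like the sketch below, in which the rows of the Jacobian are weighted proportionally to the losses. Only the name set_losses comes from the text above; the class `LossWeightedAggregator` and its weighting rule are hypothetical, and nested lists are used instead of tensors to keep the sketch self-contained.

```python
class LossWeightedAggregator:
    """Hypothetical aggregator that weights each row of the Jacobian by its loss value."""

    def __init__(self) -> None:
        self._losses = None

    def set_losses(self, losses: list[float]) -> None:
        """Provide the loss values to use for the next aggregation."""
        self._losses = list(losses)

    def __call__(self, jacobian: list[list[float]]) -> list[float]:
        if self._losses is None:
            raise RuntimeError("Call set_losses before aggregating.")
        if len(self._losses) != len(jacobian):
            raise ValueError("Expected one loss per row of the Jacobian.")
        # Assumes positive losses, so that the weights are well defined and sum to 1.
        total = sum(self._losses)
        weights = [loss / total for loss in self._losses]
        return [sum(w * x for w, x in zip(weights, column)) for column in zip(*jacobian)]


aggregator = LossWeightedAggregator()
aggregator.set_losses([1.0, 3.0])  # given by the user at every iteration
print(aggregator([[4.0, 0.0], [0.0, 4.0]]))  # prints [1.0, 3.0]
```

Raising an error when the setter has not been called makes a forgotten set_losses call fail loudly instead of silently producing an unweighted result.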

Note

Before working on the implementation of a new aggregator, please contact us via an issue or a discussion: in many cases, we have already thought about it, or even started an implementation.

Deprecation

To deprecate some public functionality, make it emit a DeprecationWarning. A test should also be added in tests/unit/test_deprecations.py to ensure that this warning is issued.
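A minimal sketch of this pattern with the standard warnings module is shown below. The functions `old_function` and `new_function` are hypothetical placeholders; in tests/unit/test_deprecations.py, `pytest.warns(DeprecationWarning)` would play the role of the manual check.

```python
import warnings


def new_function(x: float) -> float:
    """Hypothetical replacement for the deprecated functionality."""
    return 2.0 * x


def old_function(x: float) -> float:
    """Hypothetical deprecated function: still works, but warns on every call."""
    warnings.warn(
        "old_function is deprecated, use new_function instead.",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller of old_function
    )
    return new_function(x)


def check_old_function_warns() -> bool:
    """Return True if old_function emits a DeprecationWarning and still works."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure the warning is not filtered out
        result = old_function(3.0)
    warned = any(issubclass(w.category, DeprecationWarning) for w in caught)
    return warned and result == 6.0


print(check_old_function_warns())  # prints True
```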

Trajectories

The tests/trajectories/ directory contains scripts to generate and visualize optimization trajectories using various aggregators on simple multi-objective problems. They require the plot dependency group.

Available objective keys: EWQ, CQF, HQF.

Available aggregator keys: upgrad, mgda, cagrad, nashmtl, graddrop, imtl_g, aligned_mtl, dualproj, pcgrad, random, mean.

Step 1 — Optimize: run the optimization for an objective and a selection of aggregators:

uv run python tests/trajectories/optimize.py EWQ upgrad mean mgda cagrad dualproj graddrop imtl_g aligned_mtl nashmtl random

This saves trajectory data under tests/trajectories/results/ (gitignored).

Step 2 — Plot: generate the plots from the saved trajectories:

export MPLBACKEND=Agg
uv run python tests/trajectories/plot_params.py EWQ
uv run python tests/trajectories/plot_values.py EWQ
uv run python tests/trajectories/plot_distance_to_pf.py EWQ

To run everything:

export MPLBACKEND=Agg
uv run python tests/trajectories/optimize.py EWQ upgrad mean mgda cagrad dualproj graddrop imtl_g aligned_mtl nashmtl random
uv run python tests/trajectories/plot_params.py EWQ
uv run python tests/trajectories/plot_values.py EWQ
uv run python tests/trajectories/plot_distance_to_pf.py EWQ
uv run python tests/trajectories/optimize.py CQF upgrad mean mgda cagrad dualproj graddrop imtl_g aligned_mtl nashmtl random
uv run python tests/trajectories/plot_params.py CQF
uv run python tests/trajectories/plot_values.py CQF
uv run python tests/trajectories/plot_distance_to_pf.py CQF
uv run python tests/trajectories/optimize.py HQF upgrad mean mgda cagrad dualproj graddrop imtl_g aligned_mtl nashmtl random
uv run python tests/trajectories/plot_params.py HQF
uv run python tests/trajectories/plot_values.py HQF
uv run python tests/trajectories/plot_distance_to_pf.py HQF
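The same sequence can be generated with a loop instead of being written out by hand. The script below is a sketch and not a file of the repository: the names `build_commands` and `run_all` are hypothetical, while the script paths, objective keys and aggregator keys are copied from the commands above.

```python
import os
import subprocess

OBJECTIVES = ["EWQ", "CQF", "HQF"]
AGGREGATORS = [
    "upgrad", "mean", "mgda", "cagrad", "dualproj",
    "graddrop", "imtl_g", "aligned_mtl", "nashmtl", "random",
]
PLOT_SCRIPTS = ["plot_params.py", "plot_values.py", "plot_distance_to_pf.py"]


def build_commands(objectives: list[str], aggregators: list[str]) -> list[list[str]]:
    """Build, for each objective, one optimize command followed by the three plot commands."""
    commands = []
    for objective in objectives:
        optimize = ["uv", "run", "python", "tests/trajectories/optimize.py", objective]
        commands.append(optimize + list(aggregators))
        for script in PLOT_SCRIPTS:
            commands.append(["uv", "run", "python", f"tests/trajectories/{script}", objective])
    return commands


def run_all() -> None:
    """Run every command from the repo root, stopping at the first failure."""
    env = {**os.environ, "MPLBACKEND": "Agg"}  # same effect as `export MPLBACKEND=Agg`
    for command in build_commands(OBJECTIVES, AGGREGATORS):
        subprocess.run(command, check=True, env=env)


# Call run_all() from the root of the repository to execute everything.
```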

The three plot scripts produce PDFs saved to tests/trajectories/results/<objective>/.

Note

The plot scripts require a LaTeX installation for rendering. Install it with:

sudo apt-get install texlive-latex-extra texlive-fonts-recommended dvipng cm-super

Release

This section is addressed to maintainers.

To release a new torchjd version, you have to:

  • If the release introduces changes to the interface, make sure that README.md reflects those changes.
  • Make sure that all tests, including those on CUDA, pass (for this, you need access to a machine that has a CUDA-enabled GPU).
  • Make sure that all important changes since the last release have been reported in the [Unreleased] section at the top of the changelog.
  • Add a [X.Y.Z] - yyyy-mm-dd header in the changelog just below the [Unreleased] header.
  • Change the version in pyproject.toml.
  • Make a pull request with those changes and merge it.
  • Make a draft of the release on GitHub (click on Releases, then Draft a new release, then fill the details).
  • Publish the release (click on Publish release). This should trigger the deployment of the new version on PyPI and the building and deployment of the documentation on GitHub Pages.
  • Check that the new version is correctly deployed to PyPI, that it is installable and that it works.
  • Check that the documentation has been correctly deployed.
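The changelog step of the list above can be sketched as a small helper. It is an illustration only: the function name `add_release_header` is hypothetical, and the example assumes Markdown-style `## ` headings, although the helper simply reuses whatever prefix precedes [Unreleased].

```python
from datetime import date


def add_release_header(changelog: str, version: str, release_date: date) -> str:
    """Insert a "[X.Y.Z] - yyyy-mm-dd" header just below the [Unreleased] header."""
    lines = changelog.splitlines()
    for index, line in enumerate(lines):
        if "[Unreleased]" in line:
            # Reuse the heading prefix of the [Unreleased] line (e.g. "## ").
            prefix = line[: line.index("[Unreleased]")]
            header = f"{prefix}[{version}] - {release_date.isoformat()}"
            updated = lines[: index + 1] + ["", header] + lines[index + 1 :]
            return "\n".join(updated) + "\n"
    raise ValueError("No [Unreleased] header found in the changelog.")


example = "# Changelog\n\n## [Unreleased]\n\n### Added\n\n- Something new.\n"
print(add_release_header(example, "1.2.3", date(2025, 1, 31)))
```

After this step, the entries that were listed under [Unreleased] sit under the new version header, and the [Unreleased] section is empty again.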