Updated README file

KhusPatel4450 · web-flow · commit 99cc5f6a528c · 2026-06-29T09:54:31.000-04:00
diff --git a/README.md b/README.md
@@ -1,38 +1,51 @@
-<div align="center">
-  <img src="docs/source/_static/logo-light-mode.png" alt="TorchJD" width="400"/>
-</div>
+<picture>
+  <source media="(prefers-color-scheme: dark)" srcset="docs/source/_static/logo-dark-mode.png">
+  <source media="(prefers-color-scheme: light)" srcset="docs/source/_static/logo-light-mode.png">
+  <img alt="TorchJD" src="docs/source/_static/logo-light-mode.png" width="400">
+</picture>
 
 ---
 
-[![Doc](https://img.shields.io/badge/Doc-torchjd.org-blue?logo=data%3Aimage%2Fsvg%2Bxml%3Bbase64%2CPDI5NGJXd2dkbVZ5YzJsdmJqMGlNUzR3SWlCbGJtTnZaR2x1WnowaVZWUkdMVGdpSUhOMFlXNWtZV3h2Ym1VOUltNXZJajglMkJDajtoUjNKbFlYUmxaQ0IxYzJsdVp5QkxjbWwwWVRvZ2FIUjBjRG92TDJ0eWFYUmhMbTl5WndBdExUNEtDand4TFMwZ1EzSmxZWFJsWkNCMWMybHVaeUJLY21sallUb2daWE1nYUhSMGNEb3ZMMnh2WTJGc2FRQXRMVDRLT0NBZ0lHZHBkSFJvUFNJeU1EUTNMamN5Y0hRaUNpQWdJR2hsYVdkb2REMGlNakEwTnk0M01uQjBJZ29nSUNCSllXd2dkR2hsUFNJeU1EUTNMamN5SURJd05EY3VNamtoQ2lBZ0lIWnBaWGRDYjNnOUlqQWdNQ0F5TURRdU55Y2dNakEwTnk0eU1TQXlNakF3TURBdElERXVNQ0F3SURNME5pNHlNVE1nTkRZdU9ESXpJREF3UXpVd01EQWdOVFl3TURBaUNnPT0=)](https://torchjd.org)
+[![Doc](https://img.shields.io/badge/Doc-torchjd.org-blue?logo=data%3Aimage%2Fsvg%2Bxml%3Bbase64%2CPD94bWwgdmVyc2lvbj0iMS4wIiBlbmNvZGluZz0iVVRGLTgiIHN0YW5kYWxvbmU9Im5vIj8%2BCjwhLS0gQ3JlYXRlZCB1c2luZyBLcml0YTogaHR0cDovL2tyaXRhLm9yZyAtLT4KCjxzdmcKICAgd2lkdGg9IjIwNDcuNzJwdCIKICAgaGVpZ2h0PSIyMDQ3LjcycHQiCiAgIHZpZXdCb3g9IjAgMCAyMDQ3LjcyIDIwNDcuNzIiCiAgIHZlcnNpb249IjEuMSIKICAgaWQ9InN2ZzEiCiAgIHNvZGlwb2RpOmRvY25hbWU9IlRvcmNoSkRfbG9nb19jaXJjdWxhci5zdmciCiAgIGlua3NjYXBlOnZlcnNpb249IjEuMy4yICgwOTFlMjBlZjBmLCAyMDIzLTExLTI1KSIKICAgeG1sbnM6aW5rc2NhcGU9Imh0dHA6Ly93d3cuaW5rc2NhcGUub3JnL25hbWVzcGFjZXMvaW5rc2NhcGUiCiAgIHhtbG5zOnNvZGlwb2RpPSJodHRwOi8vc29kaXBvZGkuc291cmNlZm9yZ2UubmV0L0RURC9zb2RpcG9kaS0wLmR0ZCIKICAgeG1sbnM9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIgogICB4bWxuczpzdmc9Imh0dHA6Ly93d3cudzMub3JnLzIwMDAvc3ZnIj4KICA8c29kaXBvZGk6bmFtZWR2aWV3CiAgICAgaWQ9Im5hbWVkdmlldzEiCiAgICAgcGFnZWNvbG9yPSIjZmZmZmZmIgogICAgIGJvcmRlcmNvbG9yPSIjNjY2NjY2IgogICAgIGJvcmRlcm9wYWNpdHk9IjEuMCIKICAgICBpbmtzY2FwZTpzaG93cGFnZXNoYWRvdz0iMiIKICAgICBpbmtzY2FwZTpwYWdlb3BhY2l0eT0iMC4wIgogICAgIGlua3NjYXBlOnBhZ2VjaGVja2VyYm9hcmQ9IjAiCiAgICAgaW5rc2NhcGU6ZGVza2NvbG9yPSIjZDFkMWQxIgogICAgIGlua3NjYXBlOmRvY3VtZW50LXVuaXRzPSJwdCIKICAgICBpbmtzY2FwZTp6b29tPSIwLjE2Mjk4NjE1IgogICAgIGlua3NjYXBlOmN4PSIxMzk1LjgyNDEiCiAgICAgaW5rc2NhcGU6Y3k9Ijg3NC4zMDczOSIKICAgICBpbmtzY2FwZTp3aW5kb3ctd2lkdGg9IjI1NjAiCiAgICAgaW5rc2NhcGU6d2luZG93LWhlaWdodD0iMTM3MSIKICAgICBpbmtzY2FwZTp3aW5kb3cteD0iMCIKICAgICBpbmtzY2FwZTp3aW5kb3cteT0iMCIKICAgICBpbmtzY2FwZTp3aW5kb3ctbWF4aW1pemVkPSIxIgogICAgIGlua3NjYXBlOmN1cnJlbnQtbGF5ZXI9InN2ZzEiIC8%2BCiAgPGRlZnMKICAgICBpZD0iZGVmczEiIC8%2BCiAgPHBhdGgKICAgICBpZD0ic2hhcGUxIgogICAgIGZpbGw9IiMwMDAwMDAiCiAgICAgZmlsbC1ydWxlPSJldmVub2RkIgogICAgIGQ9Ik0yNTUuMjE1IDg5OS44NzVMMjU1Ljk2NCAyNTUuOTY0TDc2Ny44OTMgMjU1Ljk2NEw3NjcuODkzIDBMMCAwTDAuMDMxMjUzMyA4OTguODQ0QzAuMDMxNzMwNSA4OTguODE0IDg0LjU3MjYgODk5Ljg3NSAyNTUuMjE1IDg5OS44NzVaIgogICAgIHN0eWxlPSJmaWxsOiMxYTgxZWI7ZmlsbC1vcGFjaXR5OjEiCiAgICAgdHJhbnNmb3JtPSJtYXRyaXgoMS4wMDAwMDAwMTQzMDcwNyAwIDAgMS4wMDAwMDAwMTQzMDcwNyAxMjcuOTgyMjI2NTIyMDU2IDEyNy45ODIyMjY1MjIwNTYpIiAvPgogIDxwYXRoCiAgICAgaWQ9InNoYXBlMDEiCiAgICAgdHJhbnNmb3JtPSJtYXRyaXgoLTEuMDAwMDAwMDA5MjIxODUgMCAwIC0xLjAwMDAwMDAwOTIyMTg1IDE5MTkuOTEzNjE3Mzk4NzEgMTkxMC4zMzcxOTY5MzEyNSkiCiAgICAgZmlsbD0iIzAwMDAwMCIKICAgICBmaWxsLXJ1bGU9ImV2ZW5vZGQiCiAgICAgZD0iTTc2OC4wNzQgMTc3Mi42MUMtMjgyLjAwNCAxNTk4LjY1IC0yMjkuNzEyIDE1MS44MjEgNzY4LjA3NCAwQzc2Ny4wODMgMjkuOTMzNyA3NjguMDk2IDE0Mi43NiA3NjguMDc0IDI2MC44ODZDNDEuNDc0NiA0NTYuOTAzIDEzNy40MjMgMTM4MC4wNiA3NjguMDc0IDE1MTMuNjQiCiAgICAgc3R5bGU9ImZpbGw6IzFhODFlYjtmaWxsLW9wYWNpdHk6MSIgLz4KICA8cGF0aAogICAgIGlkPSJzaGFwZTAyIgogICAgIGZpbGw9IiMwMDAwMDAiCiAgICAgZmlsbC1ydWxlPSJldmVub2RkIgogICAgIGQ9Ik03NjcuOTA5IDg4Ny4zMzhDMjYzLjQwMiA4MDMuOTI2IDAuMDc1OTQyMSAzODcuOTY0IDAgMC4wODU2NDk3QzE0LjY4NjggLTAuMDI4NTQ5OSA5OS4wNTUxIC0wLjAyODU0OTkgMjU1LjAxMSAwLjA4NTY0OTdDMjU1LjMxMSAyODEuMTE0IDQ0OC43ODYgNTYyLjE2MyA3NjcuOTA5IDYyNi40OTkiCiAgICAgc3R5bGU9ImZpbGw6IzFhODFlYjtmaWxsLW9wYWNpdHk6MSIKICAgICB0cmFuc2Zvcm09Im1hdHJpeCgwLjk5OTk5OTk2MDczODQ0IDAgMCAwLjk5OTk5OTk2MDczODQ0IDEyNy45NjY1OTE0OTQzMjggMTAyMy43NzIxNDc4MzE0KSIgLz4KICA8ZWxsaXBzZQogICAgIHN0eWxlPSJmaWxsOiMxYTgxZWI7c3Ryb2tlLXdpZHRoOjEuMDY3OTtmaWxsLW9wYWNpdHk6MSIKICAgICBpZD0icGF0aDEiCiAgICAgY3g9IjEwMjYuMzYxIgogICAgIGN5PSIxMDE0LjIyMTEiCiAgICAgcng9IjE4My4yNTU0MyIKICAgICByeT0iMTgzLjUxNTU4IiAvPgo8L3N2Zz4K)](https://torchjd.org)
 [![Static Badge](https://img.shields.io/badge/%F0%9F%92%AC_ChatBot-chat.torchjd.org-blue?logo=%F0%9F%92%AC)](https://chat.torchjd.org)
 [![Tests](https://github.com/SimplexLab/TorchJD/actions/workflows/checks.yml/badge.svg)](https://github.com/SimplexLab/TorchJD/actions/workflows/checks.yml)
 [![codecov](https://codecov.io/gh/SimplexLab/TorchJD/graph/badge.svg?token=8AUCZE76QH)](https://codecov.io/gh/SimplexLab/TorchJD)
 [![PyPI - Python Version](https://img.shields.io/pypi/pyversions/torchjd)](https://pypi.org/project/torchjd/)
 [![Static Badge](https://img.shields.io/badge/PyTorch-%3E%3D2.3-blue?logo=pytorch&logoColor=white)](https://pytorch.org/)
 [![Static Badge](https://img.shields.io/badge/Discord%20-%20community%20-%20%235865F2?logo=discord&logoColor=%23FFFFFF&label=Discord)](https://discord.gg/76KkRnb3nk)
 
-**TorchJD** is a PyTorch library for training neural networks with **multiple losses**. It supports two complementary approaches:
+TorchJD is a PyTorch library for training neural networks with **multiple losses**. It supports
+two complementary approaches:
 
-- **Scalarization** — combine losses into a single scalar before backprop, using methods from the literature (geometric mean, softmax weighting, etc.)
-- **Jacobian descent** — compute the full Jacobian matrix and aggregate it into a conflict-aware update direction using state-of-the-art aggregators (UPGrad, MGDA, CAGrad, and many more)
+- **Scalarization**: combine losses into a single scalar before backprop, using methods from the
+  literature (geometric mean, softmax weighting, etc.). This is often a good baseline.
+- **[Jacobian descent](https://arxiv.org/pdf/2406.16232)**: compute the Jacobian matrix of the losses
+  with respect to the model parameters and aggregate it into an update direction using
+  state-of-the-art aggregators (UPGrad, MGDA, CAGrad, and many more). In particular, this makes it
+  possible to follow conflict-free optimization directions, which can resolve problems that
+  standard scalarizers may be unable to solve.
 
 The full documentation is available at [torchjd.org](https://torchjd.org).
 
 ## Installation
 
+<!-- start installation -->
+TorchJD can be installed directly with pip:
 ```bash
 pip install "torchjd[quadprog_projector]"
 ```
+<!-- end installation -->
 
 This includes the dependencies required by UPGrad and DualProj. Some other aggregators may have
-additional dependencies — refer to the [installation docs](https://torchjd.org/stable/installation).
+additional dependencies. Please refer to the
+[installation documentation](https://torchjd.org/stable/installation) for details.
 
-## Quick start
+## Usage
 
 ### Scalarization
 
-Scalarization methods combine losses into a single scalar loss, which is then optimized with standard gradient descent. This is the simplest approach and is often a strong baseline.
+Scalarization methods combine losses into a single scalar before backprop. This is the simplest
+approach and is often a strong baseline.
 
 ```python
 import torch
@@ -51,61 +64,77 @@ task1_targets, task2_targets = torch.randn(16, 1), torch.randn(16, 1)
 
 output = model(inputs)
 losses = torch.stack([criterion(output, task1_targets), criterion(output, task2_targets)])
-loss = scalarizer(losses)  # combines losses into a single scalar
+loss = scalarizer(losses)
 loss.backward()
 optimizer.step()
 optimizer.zero_grad()
 ```
 
 ### Jacobian descent
 
-Jacobian descent computes the per-task gradients individually and aggregates them into a single conflict-aware update direction. This avoids the issue where averaging conflicting gradients harms one of the objectives.
-
-```python
-import torch
-from torch.nn import Linear, MSELoss, ReLU, Sequential
-from torch.optim import SGD
-
-from torchjd.autojac import mtl_backward, jac_to_grad
-from torchjd.aggregation import UPGrad
-
-shared = Sequential(Linear(10, 5), ReLU(), Linear(5, 3), ReLU())
-task1_head = Linear(3, 1)
-task2_head = Linear(3, 1)
-params = [*shared.parameters(), *task1_head.parameters(), *task2_head.parameters()]
-
-optimizer = SGD(params, lr=0.1)
-criterion = MSELoss()
-aggregator = UPGrad()
-
-inputs = torch.randn(16, 10)
-features = shared(inputs)
-loss1 = criterion(task1_head(features), torch.randn(16, 1))
-loss2 = criterion(task2_head(features), torch.randn(16, 1))
-
-mtl_backward([loss1, loss2], features=features)
-jac_to_grad(shared.parameters(), aggregator)
-optimizer.step()
-optimizer.zero_grad()
+Jacobian descent computes per-loss gradients individually and aggregates them into a single update
+direction. Some aggregators, like [UPGrad](https://torchjd.org/stable/docs/aggregation/upgrad/),
+are specifically designed to find conflict-free directions that are beneficial to all losses
+simultaneously.
+
+```diff
+  import torch
+  from torch.nn import Linear, MSELoss, ReLU, Sequential
+  from torch.optim import SGD
+
++ from torchjd.autojac import jac_to_grad, mtl_backward
++ from torchjd.aggregation import UPGrad
+
+  shared_module = Sequential(Linear(10, 5), ReLU(), Linear(5, 3), ReLU())
+  task1_module = Linear(3, 1)
+  task2_module = Linear(3, 1)
+  params = [*shared_module.parameters(), *task1_module.parameters(), *task2_module.parameters()]
+
+  loss_fn = MSELoss()
+  optimizer = SGD(params, lr=0.1)
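++ aggregator = UPGrad()
+
+  inputs = torch.randn(8, 16, 10)  # 8 batches of 16 random input vectors of length 10
+  task1_targets = torch.randn(8, 16, 1)  # 8 batches of 16 targets for the first task
+  task2_targets = torch.randn(8, 16, 1)  # 8 batches of 16 targets for the second task
+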
+  for input, target1, target2 in zip(inputs, task1_targets, task2_targets):
+      features = shared_module(input)
+      loss1 = loss_fn(task1_module(features), target1)
+      loss2 = loss_fn(task2_module(features), target2)
+
+-     loss = loss1 + loss2
+-     loss.backward()
++     mtl_backward([loss1, loss2], features=features)
++     jac_to_grad(shared_module.parameters(), aggregator)
+      optimizer.step()
+      optimizer.zero_grad()
 ```
 
-More usage examples — including the memory-efficient `autogram` engine, instance-wise risk minimization, and partial Jacobian descent — can be found [in the docs](https://torchjd.org/stable/examples/).
+More usage examples, including the memory-efficient `autogram` engine, instance-wise risk
+minimization, and partial Jacobian descent, can be found [in the docs](https://torchjd.org/stable/examples/).
 
 ## Supported Scalarizers
 
-| Scalarizer | Description |
+| Scalarizer | Publication |
 |---|---|
-| [Mean](https://torchjd.org/stable/docs/scalarization) | Average of losses (equal weighting) |
-| [Sum](https://torchjd.org/stable/docs/scalarization) | Sum of losses |
-| [Linear](https://torchjd.org/stable/docs/scalarization) | Fixed user-supplied weights |
-| [GeometricMean](https://torchjd.org/stable/docs/scalarization) | Geometric mean (GLS) — [MultiNet++](https://arxiv.org/pdf/1902.08325) |
-| [Random](https://torchjd.org/stable/docs/scalarization) | Random weights sampled each step — [RLW](https://arxiv.org/pdf/2111.10603) |
+| [Constant](https://torchjd.org/stable/docs/scalarization/constant/) | - |
+| [COSMOS](https://torchjd.org/stable/docs/scalarization/cosmos/) | [COSMOS: Enhancing Multi-Objective Optimization with Scalarization](https://arxiv.org/pdf/2303.04536) |
+| [DWA](https://torchjd.org/stable/docs/scalarization/dwa/) | [End-to-End Multi-Task Learning with Attention](https://arxiv.org/pdf/1803.10704) |
+| [FAMO](https://torchjd.org/stable/docs/scalarization/famo/) | [FAMO: Fast Adaptive Multitask Optimization](https://arxiv.org/pdf/2306.03792) |
+| [GeometricMean](https://torchjd.org/stable/docs/scalarization/geometric_mean/) | [MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning](https://arxiv.org/pdf/1902.08325) |
+| [IMTL-L](https://torchjd.org/stable/docs/scalarization/imtl_l/) | [Towards Impartial Multi-task Learning](https://discovery.ucl.ac.uk/id/eprint/10120667/) |
+| [Mean](https://torchjd.org/stable/docs/scalarization/mean/) | - |
+| [PBI](https://torchjd.org/stable/docs/scalarization/pbi/) | [A Decomposition-Based Evolutionary Algorithm for Many Objective Optimization](https://ieeexplore.ieee.org/document/7445185) |
+| [Random](https://torchjd.org/stable/docs/scalarization/random/) | [Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/pdf/2111.10603) |
+| [STCH](https://torchjd.org/stable/docs/scalarization/stch/) | [Smooth Tchebycheff Scalarization for Multi-Objective Optimization](https://arxiv.org/pdf/2402.19078) |
+| [Sum](https://torchjd.org/stable/docs/scalarization/sum/) | - |
+| [UW](https://torchjd.org/stable/docs/scalarization/uw/) | [Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics](https://arxiv.org/pdf/1705.07115) |
 
 ## Supported Aggregators and Weightings
 
+TorchJD provides many existing aggregators from the literature, listed in the following table.
+
+<!-- recommended aggregators first, then alphabetical order -->
 | Aggregator | Weighting | Publication |
-|---|---|---|
-| [UPGrad](https://torchjd.org/stable/docs/aggregation/upgrad/#torchjd.aggregation.UPGrad)  | [UPGradWeighting](https://torchjd.org/stable/docs/aggregation/upgrad/#torchjd.aggregation.UPGradWeighting) | [Jacobian Descent For Multi-Objective Optimization](https://arxiv.org/pdf/2406.16232) |
+|----|----|----|
+| [UPGrad](https://torchjd.org/stable/docs/aggregation/upgrad/#torchjd.aggregation.UPGrad) (recommended) | [UPGradWeighting](https://torchjd.org/stable/docs/aggregation/upgrad/#torchjd.aggregation.UPGradWeighting) | [Jacobian Descent For Multi-Objective Optimization](https://arxiv.org/pdf/2406.16232) |
 | [AlignedMTL](https://torchjd.org/stable/docs/aggregation/aligned_mtl#torchjd.aggregation.AlignedMTL) | [AlignedMTLWeighting](https://torchjd.org/stable/docs/aggregation/aligned_mtl#torchjd.aggregation.AlignedMTLWeighting) | [Independent Component Alignment for Multi-Task Learning](https://arxiv.org/pdf/2305.19000) |
 | [CAGrad](https://torchjd.org/stable/docs/aggregation/cagrad#torchjd.aggregation.CAGrad) | [CAGradWeighting](https://torchjd.org/stable/docs/aggregation/cagrad#torchjd.aggregation.CAGradWeighting) | [Conflict-Averse Gradient Descent for Multi-task Learning](https://arxiv.org/pdf/2110.14048) |
 | [ConFIG](https://torchjd.org/stable/docs/aggregation/config#torchjd.aggregation.ConFIG) | - | [ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks](https://arxiv.org/pdf/2408.11104) |
@@ -120,29 +149,34 @@ More usage examples — including the memory-efficient `autogram` engine, instan
 | [Krum](https://torchjd.org/stable/docs/aggregation/krum#torchjd.aggregation.Krum) | [KrumWeighting](https://torchjd.org/stable/docs/aggregation/krum#torchjd.aggregation.KrumWeighting) | [Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent](https://proceedings.neurips.cc/paper/2017/file/f4b9ec30ad9f68f89b29639786cb62ef-Paper.pdf) |
 | [Mean](https://torchjd.org/stable/docs/aggregation/mean#torchjd.aggregation.Mean) | [MeanWeighting](https://torchjd.org/stable/docs/aggregation/mean#torchjd.aggregation.MeanWeighting) | - |
 | [MGDA](https://torchjd.org/stable/docs/aggregation/mgda#torchjd.aggregation.MGDA) | [MGDAWeighting](https://torchjd.org/stable/docs/aggregation/mgda#torchjd.aggregation.MGDAWeighting) | [Multiple-gradient descent algorithm (MGDA) for multiobjective optimization](https://comptes-rendus.academie-sciences.fr/mathematique/articles/10.1016/j.crma.2012.03.014/) |
-| - | [MoDoWeighting](https://torchjd.org/stable/docs/aggregation/modo/#torchjd.aggregation.MoDoWeighting) | [Three-Way Trade-Off in Multi-Objective Learning](https://www.jmlr.org/papers/volume25/23-1287/23-1287.pdf) |
+| - | [MoDoWeighting](https://torchjd.org/stable/docs/aggregation/modo/#torchjd.aggregation.MoDoWeighting) | [Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance](https://www.jmlr.org/papers/volume25/23-1287/23-1287.pdf) |
 | [NashMTL](https://torchjd.org/stable/docs/aggregation/nash_mtl#torchjd.aggregation.NashMTL) | - | [Multi-Task Learning as a Bargaining Game](https://arxiv.org/pdf/2202.01017) |
 | [PCGrad](https://torchjd.org/stable/docs/aggregation/pcgrad#torchjd.aggregation.PCGrad) | [PCGradWeighting](https://torchjd.org/stable/docs/aggregation/pcgrad#torchjd.aggregation.PCGradWeighting) | [Gradient Surgery for Multi-Task Learning](https://arxiv.org/pdf/2001.06782) |
 | [Random](https://torchjd.org/stable/docs/aggregation/random#torchjd.aggregation.Random) | [RandomWeighting](https://torchjd.org/stable/docs/aggregation/random#torchjd.aggregation.RandomWeighting) | [Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/pdf/2111.10603) |
-| - | [SDMGradWeighting](https://torchjd.org/stable/docs/aggregation/sdmgrad#torchjd.aggregation.SDMGradWeighting) | [Direction-oriented Multi-objective Learning](https://arxiv.org/pdf/2305.18409) |
+| - | [SDMGradWeighting](https://torchjd.org/stable/docs/aggregation/sdmgrad#torchjd.aggregation.SDMGradWeighting) | [Direction-oriented Multi-objective Learning: Simple and Provable Stochastic Algorithms](https://arxiv.org/pdf/2305.18409) |
 | [Sum](https://torchjd.org/stable/docs/aggregation/sum#torchjd.aggregation.Sum) | [SumWeighting](https://torchjd.org/stable/docs/aggregation/sum#torchjd.aggregation.SumWeighting) | - |
 | [Trimmed Mean](https://torchjd.org/stable/docs/aggregation/trimmed_mean#torchjd.aggregation.TrimmedMean) | - | [Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates](https://proceedings.mlr.press/v80/yin18a/yin18a.pdf) |
 
 ## Release Methodology
 
-TorchJD follows [semantic versioning](https://semver.org/). Since the library is still in beta (`0.x.y`), we sometimes make interface changes in minor versions. Breaking changes are always documented in the [changelog](CHANGELOG.md) with migration instructions.
+We try to make a release whenever we have something worth sharing with users (a bug fix, a minor or
+large feature, etc.). TorchJD follows [semantic versioning](https://semver.org/). Since the library is
+still in beta (`0.x.y`), we sometimes make interface changes in minor versions. We prioritize the
+long-term quality of the library, which occasionally means introducing breaking changes. Whenever a
+release contains breaking changes, the [changelog](CHANGELOG.md) and the GitHub release notes always
+include clear instructions on how to migrate.
 
 ## Contribution
 
-Please read the [Contributing guide](CONTRIBUTING.md).
+Please read the [Contribution page](CONTRIBUTING.md).
 
-Thanks to our amazing contributors:
+Thanks to our amazing contributors for making this project possible:
 
-[![Contributors](https://stg.contrib.rocks/image?repo=SimplexLab/TorchJD&max=240&columns=18)](https://github.com/SimplexLab/TorchJD/graphs/contributors)
+<a href="https://github.com/SimplexLab/TorchJD/graphs/contributors"><img alt="Contributors" src="https://stg.contrib.rocks/image?repo=SimplexLab/TorchJD&max=240&columns=18" /></a>
 
 ## Citation
 
-```bibtex
+```
 @article{jacobian_descent,
   title={Jacobian Descent For Multi-Objective Optimization},
   author={Quinton, Pierre and Rey, Valérian},