TorchJD is a PyTorch library for training neural networks with **multiple losses**. It supports
two complementary approaches:

- **Scalarization**: combine losses into a single scalar before backprop, using methods from the
  literature (geometric mean, softmax weighting, etc.). This is often a good baseline.
- **[Jacobian descent](https://arxiv.org/pdf/2406.16232)**: compute the Jacobian matrix of losses
  with respect to parameters and aggregate it into an update direction using state-of-the-art
  aggregators (UPGrad, MGDA, CAGrad, and many more). In particular, this allows taking conflict-free
  optimization directions, which can solve problems that may be impossible to solve with standard
  scalarizers.

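In symbols (notation chosen here for illustration, loosely following the linked Jacobian descent paper), take losses $L_1, \dots, L_m$ of parameters $\theta \in \mathbb{R}^n$. The Jacobian $J$ stacks one gradient per row, and an aggregator $\mathcal{A}$ maps it to an update direction. For a scalarizer $f$ whose output depends on the parameters only through the losses, the chain rule gives $\nabla_\theta f = J^\top w$ with $w_i = \partial f / \partial L_i$, so scalarization is the special case $\mathcal{A}(J) = J^\top w$. The first-order expansion on the right shows that, for a small step size $\eta > 0$, loss $i$ does not increase when $[J\,\mathcal{A}(J)]_i \ge 0$; an update is conflict-free when this holds for every $i$, which a weighted combination $J^\top w$ does not guarantee when gradients conflict.

```math
J = \begin{bmatrix} \nabla L_1(\theta)^\top \\ \vdots \\ \nabla L_m(\theta)^\top \end{bmatrix} \in \mathbb{R}^{m \times n},
\qquad
\theta \leftarrow \theta - \eta \, \mathcal{A}(J),
\qquad
L_i\big(\theta - \eta \, \mathcal{A}(J)\big) \approx L_i(\theta) - \eta \, \big[J \, \mathcal{A}(J)\big]_i
```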
The full documentation is available at [torchjd.org](https://torchjd.org).

## Installation

<!-- start installation -->
TorchJD can be installed directly with pip:

```bash
pip install "torchjd[quadprog_projector]"
```
<!-- end installation -->

This includes the dependencies required by UPGrad and DualProj. Some other aggregators may have
additional dependencies. Please refer to the
[installation documentation](https://torchjd.org/stable/installation) for them.

## Usage

### Scalarization

Scalarization methods combine losses into a single scalar before backprop. This is the simplest
approach and is often a strong baseline.

```python
# Assumes `optimizer`, `scalarizer` and `losses` have already been defined.
loss = scalarizer(losses)  # combines losses into a single scalar
loss.backward()
optimizer.step()
optimizer.zero_grad()
```
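The chain rule explains what a scalarizer does to the gradients: backpropagating through `f(L_1, ..., L_m)` gives the same parameter gradient as backpropagating `sum_i w_i * L_i` with the weights `w_i = df/dL_i` held fixed. Below is a minimal plain-Python sketch for the geometric mean, where `w_i = GM / (m * L_i)`. The helpers `geometric_mean` and `geometric_mean_weights` are hypothetical illustrations, not TorchJD's API, and TorchJD's `GeometricMean` may be implemented differently.

```python
import math


def geometric_mean(losses):
    """Geometric mean (L_1 * ... * L_m) ** (1 / m) of positive losses.

    Computed in log space to avoid overflow or underflow of the product.
    """
    return math.exp(sum(math.log(loss) for loss in losses) / len(losses))


def geometric_mean_weights(losses):
    """Partial derivatives w_i = dGM / dL_i = GM / (m * L_i).

    Backpropagating through geometric_mean(losses) yields the same parameter
    gradient as backpropagating sum_i w_i * L_i with these weights held fixed.
    """
    gm = geometric_mean(losses)
    m = len(losses)
    return [gm / (m * loss) for loss in losses]


losses = [4.0, 1.0]
print(geometric_mean(losses), geometric_mean_weights(losses))
```

With losses `[4.0, 1.0]`, the weights are about `[0.25, 1.0]`: the smaller loss receives the larger weight, so the geometric mean favors relative rather than absolute decreases of each loss.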
### Jacobian descent

Jacobian descent computes the gradient of each loss individually and aggregates these gradients
into a single conflict-aware update direction. This avoids the situation where averaging
conflicting gradients yields an update that increases one of the losses.

```python
import torch
from torch.nn import Linear, MSELoss, ReLU, Sequential
from torch.optim import SGD

from torchjd.autojac import mtl_backward, jac_to_grad
```
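The conflict described above can be reproduced with two hand-picked 2-D gradients. The plain-Python sketch below (not TorchJD code; `g1`, `g2` and all helper names are made up for illustration) compares the mean of the gradients with the minimum-norm convex combination used by MGDA, which for two gradients has a well-known closed form. A positive inner product `g_i . d` means that a small step along `-d` decreases loss `i` to first order.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def combine(alpha, g1, g2):
    """Return alpha * g1 + (1 - alpha) * g2."""
    return [alpha * a + (1 - alpha) * b for a, b in zip(g1, g2)]


def mean_direction(g1, g2):
    return combine(0.5, g1, g2)


def min_norm_direction(g1, g2):
    """Minimum-norm point of the segment between g1 and g2 (two-gradient MGDA).

    Minimizing ||alpha * g1 + (1 - alpha) * g2||^2 over alpha in [0, 1] gives
    alpha = (g2 - g1) . g2 / ||g1 - g2||^2, clipped to [0, 1].
    """
    diff = [a - b for a, b in zip(g1, g2)]  # g1 - g2
    denom = dot(diff, diff)
    if denom == 0.0:  # identical gradients: nothing to trade off
        return list(g1)
    alpha = min(max(-dot(diff, g2) / denom, 0.0), 1.0)
    return combine(alpha, g1, g2)


# Two conflicting gradients (their inner product is negative).
g1 = [1.0, 0.0]
g2 = [-2.0, 1.0]

for name, d in [("mean", mean_direction(g1, g2)), ("min-norm", min_norm_direction(g1, g2))]:
    # Each loss decreases to first order along -d iff its inner product with d is positive.
    print(name, d, dot(g1, d), dot(g2, d))
```

Here the mean direction has a negative inner product with `g1`, so stepping along it increases the first loss, while the min-norm direction `[0.1, 0.3]` has a positive inner product with both gradients. In general, the min-norm point `d` satisfies `g_i . d >= ||d||^2` for both gradients. TorchJD's recommended UPGrad aggregator uses a different construction, described in the linked paper.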

More usage examples, including the memory-efficient `autogram` engine, instance-wise risk
minimization, and partial Jacobian descent, can be found [in the docs](https://torchjd.org/stable/examples/).

## Supported Scalarizers

| Scalarizer | Publication |
|---|---|
|[COSMOS](https://torchjd.org/stable/docs/scalarization/cosmos/)|[COSMOS: Enhancing Multi-Objective Optimization with Scalarization](https://arxiv.org/pdf/2303.04536)|
|[DWA](https://torchjd.org/stable/docs/scalarization/dwa/)|[End-to-End Multi-Task Learning with Attention](https://arxiv.org/pdf/1803.10704)|
|[FAMO](https://torchjd.org/stable/docs/scalarization/famo/)|[FAMO: Fast Adaptive Multitask Optimization](https://arxiv.org/pdf/2306.03792)|
|[GeometricMean](https://torchjd.org/stable/docs/scalarization/geometric_mean/)|[MultiNet++: Multi-Stream Feature Aggregation and Geometric Loss Strategy for Multi-Task Learning](https://arxiv.org/pdf/1902.08325)|
|[PBI](https://torchjd.org/stable/docs/scalarization/pbi/)|[A Decomposition-Based Evolutionary Algorithm for Many Objective Optimization](https://ieeexplore.ieee.org/document/7445185)|
|[Random](https://torchjd.org/stable/docs/scalarization/random/)|[Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/pdf/2111.10603)|
|[STCH](https://torchjd.org/stable/docs/scalarization/stch/)|[Smooth Tchebycheff Scalarization for Multi-Objective Optimization](https://arxiv.org/pdf/2402.19078)|
|[UW](https://torchjd.org/stable/docs/scalarization/uw/)|[Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics](https://arxiv.org/pdf/1705.07115)|

## Supported Aggregators and Weightings

TorchJD provides many existing aggregators from the literature, listed in the following table.

<!-- recommended aggregators first, then alphabetical order -->
| Aggregator | Weighting | Publication |
|----|----|----|
|[UPGrad](https://torchjd.org/stable/docs/aggregation/upgrad/#torchjd.aggregation.UPGrad) (recommended)|[UPGradWeighting](https://torchjd.org/stable/docs/aggregation/upgrad/#torchjd.aggregation.UPGradWeighting)|[Jacobian Descent For Multi-Objective Optimization](https://arxiv.org/pdf/2406.16232)|
|[AlignedMTL](https://torchjd.org/stable/docs/aggregation/aligned_mtl#torchjd.aggregation.AlignedMTL)|[AlignedMTLWeighting](https://torchjd.org/stable/docs/aggregation/aligned_mtl#torchjd.aggregation.AlignedMTLWeighting)|[Independent Component Alignment for Multi-Task Learning](https://arxiv.org/pdf/2305.19000)|
|[CAGrad](https://torchjd.org/stable/docs/aggregation/cagrad#torchjd.aggregation.CAGrad)|[CAGradWeighting](https://torchjd.org/stable/docs/aggregation/cagrad#torchjd.aggregation.CAGradWeighting)|[Conflict-Averse Gradient Descent for Multi-task Learning](https://arxiv.org/pdf/2110.14048)|
|[ConFIG](https://torchjd.org/stable/docs/aggregation/config#torchjd.aggregation.ConFIG)| - |[ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks](https://arxiv.org/pdf/2408.11104)|
|[Krum](https://torchjd.org/stable/docs/aggregation/krum#torchjd.aggregation.Krum)|[KrumWeighting](https://torchjd.org/stable/docs/aggregation/krum#torchjd.aggregation.KrumWeighting)|[Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent](https://proceedings.neurips.cc/paper/2017/file/f4b9ec30ad9f68f89b29639786cb62ef-Paper.pdf)|
|[MGDA](https://torchjd.org/stable/docs/aggregation/mgda#torchjd.aggregation.MGDA)|[MGDAWeighting](https://torchjd.org/stable/docs/aggregation/mgda#torchjd.aggregation.MGDAWeighting)|[Multiple-gradient descent algorithm (MGDA) for multiobjective optimization](https://comptes-rendus.academie-sciences.fr/mathematique/articles/10.1016/j.crma.2012.03.014/)|
| - |[MoDoWeighting](https://torchjd.org/stable/docs/aggregation/modo/#torchjd.aggregation.MoDoWeighting)|[Three-Way Trade-Off in Multi-Objective Learning: Optimization, Generalization and Conflict-Avoidance](https://www.jmlr.org/papers/volume25/23-1287/23-1287.pdf)|
|[NashMTL](https://torchjd.org/stable/docs/aggregation/nash_mtl#torchjd.aggregation.NashMTL)| - |[Multi-Task Learning as a Bargaining Game](https://arxiv.org/pdf/2202.01017)|
|[PCGrad](https://torchjd.org/stable/docs/aggregation/pcgrad#torchjd.aggregation.PCGrad)|[PCGradWeighting](https://torchjd.org/stable/docs/aggregation/pcgrad#torchjd.aggregation.PCGradWeighting)|[Gradient Surgery for Multi-Task Learning](https://arxiv.org/pdf/2001.06782)|
|[Random](https://torchjd.org/stable/docs/aggregation/random#torchjd.aggregation.Random)|[RandomWeighting](https://torchjd.org/stable/docs/aggregation/random#torchjd.aggregation.RandomWeighting)|[Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/pdf/2111.10603)|
|[Trimmed Mean](https://torchjd.org/stable/docs/aggregation/trimmed_mean#torchjd.aggregation.TrimmedMean)| - |[Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates](https://proceedings.mlr.press/v80/yin18a/yin18a.pdf)|

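The table pairs most aggregators with a weighting. A plausible reading, sketched below in plain Python, is that a weighting produces one weight per loss and the paired aggregator returns the combination `J^T w` of the Jacobian's rows. Coordinate-wise methods such as Trimmed Mean, which drops the largest and smallest values of each coordinate before averaging, build their output coordinate by coordinate from possibly different rows, which is presumably why no weighting is listed for them. All helper names (`weighted_aggregate`, `mean_weighting`, `trimmed_mean`, `trim`) are hypothetical and are not TorchJD's API.

```python
def weighted_aggregate(jacobian, weights):
    """Combine the rows of `jacobian` (one gradient per loss) as sum_i w_i * row_i."""
    n = len(jacobian[0])
    return [sum(w * row[j] for w, row in zip(weights, jacobian)) for j in range(n)]


def mean_weighting(jacobian):
    """Equal weight 1 / m for each of the m rows."""
    m = len(jacobian)
    return [1.0 / m] * m


def trimmed_mean(jacobian, trim):
    """Coordinate-wise trimmed mean: for each coordinate, drop the `trim`
    largest and `trim` smallest values across rows, then average the rest."""
    m = len(jacobian)
    if 2 * trim >= m:
        raise ValueError("trim must be smaller than half the number of rows")
    result = []
    for column in zip(*jacobian):
        kept = sorted(column)[trim:m - trim]
        result.append(sum(kept) / len(kept))
    return result


jacobian = [
    [1.0, 0.0],
    [2.0, 1.0],
    [100.0, -50.0],  # an outlying gradient
]
print(weighted_aggregate(jacobian, mean_weighting(jacobian)))  # pulled toward the outlier
print(trimmed_mean(jacobian, trim=1))  # coordinate-wise median of the three rows
```

With three rows and `trim=1`, the trimmed mean reduces to the coordinate-wise median, so the outlying row has no effect, whereas the mean is dominated by it.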
## Release Methodology

We try to make a release whenever we have something worth sharing with users (bug fix, minor or
large feature, etc.). TorchJD follows [semantic versioning](https://semver.org/). Since the library
is still in beta (`0.x.y`), we sometimes make interface changes in minor versions. We prioritize the
long-term quality of the library, which occasionally means introducing breaking changes. Whenever a
release contains breaking changes, the [changelog](CHANGELOG.md) and the GitHub release notes always
include clear instructions on how to migrate.

## Contribution

Please read the [Contribution page](CONTRIBUTING.md).

Thanks to our amazing contributors for making this project possible: