Commit 0a38f4d

Updated README file with all comments addressed
1 parent 99cc5f6 commit 0a38f4d

1 file changed

Lines changed: 47 additions & 29 deletions

README.md

@@ -1,8 +1,10 @@
1-
<picture>
2-
<source media="(prefers-color-scheme: dark)" srcset="docs/source/_static/logo-dark-mode.png">
3-
<source media="(prefers-color-scheme: light)" srcset="docs/source/_static/logo-light-mode.png">
4-
<img alt="Fallback image description" src="docs/source/_static/logo-light-mode.png" width="400">
5-
</picture>
1+
<div align="center">
2+
<picture>
3+
<source media="(prefers-color-scheme: dark)" srcset="docs/source/_static/logo-dark-mode.png">
4+
<source media="(prefers-color-scheme: light)" srcset="docs/source/_static/logo-light-mode.png">
5+
<img alt="Fallback image description" src="docs/source/_static/logo-light-mode.png" width="400">
6+
</picture>
7+
</div>
68

79
---
810

@@ -44,38 +46,40 @@ additional dependencies. Please refer to the
4446

4547
### Scalarization
4648

47-
Scalarization methods combine losses into a single scalar before backprop. This is the simplest
48-
approach and is often a strong baseline.
49+
Scalarization methods combine losses into a single scalar before backprop. Here is how to change
50+
a standard training loop to use scalarization:
4951

50-
```python
51-
import torch
52-
from torch.nn import Linear, MSELoss, ReLU, Sequential
53-
from torch.optim import SGD
52+
```diff
53+
import torch
54+
from torch.nn import Linear, MSELoss, ReLU, Sequential
55+
from torch.optim import SGD
5456

55-
from torchjd.scalarization import GeometricMean
57+
+ from torchjd.scalarization import GeometricMean
5658

57-
model = Sequential(Linear(10, 5), ReLU(), Linear(5, 1))
58-
optimizer = SGD(model.parameters(), lr=0.1)
59-
criterion = MSELoss()
60-
scalarizer = GeometricMean()
59+
model = Sequential(Linear(10, 5), ReLU(), Linear(5, 1))
60+
optimizer = SGD(model.parameters(), lr=0.1)
61+
criterion = MSELoss()
62+
+ scalarizer = GeometricMean()
6163

62-
inputs = torch.randn(16, 10)
63-
task1_targets, task2_targets = torch.randn(16, 1), torch.randn(16, 1)
64+
inputs = torch.randn(16, 10)
65+
task1_targets, task2_targets = torch.randn(16, 1), torch.randn(16, 1)
6466

65-
output = model(inputs)
66-
losses = torch.stack([criterion(output, task1_targets), criterion(output, task2_targets)])
67-
loss = scalarizer(losses)
68-
loss.backward()
69-
optimizer.step()
70-
optimizer.zero_grad()
67+
output = model(inputs)
68+
- loss = criterion(output, task1_targets) + criterion(output, task2_targets)
69+
- loss.backward()
70+
+ losses = torch.stack([criterion(output, task1_targets), criterion(output, task2_targets)])
71+
+ loss = scalarizer(losses)
72+
+ loss.backward()
73+
optimizer.step()
74+
optimizer.zero_grad()
7175
```
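For intuition, a scalarizer maps a vector of loss values to a single number. The sketch below uses plain Python and assumes the textbook geometric mean. `geometric_mean_scalarize` and `sum_scalarize` are hypothetical helpers written for illustration, not TorchJD's `GeometricMean`, whose exact behavior (for example with non-positive losses) should be checked in its documentation.

```python
import math


def geometric_mean_scalarize(losses):
    """Textbook geometric mean of strictly positive loss values (illustrative only)."""
    if not losses:
        raise ValueError("need at least one loss")
    if any(loss <= 0 for loss in losses):
        raise ValueError("this sketch only handles strictly positive losses")
    # Average in log space so the raw product cannot overflow or underflow.
    return math.exp(sum(math.log(loss) for loss in losses) / len(losses))


def sum_scalarize(losses):
    """The plain sum used in the 'before' version of the training loop."""
    return sum(losses)


losses = [4.0, 1.0]
print(sum_scalarize(losses))             # 5.0
print(geometric_mean_scalarize(losses))  # sqrt(4 * 1), approximately 2.0
```

Unlike the sum, the geometric mean is not dominated by the largest loss, which is one reason the choice of scalarizer can change how tasks are traded off.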
7276

7377
### Jacobian descent
7478

7579
Jacobian descent computes per-loss gradients individually and aggregates them into a single update
7680
direction. Some aggregators, like [UPGrad](https://torchjd.org/stable/docs/aggregation/upgrad/),
77-
are specifically designed to find conflict-free directions that are beneficial to all losses
78-
simultaneously.
81+
are specifically designed to find directions that are beneficial to all losses simultaneously.
82+
Here is how to change a standard multi-task training loop to use Jacobian descent:
7983

8084
```diff
8185
import torch
@@ -94,6 +98,10 @@ simultaneously.
9498
optimizer = SGD(params, lr=0.1)
9599
+ aggregator = UPGrad()
96100

101+
inputs = torch.randn(8, 16, 10)
102+
task1_targets = torch.randn(8, 16, 1)
103+
task2_targets = torch.randn(8, 16, 1)
104+
97105
for input, target1, target2 in zip(inputs, task1_targets, task2_targets):
98106
features = shared_module(input)
99107
loss1 = loss_fn(task1_module(features), target1)
@@ -107,8 +115,16 @@ simultaneously.
107115
optimizer.zero_grad()
108116
```
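To illustrate why the choice of aggregator matters, here is a minimal pure-Python sketch. It treats each row of the Jacobian as one loss's gradient and aggregates the rows with a plain mean. This is a deliberately simple stand-in, not UPGrad, and the helper names (`mean_aggregate`, `conflicts_with`) are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally sized vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_aggregate(jacobian):
    """Average the rows (per-loss gradients) of the Jacobian into one direction."""
    n_losses = len(jacobian)
    n_params = len(jacobian[0])
    return [sum(row[j] for row in jacobian) / n_losses for j in range(n_params)]


def conflicts_with(direction, jacobian):
    """Indices of losses whose gradient has a negative inner product with `direction`.

    A descent step moves along -direction, so to first order such a loss increases.
    """
    return [i for i, grad in enumerate(jacobian) if dot(grad, direction) < 0]


# Two losses, two parameters; the gradients partly oppose each other.
jacobian = [
    [3.0, 0.0],   # gradient of loss 1
    [-1.0, 1.0],  # gradient of loss 2
]
direction = mean_aggregate(jacobian)
print(direction)                            # [1.0, 0.5]
print(conflicts_with(direction, jacobian))  # [1]: loss 2 is hurt by the averaged step
```

In this toy case the averaged direction conflicts with the second loss. Aggregators designed to be beneficial to all losses aim to return a direction for which this list is empty.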
109117

110-
More usage examples, including the memory-efficient `autogram` engine, instance-wise risk
111-
minimization, and partial Jacobian descent, can be found [in the docs](https://torchjd.org/stable/examples/).
118+
### The `autogram` engine
119+
120+
TorchJD also provides the [`autogram` engine](https://torchjd.org/stable/docs/autogram/engine/),
121+
which computes the Gramian of the Jacobian incrementally without ever storing the full Jacobian in
122+
memory. This makes Jacobian descent feasible on large models where the full Jacobian would be too
123+
expensive to store. See the [autogram examples](https://torchjd.org/stable/examples/) for more
124+
details.
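
As a rough illustration of how a Gramian can be built without holding the full Jacobian, the sketch below relies on the identity that the Gramian of a matrix equals the sum of the Gramians of its column blocks. Whether `autogram` uses exactly this decomposition is an assumption; the helper names and the block layout are hypothetical, and plain Python lists stand in for tensors.

```python
def gramian_of_block(block):
    """Return block @ block.T for a block given as a list of rows (one row per loss)."""
    n = len(block)
    return [
        [sum(a * b for a, b in zip(block[i], block[k])) for k in range(n)]
        for i in range(n)
    ]


def accumulate_gramian(blocks):
    """Sum the per-block Gramians; only one block is needed in memory at a time."""
    total = None
    for block in blocks:
        partial = gramian_of_block(block)
        if total is None:
            total = partial
        else:
            total = [
                [t + p for t, p in zip(t_row, p_row)]
                for t_row, p_row in zip(total, partial)
            ]
    return total


# Hypothetical Jacobian of 2 losses w.r.t. 3 parameters, split column-wise into two blocks.
full_jacobian = [[1.0, 2.0, 3.0], [0.0, 1.0, -1.0]]
blocks = [
    [[1.0, 2.0], [0.0, 1.0]],  # columns 0 and 1
    [[3.0], [-1.0]],           # column 2
]
print(accumulate_gramian(blocks))       # [[14.0, -1.0], [-1.0, 2.0]]
print(gramian_of_block(full_jacobian))  # same result, computed from the full Jacobian
```

The accumulated result has one row and one column per loss, so its size does not grow with the number of parameters.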
125+
126+
More usage examples, including instance-wise risk minimization and partial Jacobian descent, can be
127+
found [in the docs](https://torchjd.org/stable/examples/).
112128

113129
## Supported Scalarizers
114130

@@ -168,7 +184,9 @@ include clear instructions on how to migrate.
168184

169185
## Contribution
170186

171-
Please read the [Contribution page](CONTRIBUTING.md).
187+
Please read the [Contribution page](CONTRIBUTING.md) and join our
188+
[![Discord](https://img.shields.io/badge/Discord-%235865F2?logo=discord&logoColor=white)](https://discord.gg/76KkRnb3nk)
189+
to get involved!
172190

173191
Thanks to our amazing contributors for making this project possible:
174192
