@@ -1,8 +1,10 @@
-<picture>
-<source media="(prefers-color-scheme: dark)" srcset="docs/source/_static/logo-dark-mode.png">
-<source media="(prefers-color-scheme: light)" srcset="docs/source/_static/logo-light-mode.png">
-<img alt="Fallback image description" src="docs/source/_static/logo-light-mode.png" width="400">
-</picture>
+<div align="center">
+<picture>
+<source media="(prefers-color-scheme: dark)" srcset="docs/source/_static/logo-dark-mode.png">
+<source media="(prefers-color-scheme: light)" srcset="docs/source/_static/logo-light-mode.png">
+<img alt="Fallback image description" src="docs/source/_static/logo-light-mode.png" width="400">
+</picture>
+</div>
 
 ---
 
@@ -44,38 +46,40 @@ additional dependencies. Please refer to the
 
 ### Scalarization
 
-Scalarization methods combine losses into a single scalar before backprop. This is the simplest
-approach and is often a strong baseline.
+Scalarization methods combine losses into a single scalar before backprop. Here is how to change
+a standard training loop to use scalarization:
 
-```python
-import torch
-from torch.nn import Linear, MSELoss, ReLU, Sequential
-from torch.optim import SGD
+```diff
+  import torch
+  from torch.nn import Linear, MSELoss, ReLU, Sequential
+  from torch.optim import SGD
 
-from torchjd.scalarization import GeometricMean
++ from torchjd.scalarization import GeometricMean
 
-model = Sequential(Linear(10, 5), ReLU(), Linear(5, 1))
-optimizer = SGD(model.parameters(), lr=0.1)
-criterion = MSELoss()
-scalarizer = GeometricMean()
+  model = Sequential(Linear(10, 5), ReLU(), Linear(5, 1))
+  optimizer = SGD(model.parameters(), lr=0.1)
+  criterion = MSELoss()
++ scalarizer = GeometricMean()
 
-inputs = torch.randn(16, 10)
-task1_targets, task2_targets = torch.randn(16, 1), torch.randn(16, 1)
+  inputs = torch.randn(16, 10)
+  task1_targets, task2_targets = torch.randn(16, 1), torch.randn(16, 1)
 
-output = model(inputs)
-losses = torch.stack([criterion(output, task1_targets), criterion(output, task2_targets)])
-loss = scalarizer(losses)
-loss.backward()
-optimizer.step()
-optimizer.zero_grad()
+  output = model(inputs)
+- loss = criterion(output, task1_targets) + criterion(output, task2_targets)
+- loss.backward()
++ losses = torch.stack([criterion(output, task1_targets), criterion(output, task2_targets)])
++ loss = scalarizer(losses)
++ loss.backward()
+  optimizer.step()
+  optimizer.zero_grad()
 ```
 
 ### Jacobian descent
 
 Jacobian descent computes per-loss gradients individually and aggregates them into a single update
 direction. Some aggregators, like [UPGrad](https://torchjd.org/stable/docs/aggregation/upgrad/),
-are specifically designed to find conflict-free directions that are beneficial to all losses
-simultaneously.
+are specifically designed to find directions that are beneficial to all losses simultaneously.
+Here is how to change a standard multi-task training loop to use Jacobian descent:
 
 ```diff
   import torch
@@ -94,6 +98,10 @@ simultaneously.
   optimizer = SGD(params, lr=0.1)
 + aggregator = UPGrad()
 
+  inputs = torch.randn(8, 16, 10)
+  task1_targets = torch.randn(8, 16, 1)
+  task2_targets = torch.randn(8, 16, 1)
+
   for input, target1, target2 in zip(inputs, task1_targets, task2_targets):
       features = shared_module(input)
       loss1 = loss_fn(task1_module(features), target1)
@@ -107,8 +115,16 @@ simultaneously.
       optimizer.zero_grad()
 ```
 
-More usage examples, including the memory-efficient `autogram` engine, instance-wise risk
-minimization, and partial Jacobian descent, can be found [in the docs](https://torchjd.org/stable/examples/).
+### The `autogram` engine
+
+TorchJD also provides the [`autogram` engine](https://torchjd.org/stable/docs/autogram/engine/),
+which computes the Gramian of the Jacobian incrementally without ever storing the full Jacobian in
+memory. This makes Jacobian descent feasible on large models where the full Jacobian would be too
+expensive to store. See the [autogram examples](https://torchjd.org/stable/examples/) for more
+details.
+
+More usage examples, including instance-wise risk minimization and partial Jacobian descent, can be
+found [in the docs](https://torchjd.org/stable/examples/).
 
 ## Supported Scalarizers
 
@@ -168,7 +184,9 @@ include clear instructions on how to migrate.
 
 ## Contribution
 
-Please read the [Contribution page](CONTRIBUTING.md).
+Please read the [Contribution page](CONTRIBUTING.md) and join our
+[![Discord](https://img.shields.io/badge/Discord-%235865F2?logo=discord&logoColor=white)](https://discord.gg/76KkRnb3nk)
+to get involved!
 
 Thanks to our amazing contributors for making this project possible:
 
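The Scalarization section of this README diff replaces a plain sum of the two losses with `GeometricMean`. Below is a minimal, dependency-free sketch of how that change affects the gradient. It assumes the scalarizer computes the plain geometric mean (ℓ₁⋯ℓₘ)^(1/m) of strictly positive losses. That formula is an assumption and was not taken from the torchjd source. The helpers `geometric_mean` and `implied_weights` are hypothetical. Because ∂GM/∂ℓᵢ = GM/(m·ℓᵢ), backpropagating the geometric mean scales each task's gradient inversely to its loss, whereas the sum it replaces weights every task by 1.

```python
import math

# Assumption: GeometricMean is the plain geometric mean of positive losses.
# These helpers are hypothetical illustrations, not torchjd code.


def geometric_mean(losses):
    """Return (l_1 * ... * l_m) ** (1 / m), computed in log space.

    Assumes every loss is strictly positive (math.log fails otherwise).
    """
    return math.exp(sum(math.log(l) for l in losses) / len(losses))


def implied_weights(losses):
    """Return d GM / d l_i = GM / (m * l_i) for each loss.

    By the chain rule, backpropagating the geometric mean multiplies the
    gradient of loss i by this weight; a plain sum would use 1 for every loss.
    """
    gm = geometric_mean(losses)
    m = len(losses)
    return [gm / (m * l) for l in losses]


losses = [4.0, 1.0]
print(round(geometric_mean(losses), 6))                # 2.0
print([round(w, 6) for w in implied_weights(losses)])  # [0.25, 1.0]
```

With losses 4.0 and 1.0, the smaller loss receives four times the weight of the larger one. A plain sum would give both losses equal weight.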
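The Jacobian descent paragraph in this diff says per-loss gradients are aggregated into one update direction, and that some aggregators look for directions that help every loss. Let gᵢ be row i of the Jacobian. A step of −η·d then decreases loss i to first order only when gᵢ·d > 0. The sketch below uses plain Python with hypothetical helper names. It shows a naive sum violating this condition and a simple two-task aggregator that satisfies it. The aggregator is a PCGrad-style projection that removes the conflicting component of each gradient and then adds them. It is not UPGrad and is used here only to illustrate the property.

```python
# PCGrad-style rule swapped in for illustration; this is not torchjd's UPGrad.
# All helper names are hypothetical.


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def remove_conflict(g, other):
    """If g conflicts with `other` (negative inner product), project g onto
    the hyperplane orthogonal to `other`; otherwise return g unchanged."""
    c = dot(g, other)
    if c >= 0:
        return list(g)
    scale = c / dot(other, other)  # other is non-zero whenever c < 0
    return [a - scale * b for a, b in zip(g, other)]


def two_task_aggregate(jacobian):
    """Aggregate exactly two gradient rows into one update direction."""
    g1, g2 = jacobian
    return add(remove_conflict(g1, g2), remove_conflict(g2, g1))


# Rows are the gradients of loss 1 and loss 2 w.r.t. the same 2 parameters.
jacobian = [[1.0, 0.0], [-3.0, 1.0]]

naive = add(*jacobian)  # what backpropagating the summed losses would give
aggregated = two_task_aggregate(jacobian)

# g_i . d > 0 means a small step along -d decreases loss i.
print([dot(g, naive) for g in jacobian])                 # [-2.0, 7.0]
print([round(dot(g, aggregated), 6) for g in jacobian])  # [0.1, 1.0]
```

For exactly two gradients, this rule never returns a direction with a negative inner product with either gradient. The naive sum above, by contrast, would increase the first loss.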
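The new `autogram` paragraph says the engine computes the Gramian of the Jacobian without storing the full Jacobian. The identity that allows this kind of accumulation is simple. Split the parameters into blocks (for example, layers) so that J = [J₁ | J₂ | …]. Then G = JJᵀ = Σₖ JₖJₖᵀ, which is an m×m matrix regardless of the parameter count. The sketch below checks that identity in plain Python on a hypothetical Jacobian of 2 losses over 5 parameters. It illustrates the math only and does not describe how the autogram engine is actually implemented.

```python
# Illustrates the block identity only; not the autogram engine's implementation.


def gram(rows):
    """Return rows @ rows^T for a matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(r_i, r_j)) for r_j in rows] for r_i in rows]


def accumulate_gramian(blocks):
    """Sum J_k @ J_k^T over column blocks J_k of the Jacobian.

    Only one block is inspected at a time, so `blocks` can be a generator that
    yields one layer's slice of the Jacobian and then discards it.
    """
    gramian = None
    for block in blocks:
        g_k = gram(block)
        if gramian is None:
            gramian = g_k
        else:
            gramian = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(gramian, g_k)]
    return gramian


# Hypothetical Jacobian of 2 losses w.r.t. 5 parameters, split into a
# 2-parameter block and a 3-parameter block (for example, two layers).
block_1 = [[1.0, 2.0], [0.0, -1.0]]
block_2 = [[3.0, 0.0, 1.0], [1.0, 1.0, 2.0]]
full_jacobian = [r1 + r2 for r1, r2 in zip(block_1, block_2)]

print(accumulate_gramian([block_1, block_2]))  # [[15.0, 3.0], [3.0, 7.0]]
print(gram(full_jacobian))                     # [[15.0, 3.0], [3.0, 7.0]]
```

The accumulator stays m×m however many parameters the model has. Only the current block's slice of the Jacobian needs to exist at any one time.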