Commit 7e3637e

feat(autojac): Use jac_to_grad to aggregate .jac fields (#510)

* Move accumulation code to _accumulation.py
* Rename Accumulate to AccumulateGrad
* Add AccumulateJac
* Add jac_to_grad
* Remove aggregation responsibility from backward and mtl_backward
* Remove Aggregate transform (move its code to jac_to_grad.py)
* Simplify _disunite_gradient (use split instead of manually looping)
* Add utils/asserts.py file to check things about .jac or .grad fields
* Update tests and usage examples according to all those changes
* Add changelog entry

1 parent: e890e65

28 files changed: +758 -627 lines
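One item in the commit message above, simplifying `_disunite_gradient` by using a split instead of a manual loop, can be illustrated with a small dependency-free sketch. The function body and the `param_sizes` argument below are hypothetical reconstructions of the idea (cutting one flat gradient sequence into consecutive per-parameter chunks), not torchjd's actual implementation.

```python
from itertools import islice

# Hypothetical sketch of "_disunite_gradient (use split instead of
# manually looping)": the flat gradient is consumed chunk by chunk in
# a single comprehension, rather than sliced with a hand-written
# offset loop. Not torchjd's actual code.

def disunite_gradient(flat_grad, param_sizes):
    it = iter(flat_grad)
    # Each chunk consumes the next `size` elements of the iterator.
    return [list(islice(it, size)) for size in param_sizes]

flat = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
chunks = disunite_gradient(flat, [2, 1, 3])
print(chunks)  # [[0.1, 0.2], [0.3], [0.4, 0.5, 0.6]]
```

In torchjd itself this would operate on tensors (e.g. via a tensor split) rather than on Python lists; the sketch only shows the control-flow simplification.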

CHANGELOG.md (25 additions, 0 deletions)

@@ -10,6 +10,31 @@ changelog does not include internal changes that do not affect the user.
 
 ### Changed
 
+- **BREAKING**: Removed from `backward` and `mtl_backward` the responsibility to aggregate the
+  Jacobian. Now, these functions compute and populate the `.jac` fields of the parameters, and a new
+  function `torchjd.autojac.jac_to_grad` should then be called to aggregate those `.jac` fields into
+  `.grad` fields.
+  This means that users now have more control over what they do with the Jacobians (they can easily
+  aggregate them group by group or even param by param if they want), but it now requires an extra
+  line of code to do the Jacobian descent step. To update, please change:
+  ```python
+  backward(losses, aggregator)
+  ```
+  to
+  ```python
+  backward(losses)
+  jac_to_grad(model.parameters(), aggregator)
+  ```
+  and
+  ```python
+  mtl_backward(losses, features, aggregator)
+  ```
+  to
+  ```python
+  mtl_backward(losses, features)
+  jac_to_grad(shared_module.parameters(), aggregator)
+  ```
 - Removed an unnecessary internal cloning of gradient. This should slightly improve the memory
   efficiency of `autojac`.

docs/source/docs/autojac/index.rst (1 addition, 0 deletions)

@@ -10,3 +10,4 @@ autojac
 
    backward.rst
    mtl_backward.rst
+   jac_to_grad.rst
docs/source/docs/autojac/jac_to_grad.rst (new file, 6 additions, 0 deletions)

@@ -0,0 +1,6 @@
+:hide-toc:
+
+jac_to_grad
+===========
+
+.. autofunction:: torchjd.autojac.jac_to_grad

docs/source/examples/amp.rst (4 additions, 3 deletions)

@@ -12,15 +12,15 @@ case, the losses) should preferably be scaled with a `GradScaler
 following example shows the resulting code for a multi-task learning use-case.
 
 .. code-block:: python
-    :emphasize-lines: 2, 17, 27, 34-37
+    :emphasize-lines: 2, 17, 27, 34-35, 37-38
 
     import torch
     from torch.amp import GradScaler
     from torch.nn import Linear, MSELoss, ReLU, Sequential
     from torch.optim import SGD
 
     from torchjd.aggregation import UPGrad
-    from torchjd.autojac import mtl_backward
+    from torchjd.autojac import mtl_backward, jac_to_grad
 
     shared_module = Sequential(Linear(10, 5), ReLU(), Linear(5, 3), ReLU())
     task1_module = Linear(3, 1)
@@ -48,7 +48,8 @@ following example shows the resulting code for a multi-task learning use-case.
     loss2 = loss_fn(output2, target2)
 
     scaled_losses = scaler.scale([loss1, loss2])
-    mtl_backward(losses=scaled_losses, features=features, aggregator=aggregator)
+    mtl_backward(losses=scaled_losses, features=features)
+    jac_to_grad(shared_module.parameters(), aggregator)
     scaler.step(optimizer)
     scaler.update()
     optimizer.zero_grad()

docs/source/examples/basic_usage.rst (6 additions, 3 deletions)

@@ -20,6 +20,7 @@ Import several classes from ``torch`` and ``torchjd``:
 
     from torchjd import autojac
    from torchjd.aggregation import UPGrad
+    from torchjd.autojac import jac_to_grad
 
 Define the model and the optimizer, as usual:
 
@@ -63,10 +64,12 @@ Perform the Jacobian descent backward pass:
 
 .. code-block:: python
 
-    autojac.backward([loss1, loss2], aggregator)
+    autojac.backward([loss1, loss2])
+    jac_to_grad(model.parameters(), aggregator)
 
-This will populate the ``.grad`` field of each model parameter with the corresponding aggregated
-Jacobian matrix.
+The first function will populate the ``.jac`` field of each model parameter with the corresponding
+Jacobian, and the second one will aggregate these Jacobians and store the result in the ``.grad``
+field of the parameters. It also deletes the ``.jac`` fields to save some memory.
 
 Update each parameter based on its ``.grad`` field, using the ``optimizer``:

docs/source/examples/iwrm.rst (4 additions, 4 deletions)

@@ -76,14 +76,14 @@ batch of data. When minimizing per-instance losses (IWRM), we use either autojac
 .. tab-item:: autojac
 
     .. code-block:: python
-        :emphasize-lines: 5-6, 12, 16, 21-22
+        :emphasize-lines: 5-6, 12, 16, 21-23
 
         import torch
         from torch.nn import Linear, MSELoss, ReLU, Sequential
         from torch.optim import SGD
 
         from torchjd.aggregation import UPGrad
-        from torchjd.autojac import backward
+        from torchjd.autojac import backward, jac_to_grad
 
         X = torch.randn(8, 16, 10)
         Y = torch.randn(8, 16)
@@ -99,8 +99,8 @@ batch of data. When minimizing per-instance losses (IWRM), we use either autojac
         for x, y in zip(X, Y):
            y_hat = model(x).squeeze(dim=1)  # shape: [16]
            losses = loss_fn(y_hat, y)  # shape: [16]
-            backward(losses, aggregator)
-
+            backward(losses)
+            jac_to_grad(model.parameters(), aggregator)
 
            optimizer.step()
            optimizer.zero_grad()

docs/source/examples/lightning_integration.rst (4 additions, 3 deletions)

@@ -11,7 +11,7 @@ The following code example demonstrates a basic multi-task learning setup using
 <../docs/autojac/mtl_backward>` at each training iteration.
 
 .. code-block:: python
-    :emphasize-lines: 9-10, 18, 31
+    :emphasize-lines: 9-10, 18, 31-32
 
     import torch
    from lightning import LightningModule, Trainer
@@ -22,7 +22,7 @@ The following code example demonstrates a basic multi-task learning setup using
     from torch.utils.data import DataLoader, TensorDataset
 
     from torchjd.aggregation import UPGrad
-    from torchjd.autojac import mtl_backward
+    from torchjd.autojac import mtl_backward, jac_to_grad
 
     class Model(LightningModule):
        def __init__(self):
@@ -43,7 +43,8 @@ The following code example demonstrates a basic multi-task learning setup using
            loss2 = mse_loss(output2, target2)
 
            opt = self.optimizers()
-            mtl_backward(losses=[loss1, loss2], features=features, aggregator=UPGrad())
+            mtl_backward(losses=[loss1, loss2], features=features)
+            jac_to_grad(self.feature_extractor.parameters(), UPGrad())
            opt.step()
            opt.zero_grad()

docs/source/examples/monitoring.rst (3 additions, 2 deletions)

@@ -23,7 +23,7 @@ they have a negative inner product).
     from torch.optim import SGD
 
     from torchjd.aggregation import UPGrad
-    from torchjd.autojac import mtl_backward
+    from torchjd.autojac import mtl_backward, jac_to_grad
 
     def print_weights(_, __, weights: torch.Tensor) -> None:
        """Prints the extracted weights."""
@@ -63,6 +63,7 @@ they have a negative inner product).
     loss1 = loss_fn(output1, target1)
     loss2 = loss_fn(output2, target2)
 
-    mtl_backward(losses=[loss1, loss2], features=features, aggregator=aggregator)
+    mtl_backward(losses=[loss1, loss2], features=features)
+    jac_to_grad(shared_module.parameters(), aggregator)
     optimizer.step()
     optimizer.zero_grad()

docs/source/examples/mtl.rst (4 additions, 3 deletions)

@@ -19,14 +19,14 @@ vectors of dimension 10, and their corresponding scalar labels for both tasks.
 
 
 .. code-block:: python
-    :emphasize-lines: 5-6, 19, 32
+    :emphasize-lines: 5-6, 19, 32-33
 
     import torch
    from torch.nn import Linear, MSELoss, ReLU, Sequential
    from torch.optim import SGD
 
    from torchjd.aggregation import UPGrad
-    from torchjd.autojac import mtl_backward
+    from torchjd.autojac import mtl_backward, jac_to_grad
 
    shared_module = Sequential(Linear(10, 5), ReLU(), Linear(5, 3), ReLU())
    task1_module = Linear(3, 1)
@@ -52,7 +52,8 @@ vectors of dimension 10, and their corresponding scalar labels for both tasks.
     loss1 = loss_fn(output1, target1)
     loss2 = loss_fn(output2, target2)
 
-    mtl_backward(losses=[loss1, loss2], features=features, aggregator=aggregator)
+    mtl_backward(losses=[loss1, loss2], features=features)
+    jac_to_grad(shared_module.parameters(), aggregator)
     optimizer.step()
     optimizer.zero_grad()

docs/source/examples/rnn.rst (4 additions, 3 deletions)

@@ -6,14 +6,14 @@ element of the output sequences. If the gradients of these losses are likely to
 descent can be leveraged to enhance optimization.
 
 .. code-block:: python
-    :emphasize-lines: 5-6, 10, 17, 19
+    :emphasize-lines: 5-6, 10, 17, 19-20
 
     import torch
    from torch.nn import RNN
    from torch.optim import SGD
 
    from torchjd.aggregation import UPGrad
-    from torchjd.autojac import backward
+    from torchjd.autojac import backward, jac_to_grad
 
    rnn = RNN(input_size=10, hidden_size=20, num_layers=2)
    optimizer = SGD(rnn.parameters(), lr=0.1)
@@ -26,7 +26,8 @@ descent can be leveraged to enhance optimization.
     output, _ = rnn(input)  # output is of shape [5, 3, 20].
     losses = ((output - target) ** 2).mean(dim=[1, 2])  # 1 loss per sequence element.
 
-    backward(losses, aggregator, parallel_chunk_size=1)
+    backward(losses, parallel_chunk_size=1)
+    jac_to_grad(rnn.parameters(), aggregator)
     optimizer.step()
     optimizer.zero_grad()
