Updated README file with all comments addressed

KhusPatel4450 · web-flow · commit 0a38f4d312a2 · 2026-06-30T14:25:35.000-04:00
diff --git a/README.md b/README.md
@@ -1,8 +1,10 @@
-<picture>
-  <source media="(prefers-color-scheme: dark)" srcset="docs/source/_static/logo-dark-mode.png">
-  <source media="(prefers-color-scheme: light)" srcset="docs/source/_static/logo-light-mode.png">
-  <img alt="Fallback image description" src="docs/source/_static/logo-light-mode.png" width="400">
-</picture>
+<div align="center">
+  <picture>
+    <source media="(prefers-color-scheme: dark)" srcset="docs/source/_static/logo-dark-mode.png">
+    <source media="(prefers-color-scheme: light)" srcset="docs/source/_static/logo-light-mode.png">
+    <img alt="Fallback image description" src="docs/source/_static/logo-light-mode.png" width="400">
+  </picture>
+</div>
 
 ---
 
@@ -44,38 +46,40 @@ additional dependencies. Please refer to the
 
 ### Scalarization
 
-Scalarization methods combine losses into a single scalar before backprop. This is the simplest
-approach and is often a strong baseline.
+Scalarization methods combine losses into a single scalar before backprop. Here is how to change
+a standard training loop to use scalarization:
 
-```python
-import torch
-from torch.nn import Linear, MSELoss, ReLU, Sequential
-from torch.optim import SGD
+```diff
+  import torch
+  from torch.nn import Linear, MSELoss, ReLU, Sequential
+  from torch.optim import SGD
 
-from torchjd.scalarization import GeometricMean
++ from torchjd.scalarization import GeometricMean
 
-model = Sequential(Linear(10, 5), ReLU(), Linear(5, 1))
-optimizer = SGD(model.parameters(), lr=0.1)
-criterion = MSELoss()
-scalarizer = GeometricMean()
+  model = Sequential(Linear(10, 5), ReLU(), Linear(5, 1))
+  optimizer = SGD(model.parameters(), lr=0.1)
+  criterion = MSELoss()
++ scalarizer = GeometricMean()
 
-inputs = torch.randn(16, 10)
-task1_targets, task2_targets = torch.randn(16, 1), torch.randn(16, 1)
+  inputs = torch.randn(16, 10)
+  task1_targets, task2_targets = torch.randn(16, 1), torch.randn(16, 1)
 
-output = model(inputs)
-losses = torch.stack([criterion(output, task1_targets), criterion(output, task2_targets)])
-loss = scalarizer(losses)
-loss.backward()
-optimizer.step()
-optimizer.zero_grad()
+  output = model(inputs)
+- loss = criterion(output, task1_targets) + criterion(output, task2_targets)
+- loss.backward()
++ losses = torch.stack([criterion(output, task1_targets), criterion(output, task2_targets)])
++ loss = scalarizer(losses)
++ loss.backward()
+  optimizer.step()
+  optimizer.zero_grad()
 ```
 
 ### Jacobian descent
 
 Jacobian descent computes per-loss gradients individually and aggregates them into a single update
 direction. Some aggregators, like [UPGrad](https://torchjd.org/stable/docs/aggregation/upgrad/),
-are specifically designed to find conflict-free directions that are beneficial to all losses
-simultaneously.
+are specifically designed to find directions that are beneficial to all losses simultaneously.
+Here is how to change a standard multi-task training loop to use Jacobian descent:
 
 ```diff
   import torch
@@ -94,6 +98,10 @@ simultaneously.
   optimizer = SGD(params, lr=0.1)
 + aggregator = UPGrad()
 
+  inputs = torch.randn(8, 16, 10)
+  task1_targets = torch.randn(8, 16, 1)
+  task2_targets = torch.randn(8, 16, 1)
+
   for input, target1, target2 in zip(inputs, task1_targets, task2_targets):
       features = shared_module(input)
       loss1 = loss_fn(task1_module(features), target1)
@@ -107,8 +115,16 @@ simultaneously.
       optimizer.zero_grad()
 ```
 
-More usage examples, including the memory-efficient `autogram` engine, instance-wise risk
-minimization, and partial Jacobian descent, can be found [in the docs](https://torchjd.org/stable/examples/).
+### The `autogram` engine
+
+TorchJD also provides the [`autogram` engine](https://torchjd.org/stable/docs/autogram/engine/),
+which computes the Gramian of the Jacobian incrementally without ever storing the full Jacobian in
+memory. This makes Jacobian descent feasible on large models where the full Jacobian would be too
+expensive to store. See the [autogram examples](https://torchjd.org/stable/examples/) for more
+details.
+
+More usage examples, including instance-wise risk minimization and partial Jacobian descent, can be
+found [in the docs](https://torchjd.org/stable/examples/).
 
 ## Supported Scalarizers
 
@@ -168,7 +184,9 @@ include clear instructions on how to migrate.
 
 ## Contribution
 
-Please read the [Contribution page](CONTRIBUTING.md).
+Please read the [Contribution page](CONTRIBUTING.md) and join our
+[![Discord](https://img.shields.io/badge/Discord-%235865F2?logo=discord&logoColor=white)](https://discord.gg/76KkRnb3nk)
+to get involved!
 
 Thanks to our amazing contributors for making this project possible:
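
Note on the `autogram` paragraph added in this commit: building the Gramian without materializing the full Jacobian rests on the identity G = J Jᵀ = Σₖ Jₖ Jₖᵀ, where Jₖ holds the columns of J that belong to one group of parameters. The sketch below illustrates only that identity in plain Python. It is not TorchJD's implementation and does not use its API; `gramian_incremental` and the toy blocks are hypothetical names chosen for illustration.

```python
def gramian_incremental(jacobian_blocks):
    """Accumulate G = sum_k J_k @ J_k.T, consuming one block at a time.

    Each block is a list of m rows (one row per loss) holding the partial
    derivatives of that loss with respect to one group of parameters.
    Only the m x m Gramian and the current block are held at once.
    """
    gramian = None
    for block in jacobian_blocks:
        m = len(block)
        if gramian is None:
            gramian = [[0.0] * m for _ in range(m)]
        for i in range(m):
            for j in range(m):
                # Inner product of row i and row j within this block.
                gramian[i][j] += sum(a * b for a, b in zip(block[i], block[j]))
    return gramian


# Toy example (assumed values): 2 losses, 3 parameters split into two groups.
block_a = [[1.0, 2.0], [0.0, 1.0]]  # derivatives w.r.t. the first two parameters
block_b = [[3.0], [4.0]]            # derivatives w.r.t. the third parameter

# Passing an iterator means blocks can be produced and discarded one by one.
G = gramian_incremental(iter([block_a, block_b]))
print(G)
```

The result matches the Gramian of the full 2 x 3 Jacobian `[[1, 2, 3], [0, 1, 4]]`, even though that matrix is never assembled. The memory cost scales with the number of losses squared rather than with the number of parameters.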