All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning. This changelog does not include internal changes that do not affect the user.
- Added usage example showing how to combine TorchJD with automatic mixed precision (AMP).
- Refactored the underlying optimization problem that `UPGrad` and `DualProj` have to solve to project onto the dual cone. This may minimally affect the output of these aggregators.
- Refactored internal verifications in the autojac engine so that they do not run at runtime anymore. This should minimally improve the performance and reduce the memory usage of `backward` and `mtl_backward`.
- Fixed the behavior of `backward` and `mtl_backward` when some tensors are repeated (i.e. when they appear several times in a list of tensors provided as argument). Instead of raising an exception in these cases, we are now aligned with the behavior of `torch.autograd.backward`. Repeated tensors that we differentiate lead to repeated rows in the Jacobian, prior to aggregation, and repeated tensors with respect to which we differentiate count only once.
- Removed arbitrary exception handling in `IMTLG` and `AlignedMTL` when the computation fails. In practice, this fix should only affect some matrices with extremely large values, which should not usually happen.
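
The handling of repeated tensors described above can be pictured with a toy model. This is a minimal sketch in plain Python, not TorchJD code: the `PARTIALS` table and the `toy_jacobian` helper are hypothetical, and the derivatives are written by hand for `y1 = 2a + 3b` and `y2 = a - b`.

```python
# Hypothetical hand-written partial derivatives: PARTIALS[output][input].
PARTIALS = {
    "y1": {"a": 2.0, "b": 3.0},
    "y2": {"a": 1.0, "b": -1.0},
}

def toy_jacobian(outputs, inputs):
    """Build one row per entry of `outputs` and one column per distinct input."""
    # Repeated inputs count only once (order of first appearance is kept).
    unique_inputs = list(dict.fromkeys(inputs))
    # Repeated outputs are kept, so they produce repeated rows.
    return [[PARTIALS[out][inp] for inp in unique_inputs] for out in outputs]

jac = toy_jacobian(["y1", "y1", "y2"], ["a", "b", "a"])
```

Here `jac` has three rows (the row of `y1` appears twice) and two columns (`a` is counted once), which is the shape of Jacobian that would then be handed to an aggregator.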
- Added new aggregator `ConFIG` from ConFIG: Towards Conflict-free Training of Physics Informed Neural Networks.
- Added Python 3.13 classifier in pyproject.toml (we now also run tests on Python 3.13 in the CI).
- Fixed a bug introduced in v0.4.0 that could cause `backward` and `mtl_backward` to fail with some tensor shapes.
- Changed how the Jacobians are computed when calling `backward` or `mtl_backward` with `parallel_chunk_size=1` to not rely on `torch.autograd.vmap` in this case. Whenever `vmap` does not support something (compiled functions, RNN on cuda, etc.), users should now be able to avoid using `vmap` by calling `backward` or `mtl_backward` with `parallel_chunk_size=1`.
- Changed the effect of the parameter `retain_graph` of `backward` and `mtl_backward`. When set to `False`, it now frees the graph only after all gradients have been computed. In most cases, users should now leave the default value `retain_graph=False`, no matter what the value of `parallel_chunk_size` is. This will reduce the memory overhead.
- RNN training usage example in the documentation.
- Improved the performance of the graph traversal function called by `backward` and `mtl_backward` to find the tensors with respect to which differentiation should be done. It now visits every node at most once.
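
The idea of a traversal that visits every node at most once can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the function used by TorchJD: the computation graph is modeled as a hypothetical `parents` dictionary mapping each node to the nodes it was computed from, and `find_leaves` is a hypothetical helper.

```python
def find_leaves(roots, parents):
    """Return the leaves reachable from `roots`, expanding each node at most once."""
    visited = set()
    leaves = []
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in visited:
            # Already expanded through another path: skip it.
            continue
        visited.add(node)
        predecessors = parents.get(node, [])
        if not predecessors:
            # A node computed from nothing else is a leaf.
            leaves.append(node)
        stack.extend(predecessors)
    return leaves

# Diamond-shaped toy graph: "w" is reachable through both "h1" and "h2".
parents = {"loss": ["h1", "h2"], "h1": ["w", "x"], "h2": ["w"]}
leaves = find_leaves(["loss"], parents)
```

Without the `visited` set, shared nodes such as `w` would be expanded once per path leading to them, which can grow exponentially in graphs with many diamonds.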
- Added a default value to the `inputs` parameter of `backward`. If not provided, the `inputs` will default to all leaf tensors that were used to compute the `tensors` parameter. This is in line with the behavior of torch.autograd.backward.
- Added a default value to the `shared_params` and to the `tasks_params` arguments of `mtl_backward`. If not provided, the `shared_params` will default to all leaf tensors that were used to compute the `features`, and the `tasks_params` will default to all leaf tensors that were used to compute each of the `losses`, excluding those used to compute the `features`.
- Note in the documentation about the incompatibility of `backward` and `mtl_backward` with tensors that retain grad.
- BREAKING: Changed the name of the parameter `A` to `aggregator` in `backward` and `mtl_backward`.
- BREAKING: Changed the order of the parameters of `backward` and `mtl_backward` to make it possible to have a default value for `inputs` and for `shared_params` and `tasks_params`, respectively. Usages of `backward` and `mtl_backward` that rely on the order between arguments must be updated.
- Switched to the PEP 735 dependency groups format in `pyproject.toml` (from a `[tool.pdm.dev-dependencies]` to a `[dependency-groups]` section). This should only affect development dependencies.
- BREAKING: Added a check in `mtl_backward` to ensure that `tasks_params` and `shared_params` have no overlap. Previously, the behavior in this scenario was quite arbitrary.
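
An overlap check of this kind could look like the sketch below. It is only an illustration: `check_no_overlap` is a hypothetical helper, not the check implemented in TorchJD. Parameters are compared by identity (`id`), on the assumption that two parameters overlap when they are the same object, which also avoids elementwise `==` on tensors.

```python
def check_no_overlap(shared_params, tasks_params):
    """Raise ValueError if a task-specific parameter is also a shared parameter."""
    shared_ids = {id(p) for p in shared_params}
    for task_index, params in enumerate(tasks_params):
        for p in params:
            if id(p) in shared_ids:
                raise ValueError(
                    f"Parameter of task {task_index} is also in shared_params."
                )

# Plain objects stand in for parameters in this toy usage.
a, b, c = object(), object(), object()
check_no_overlap([a], [[b], [c]])  # Disjoint: no error.
```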
- PyTorch Lightning integration example.
- Explanation about Jacobian descent in the README.
- Made the dependency on ecos explicit in pyproject.toml (before `cvxpy` 1.16.0, it was installed automatically when installing `cvxpy`).
- Removed upper cap on `numpy` version in the dependencies. This makes `torchjd` compatible with the most recent numpy versions too.
- Prevented `IMTLG` from dividing by zero during its weight rescaling step. If the input matrix consists only of zeros, it will now return a vector of zeros instead of a vector of `nan`.
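
The kind of guard described above can be sketched in plain Python. This is not the `IMTLG` implementation: it assumes that the rescaling step divides the weights by their sum, and both `rescale_weights` and the `eps` threshold are hypothetical.

```python
def rescale_weights(weights, eps=1e-12):
    """Divide the weights by their sum, or return zeros if the sum is (almost) zero."""
    total = sum(weights)
    if abs(total) < eps:
        # Dividing here would produce nan (or raise), so return zeros instead.
        return [0.0] * len(weights)
    return [w / total for w in weights]

normal = rescale_weights([1.0, 3.0])
degenerate = rescale_weights([0.0, 0.0])
```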
- `autojac` package containing the backward pass functions and their dependencies.
- `mtl_backward` function to make a backward pass for multi-task learning.
- Multi-task learning example.
- BREAKING: Moved the `backward` module to the `autojac` package. Some imports may have to be adapted.
- Improved documentation of `backward`.
- Fixed wrong tensor device with `IMTLG` in some rare cases.
- BREAKING: Removed the possibility of populating the `.grad` field of a tensor that does not expect it when calling `backward`. If an input `t` provided to backward does not satisfy `t.requires_grad and (t.is_leaf or t.retains_grad)`, an error is now raised.
- BREAKING: When using `backward`, aggregations are now accumulated into the `.grad` fields of the inputs rather than replacing those fields if they already existed. This is in line with the behavior of `torch.autograd.backward`.
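
The difference between accumulating and replacing can be pictured with a toy model. This is a minimal sketch in plain Python, not TorchJD code: the `grads` dictionary and the `accumulate` helper are hypothetical stand-ins for the `.grad` fields and for the write performed at the end of a backward pass.

```python
def accumulate(grads, name, update):
    """Add `update` into grads[name], creating the entry if it does not exist yet."""
    current = grads.get(name)
    if current is None:
        grads[name] = list(update)
    else:
        grads[name] = [c + u for c, u in zip(current, update)]

grads = {}
accumulate(grads, "w", [1.0, 2.0])  # First pass: the field is created.
accumulate(grads, "w", [0.5, 0.5])  # Second pass: values are summed, not replaced.
```

As with `torch.autograd.backward`, this means that the fields generally have to be reset between optimization steps if accumulation is not wanted.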
- Basic project structure.
- `aggregation` package:
  - `Aggregator` base class to aggregate Jacobian matrices.
  - `AlignedMTL` from Independent Component Alignment for Multi-Task Learning.
  - `CAGrad` from Conflict-Averse Gradient Descent for Multi-task Learning.
  - `Constant` to aggregate with constant weights.
  - `DualProj` adapted from Gradient Episodic Memory for Continual Learning.
  - `GradDrop` from Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout.
  - `IMTLG` from Towards Impartial Multi-task Learning.
  - `Krum` from Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent.
  - `Mean` to average the rows of the matrix.
  - `MGDA` from Multiple-gradient descent algorithm (MGDA) for multiobjective optimization.
  - `NashMTL` from Multi-Task Learning as a Bargaining Game.
  - `PCGrad` from Gradient Surgery for Multi-Task Learning.
  - `Random` from Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning.
  - `Sum` to sum the rows of the matrix.
  - `TrimmedMean` from Byzantine-Robust Distributed Learning: Towards Optimal Statistical Rates.
  - `UPGrad` from Jacobian Descent for Multi-Objective Optimization.
- `backward` function to perform a step of Jacobian descent.
- Documentation of the public API and of some usage examples.
- Tests:
  - Unit tests.
  - Documentation tests.
  - Plotting utilities to verify qualitatively that aggregators work as expected.