Releases: TorchJD/torchjd
v0.4.1
Bug fix
This patch fixes a bug introduced in the (yanked) v0.4.0 which could cause `backward` and `mtl_backward` to fail on some specific tensor shapes.
Changelog
Fixed
- Fixed a bug introduced in v0.4.0 that could cause `backward` and `mtl_backward` to fail with some tensor shapes.
v0.4.0
Sequential differentiation improvements
This version provides some improvements to how `backward` and `mtl_backward` differentiate when `parallel_chunk_size` is such that not all tensors can be differentiated in parallel at once (for instance, if `parallel_chunk_size=2` but you have 3 losses).
In particular, when a single tensor has to be differentiated (e.g. when using `parallel_chunk_size=1`), we now avoid relying on `torch.vmap`, which has several issues.
The parameter `retain_graph` of `backward` and `mtl_backward` has also been changed so that it is only used during the last differentiation. In most cases, you can now simply use the default `retain_graph=False` (prior to this change, you had to use `retain_graph=True` if the differentiations were not all made in parallel at once). This should reduce the memory overhead.
Lastly, this update enables the use of torchjd for training recurrent neural networks. As @lth456321 discovered, there can be an incompatibility between `torch.vmap` and `torch.nn.RNN` when running on CUDA. With this update, you can now simply set `parallel_chunk_size` to `1` to avoid using `torch.vmap`, which fixes the problem. A usage example for RNNs has therefore been added to the documentation.
Changelog
Changed
- Changed how the Jacobians are computed when calling `backward` or `mtl_backward` with `parallel_chunk_size=1` to not rely on `torch.vmap` in this case. Whenever `vmap` does not support something (compiled functions, RNN on CUDA, etc.), users should now be able to avoid using `vmap` by calling `backward` or `mtl_backward` with `parallel_chunk_size=1`.
- Changed the effect of the parameter `retain_graph` of `backward` and `mtl_backward`. When set to `False`, it now frees the graph only after all gradients have been computed. In most cases, users should now leave the default value `retain_graph=False`, no matter what the value of `parallel_chunk_size` is. This will reduce the memory overhead.
Added
- RNN training usage example in the documentation.
v0.3.1
Performance improvement patch
This patch improves the performance of the function finding the default tensors with respect to which `backward` and `mtl_backward` should differentiate. We thank @austen260 for finding the source of the performance issue and for proposing a working solution.
Changelog
Changed
- Improved the performance of the graph traversal function called by `backward` and `mtl_backward` to find the tensors with respect to which differentiation should be done. It now visits every node at most once.
v0.3.0
The interface update
This version greatly improves the interface of `backward` and `mtl_backward`, at the cost of some easy-to-fix breaking changes (some parameters of these functions have been renamed, or their order has been swapped due to becoming optional).
Downstream changes to make to keep using `backward` and `mtl_backward`:
- Rename `A` to `aggregator` or pass it as a positional argument.
- For `backward`, unless you specifically want to avoid differentiating with respect to some parameters, you can now simply use the default value of the `inputs` argument.
- For `mtl_backward`, unless you want to customize which params should be updated with a step of JD and which should be updated with a step of GD, you can now simply use the default value of the `shared_params` and of the `tasks_params` arguments.
- If you keep providing the `inputs` or the `shared_params` or `tasks_params` arguments as positional arguments, you should provide them after the aggregator.
For instance, `backward(tensors, inputs, A=aggregator)` should become `backward(tensors, aggregator)`, and `mtl_backward(losses, features, tasks_params, shared_params, A=aggregator)` should become `mtl_backward(losses, features, aggregator)`.
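For reference, a minimal sketch of the new interface is shown below. The aggregator choice (`UPGrad`) and the import paths are assumptions based on the TorchJD documentation, not part of this release note.

```python
# Sketch of the post-0.3.0 interface: shared_params and tasks_params are
# inferred from the autograd graph, and the aggregator is passed after
# `features` (positionally or as `aggregator=`).
import torch
from torchjd import mtl_backward
from torchjd.aggregation import UPGrad  # assumed aggregator from the TorchJD docs

shared = torch.nn.Linear(10, 5)                    # shared representation
heads = [torch.nn.Linear(5, 1) for _ in range(2)]  # one head per task
params = list(shared.parameters()) + [p for h in heads for p in h.parameters()]
optimizer = torch.optim.SGD(params, lr=0.01)

x = torch.randn(16, 10)
targets = [torch.randn(16, 1) for _ in heads]

features = shared(x)
losses = [torch.nn.functional.mse_loss(h(features), t) for h, t in zip(heads, targets)]

optimizer.zero_grad()
# Old: mtl_backward(losses, features, tasks_params, shared_params, A=aggregator)
# New: the params are found automatically and the aggregator comes after features.
mtl_backward(losses, features, UPGrad())
optimizer.step()
```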
We thank @raeudigerRaeffi for sharing his idea of having default values for the tensors with respect to which the differentiation should be made in `backward` and `mtl_backward`, and for implementing the first working version of the function that automatically finds these parameters from the autograd graph.
Changelog
Added
- Added a default value to the `inputs` parameter of `backward`. If not provided, the `inputs` will default to all leaf tensors that were used to compute the `tensors` parameter. This is in line with the behavior of `torch.autograd.backward`.
- Added a default value to the `shared_params` and to the `tasks_params` arguments of `mtl_backward`. If not provided, the `shared_params` will default to all leaf tensors that were used to compute the `features`, and the `tasks_params` will default to all leaf tensors that were used to compute each of the `losses`, excluding those used to compute the `features`.
- Note in the documentation about the incompatibility of `backward` and `mtl_backward` with tensors that retain grad.
Changed
- BREAKING: Changed the name of the parameter `A` to `aggregator` in `backward` and `mtl_backward`.
- BREAKING: Changed the order of the parameters of `backward` and `mtl_backward` to make it possible to have a default value for `inputs` and for `shared_params` and `tasks_params`, respectively. Usages of `backward` and `mtl_backward` that rely on the order between arguments must be updated.
- Switched to the PEP 735 dependency groups format in `pyproject.toml` (from a `[tool.pdm.dev-dependencies]` to a `[dependency-groups]` section). This should only affect development dependencies.
Fixed
- BREAKING: Added a check in `mtl_backward` to ensure that `tasks_params` and `shared_params` have no overlap. Previously, the behavior in this scenario was quite arbitrary.
v0.2.2
This version fixes a dependency-related bug and improves the documentation.
Changelog:
Added
- PyTorch Lightning integration example.
- Explanation about Jacobian descent in the README.
Fixed
- Made the dependency on `ecos` explicit in `pyproject.toml` (before `cvxpy` 1.16.0, it was installed automatically when installing `cvxpy`).
v0.2.1
This version fixes some bugs and inconveniences.
Changelog:
Changed
- Removed upper cap on `numpy` version in the dependencies. This makes `torchjd` compatible with the most recent `numpy` versions too.
Fixed
- Prevented `IMTLG` from dividing by zero during its weight rescaling step. If the input matrix consists only of zeros, it will now return a vector of zeros instead of a vector of `nan`.
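A tiny sketch of the fixed behavior (the convention of calling an aggregator directly on a Jacobian-like matrix, and the `IMTLG` import path, are assumptions based on the TorchJD aggregation API):

```python
import torch
from torchjd.aggregation import IMTLG  # assumed import path

aggregator = IMTLG()
zero_jacobian = torch.zeros(3, 5)   # 3 objectives, 5 parameters, all rows zero
# Previously this produced NaNs due to a division by zero in the weight
# rescaling step; it should now return a vector of zeros.
print(aggregator(zero_jacobian))
```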
v0.2.0
The multi-task learning update
This version mainly introduces `mtl_backward`, enabling multi-task learning with Jacobian descent. See the new multi-task learning example in the documentation to get started!
It also brings many improvements to the documentation, to the unit tests and to the internal code structure. Lastly, it fixes a few bugs and invalid behaviors.
Changelog:
Added
- `autojac` package containing the backward pass functions and their dependencies.
- `mtl_backward` function to make a backward pass for multi-task learning.
- Multi-task learning example.
Changed
- BREAKING: Moved the `backward` module to the `autojac` package. Some imports may have to be adapted.
- Improved documentation of `backward`.
Fixed
- Fixed wrong tensor device with `IMTLG` in some rare cases.
- BREAKING: Removed the possibility of populating the `.grad` field of a tensor that does not expect it when calling `backward`. If an input `t` provided to `backward` does not satisfy `t.requires_grad and (t.is_leaf or t.retains_grad)`, an error is now raised.
- BREAKING: When using `backward`, aggregations are now accumulated into the `.grad` fields of the inputs rather than replacing those fields if they already existed. This is in line with the behavior of `torch.autograd.backward`.
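To make the accumulation behavior concrete, here is a rough sketch written against the 0.2.x interface (where the aggregator was still passed as `A`; from v0.3.0 on it is passed as `aggregator` or positionally). The `UPGrad` aggregator and the import paths are assumptions based on the TorchJD documentation.

```python
import torch
from torchjd import backward
from torchjd.aggregation import UPGrad  # assumed aggregator

w = torch.randn(5, requires_grad=True)        # leaf tensor, so .grad may be populated
losses = [(w ** 2).sum(), (w - 1.0).abs().sum()]

w.grad = torch.ones(5)                        # pre-existing gradient
backward(losses, A=UPGrad())                  # 0.2.x keyword; `aggregator` from 0.3.0 on
# Like torch.autograd.backward, the aggregation is added to the existing
# .grad instead of replacing it.
print(w.grad)
```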