Releases: TorchJD/torchjd
v0.4.1
Bug fix
This patch fixes a bug introduced in the (yanked) v0.4.0 which could cause `backward` and `mtl_backward` to fail on some specific tensor shapes.
Changelog
Fixed
- Fixed a bug introduced in v0.4.0 that could cause `backward` and `mtl_backward` to fail with some tensor shapes.
v0.4.0
Sequential differentiation improvements
This version provides some improvements to how `backward` and `mtl_backward` differentiate when `parallel_chunk_size` is such that not all tensors can be differentiated in parallel at once (for instance, if `parallel_chunk_size=2` but you have 3 losses).
In particular, when a single tensor has to be differentiated (e.g. when using `parallel_chunk_size=1`), we now avoid relying on `torch.vmap`, which has several issues.
The parameter `retain_graph` of `backward` and `mtl_backward` has also been changed so that it is only used during the last differentiation. In most cases, you can now simply use the default `retain_graph=False` (prior to this change, you had to use `retain_graph=True` if the differentiations were not all made in parallel at once). This should reduce the memory overhead.
Lastly, this update enables the use of torchjd for training recurrent neural networks. As @lth456321 discovered, there can be an incompatibility between `torch.vmap` and `torch.nn.RNN` when running on CUDA. With this update, you can now simply set `parallel_chunk_size` to `1` to avoid using `torch.vmap`, which fixes the problem. A usage example for RNNs has therefore been added to the documentation.
Changelog
Changed
- Changed how the Jacobians are computed when calling `backward` or `mtl_backward` with `parallel_chunk_size=1` to not rely on `torch.vmap` in this case. Whenever `vmap` does not support something (compiled functions, RNN on CUDA, etc.), users should now be able to avoid using `vmap` by calling `backward` or `mtl_backward` with `parallel_chunk_size=1`.
- Changed the effect of the parameter `retain_graph` of `backward` and `mtl_backward`. When set to `False`, it now frees the graph only after all gradients have been computed. In most cases, users should now leave the default value `retain_graph=False`, no matter what the value of `parallel_chunk_size` is. This will reduce the memory overhead.
Added
- RNN training usage example in the documentation.
v0.3.1
Performance improvement patch
This patch improves the performance of the function finding the default tensors with respect to which `backward` and `mtl_backward` should differentiate. We thank @austen260 for finding the source of the performance issue and for proposing a working solution.
Changelog
Changed
- Improved the performance of the graph traversal function called by `backward` and `mtl_backward` to find the tensors with respect to which differentiation should be done. It now visits every node at most once.
v0.3.0
The interface update
This version greatly improves the interface of `backward` and `mtl_backward`, at the cost of some easy-to-fix breaking changes (some parameters of these functions have been renamed, or their order has been swapped due to becoming optional).
Downstream changes to make to keep using `backward` and `mtl_backward`:
- Rename `A` to `aggregator` or pass it as a positional argument.
- For `backward`, unless you specifically want to avoid differentiating with respect to some parameters, you can now simply use the default value of the `inputs` argument.
- For `mtl_backward`, unless you want to customize which params should be updated with a step of JD and which should be updated with a step of GD, you can now simply use the default value of the `shared_params` and of the `tasks_params` arguments.
- If you keep providing the `inputs` or the `shared_params` or `tasks_params` arguments as positional arguments, you should provide them after the aggregator.
For instance, `backward(tensors, inputs, A=aggregator)` should become `backward(tensors, aggregator)`, and `mtl_backward(losses, features, tasks_params, shared_params, A=aggregator)` should become `mtl_backward(losses, features, aggregator)`.
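For reference, a minimal sketch of the new interface is shown below. The aggregator choice (`UPGrad`) and the import paths are assumptions based on the TorchJD documentation, not part of this release note.

```python
# Sketch of the post-0.3.0 interface: shared_params and tasks_params are
# inferred from the autograd graph, and the aggregator is passed after
# `features` (positionally or as `aggregator=`).
import torch
from torchjd import mtl_backward
from torchjd.aggregation import UPGrad  # assumed aggregator from the TorchJD docs

shared = torch.nn.Linear(10, 5)                    # shared representation
heads = [torch.nn.Linear(5, 1) for _ in range(2)]  # one head per task
params = list(shared.parameters()) + [p for h in heads for p in h.parameters()]
optimizer = torch.optim.SGD(params, lr=0.01)

x = torch.randn(16, 10)
targets = [torch.randn(16, 1) for _ in heads]

features = shared(x)
losses = [torch.nn.functional.mse_loss(h(features), t) for h, t in zip(heads, targets)]

optimizer.zero_grad()
# Old: mtl_backward(losses, features, tasks_params, shared_params, A=aggregator)
# New: the params are found automatically and the aggregator comes after features.
mtl_backward(losses, features, UPGrad())
optimizer.step()
```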
We thank @raeudigerRaeffi for sharing his idea of having default values for the tensors with respect to which the differentiation should be made in `backward` and `mtl_backward`, and for implementing the first working version of the function that automatically finds these parameters from the autograd graph.
Changelog
Added
- Added a default value to the `inputs` parameter of `backward`. If not provided, the `inputs` will default to all leaf tensors that were used to compute the `tensors` parameter. This is in line with the behavior of `torch.autograd.backward`.
- Added a default value to the `shared_params` and to the `tasks_params` arguments of `mtl_backward`. If not provided, the `shared_params` will default to all leaf tensors that were used to compute the `features`, and the `tasks_params` will default to all leaf tensors that were used to compute each of the `losses`, excluding those used to compute the `features`.
- Note in the documentation about the incompatibility of `backward` and `mtl_backward` with tensors that retain grad.
Changed
- BREAKING: Changed the name of the parameter `A` to `aggregator` in `backward` and `mtl_backward`.
- BREAKING: Changed the order of the parameters of `backward` and `mtl_backward` to make it possible to have a default value for `inputs` and for `shared_params` and `tasks_params`, respectively. Usages of `backward` and `mtl_backward` that rely on the order between arguments must be updated.
- Switched to the PEP 735 dependency groups format in `pyproject.toml` (from a `[tool.pdm.dev-dependencies]` to a `[dependency-groups]` section). This should only affect development dependencies.
Fixed
- BREAKING: Added a check in `mtl_backward` to ensure that `tasks_params` and `shared_params` have no overlap. Previously, the behavior in this scenario was quite arbitrary.
v0.2.2
This version fixes a dependency-related bug and improves the documentation.
Changelog:
Added
- PyTorch Lightning integration example.
- Explanation about Jacobian descent in the README.
Fixed
- Made the dependency on `ecos` explicit in `pyproject.toml` (before `cvxpy` 1.16.0, it was installed automatically when installing `cvxpy`).
v0.2.1
This version fixes some bugs and inconveniences.
Changelog:
Changed
- Removed upper cap on `numpy` version in the dependencies. This makes `torchjd` compatible with the most recent `numpy` versions too.
Fixed
- Prevented `IMTLG` from dividing by zero during its weight rescaling step. If the input matrix consists only of zeros, it will now return a vector of zeros instead of a vector of `nan`.
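A tiny sketch of the fixed behavior (the convention of calling an aggregator directly on a Jacobian-like matrix, and the `IMTLG` import path, are assumptions based on the TorchJD aggregation API):

```python
import torch
from torchjd.aggregation import IMTLG  # assumed import path

aggregator = IMTLG()
zero_jacobian = torch.zeros(3, 5)   # 3 objectives, 5 parameters, all rows zero
# Previously this produced NaNs due to a division by zero in the weight
# rescaling step; it should now return a vector of zeros.
print(aggregator(zero_jacobian))
```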
v0.2.0
The multi-task learning update
This version mainly introduces `mtl_backward`, enabling multi-task learning with Jacobian descent. See the new multi-task learning example in the documentation to get started!
It also brings many improvements to the documentation, to the unit tests and to the internal code structure. Lastly, it fixes a few bugs and invalid behaviors.
Changelog:
Added
- `autojac` package containing the backward pass functions and their dependencies.
- `mtl_backward` function to make a backward pass for multi-task learning.
- Multi-task learning example.
Changed
- BREAKING: Moved the `backward` module to the `autojac` package. Some imports may have to be adapted.
- Improved documentation of `backward`.
Fixed
- Fixed wrong tensor device with `IMTLG` in some rare cases.
- BREAKING: Removed the possibility of populating the `.grad` field of a tensor that does not expect it when calling `backward`. If an input `t` provided to `backward` does not satisfy `t.requires_grad and (t.is_leaf or t.retains_grad)`, an error is now raised.
- BREAKING: When using `backward`, aggregations are now accumulated into the `.grad` fields of the inputs rather than replacing those fields if they already existed. This is in line with the behavior of `torch.autograd.backward`.
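To make the accumulation behavior concrete, here is a rough sketch written against the 0.2.x interface (where the aggregator was still passed as `A`; from v0.3.0 on it is passed as `aggregator` or positionally). The `UPGrad` aggregator and the import paths are assumptions based on the TorchJD documentation.

```python
import torch
from torchjd import backward
from torchjd.aggregation import UPGrad  # assumed aggregator

w = torch.randn(5, requires_grad=True)        # leaf tensor, so .grad may be populated
losses = [(w ** 2).sum(), (w - 1.0).abs().sum()]

w.grad = torch.ones(5)                        # pre-existing gradient
backward(losses, A=UPGrad())                  # 0.2.x keyword; `aggregator` from 0.3.0 on
# Like torch.autograd.backward, the aggregation is added to the existing
# .grad instead of replacing it.
print(w.grad)
```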