diff --git a/docs/paper/paper.md b/docs/paper/paper.md
index a38a2c0d..9936f56a 100644
--- a/docs/paper/paper.md
+++ b/docs/paper/paper.md
@@ -46,7 +46,9 @@ The package we present, InvertibleNetworks.jl, is a pure Julia [@bezanson2017jul
 # Statement of need
 
-This software package focuses on memory efficiency. The promise of neural networks is in learning high-dimensional distributions from examples, thus normalizing flow packages should allow easy application to large-dimensional inputs such as images or 3D volumes. Interestingly, the invertibility of normalizing flows naturally alleviates memory concerns since intermediate network activations can be recomputed instead of saved in memory, greatly reducing the memory needed during backpropagation. The problem is that directly implementing normalizing flows in automatic differentiation frameworks such as PyTorch [@paszke2017automatic] will not automatically exploit this invertibility. The available packages for normalizing flows such as nflows [@nflows], normflows [@stimper2023normflows] and FrEIA [@freia] are built depending on automatic differentiation frameworks, and thus do not efficiently exploit invertibility for memory.
+
+This software package focuses on memory efficiency. The promise of neural networks is in learning high-dimensional distributions from examples, so normalizing flow packages should allow easy application to large-dimensional inputs such as images or 3D volumes. Interestingly, the invertibility of normalizing flows naturally alleviates memory concerns since intermediate network activations can be recomputed instead of saved in memory, greatly reducing the memory needed during backpropagation. The problem is that directly implementing normalizing flows in automatic differentiation frameworks such as PyTorch [@paszke2017automatic] will not automatically exploit this invertibility. The available packages for normalizing flows such as nflows [@nflows], normflows [@stimper2023normflows] and FrEIA [@freia] are built on top of automatic differentiation frameworks and thus do not efficiently exploit invertibility to save memory.
+
 
 # Memory efficiency
 
 By implementing gradients by hand, instead of depending completely on automatic differentiation, our layers are capable of scaling to large inputs. By scaling, we mean that these codes are not prone to out-of-memory errors when training on GPU accelerators. Indeed, previous literature has described memory problems when using normalizing flows as their invertibility requires the latent code to maintain the same dimensionality as the input [@khorashadizadeh2023conditional].
@@ -60,12 +62,12 @@ In \autoref{fig:memory}, we show the relation between input size and the memory
 
 \label{fig:memory-depth}](./figs/mem_used_new_depth.png)
 
-Since traditional normalizing flow architectures need to be invertible, they might be less expressive than non-invertible counterparts. In order to increase their expressiveness, practitioners stack many invertible layers to increase the overall expressive power. Increasing the depth of a neural network would in most cases increase the memory consumption of the network but in this case since normalizing flows are invertible, the memory consumption does not increase. Our package displays this phenomenon as shown in \autoref{fig:memory-depth} while the PyTorch (normflows) package that has been implemented with automatic differentiation does not display this constant memory phenomenon.
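+Concretely, the memory savings come from invertibility itself. The following self-contained sketch is purely illustrative; the names `t`, `forward`, `inverse`, and `backward` are not part of the InvertibleNetworks.jl API. It shows how the backward pass of an additive coupling layer can recompute its inputs from its outputs instead of storing them:
+
+```julia
+using Flux, Zygote
+
+# Additive coupling: y1 = x1, y2 = x2 + t(x1); trivially invertible.
+t = Dense(2 => 2, relu)                # arbitrary residual block
+forward(x1, x2) = (x1, x2 .+ t(x1))
+inverse(y1, y2) = (y1, y2 .- t(y1))    # recovers (x1, x2) up to round-off
+
+# Hand-written backward pass: the inputs are recomputed from the outputs,
+# so no intermediate activations have to be kept in memory, regardless of
+# how many such layers are stacked.
+function backward(Δy1, Δy2, y1, y2)
+    x1, x2 = inverse(y1, y2)           # recomputation instead of storage
+    _, pb = Zygote.pullback(t, x1)     # differentiate only the residual block
+    Δx1 = Δy1 .+ only(pb(Δy2))         # chain rule through y2 = x2 + t(x1)
+    Δx2 = Δy2
+    return Δx1, Δx2, x1, x2
+end
+```
+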
+Since traditional normalizing flow architectures need to be invertible, they might be less expressive than their non-invertible counterparts. To compensate, practitioners stack many invertible layers to increase the overall expressive power. Increasing the depth of a neural network would in most cases increase its memory consumption, but because normalizing flows are invertible, the memory consumption does not grow with depth. Our package displays this constant-memory behavior, as shown in \autoref{fig:memory-depth}, while the PyTorch (normflows) package, which has been implemented with automatic differentiation, does not.
 
 # Ease of use
 
-Although the normalizing flow layers gradients are hand-written, the package is fully compatible with ChainRules [@frames_white_2023_10100624] in order to integrate with automatic differentiation frameworks in Julia such as Zygote [@innes2019differentiable]. This integration allows users to add arbitrary neural networks which will be differentiated by automatic differentiation while the memory bottleneck created by normalizing flow gradients will be dealt with InvertibleNetworks.jl. The typical use case for this combination is the summary networks used in amortized variational inference such as BayesFlow [@radev2020bayesflow], which has been implemented in our package.
+Although the normalizing flow layer gradients are hand-written, the package is fully compatible with ChainRules [@frames_white_2023_10100624], so it integrates with automatic differentiation frameworks in Julia such as Zygote [@innes2019differentiable]. This integration allows users to add arbitrary neural networks that are differentiated by automatic differentiation, while the memory bottleneck created by the normalizing flow gradients is handled by InvertibleNetworks.jl; a minimal sketch of this mechanism is given at the end of this section. The typical use case for this combination is the summary network used in amortized variational inference, as in BayesFlow [@radev2020bayesflow], which is also implemented in our package.
 
-All implemented layers are tested for invertibility and correctness of their gradients with continuous integration testing via GitHub actions. There are many examples of layers, networks and application workflows allowing new users to quickly build networks for a variety of applications. The ease of use is demonstrated by the publications that have used the package.
+All implemented layers are tested for invertibility and correctness of their gradients with continuous integration testing via GitHub Actions, as sketched below. There are many examples of layers, networks, and application workflows, allowing new users to quickly build networks for a variety of applications. The ease of use is demonstrated by the publications that have made use of the package.
 
 Many publications have used InvertibleNetworks.jl for diverse applications including change point detection, [@peters2022point], acoustic data denoising [@kumar2021enabling], seismic imaging [@rizzuti2020parameterizing; @siahkoohi2021preconditioned; @siahkoohi2022wave;@siahkoohi2023reliable; @louboutin2023learned], fluid flow dynamics [@yin2023solving], medical imaging [@orozco2023adjoint;@orozco2023amortized; @orozco2021photoacoustic;@orozco2023refining] and monitoring CO2 for combating climate change [@gahlot2023inference].
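+
+To illustrate how hand-written gradients compose with Julia's automatic differentiation, the sketch below registers a hand-written reverse rule through ChainRules and then lets Zygote differentiate it jointly with a small Flux network playing the role of a summary network. The names `handwritten_layer` and `summary_net` are stand-ins for this illustration and are not part of the InvertibleNetworks.jl API:
+
+```julia
+using ChainRulesCore, Flux, Zygote
+
+# Stand-in for an invertible layer whose gradient is written by hand.
+handwritten_layer(x) = exp.(x)
+
+# Hand-written reverse rule; Zygote picks it up automatically via ChainRules,
+# so the layer composes with any code Zygote can differentiate.
+function ChainRulesCore.rrule(::typeof(handwritten_layer), x)
+    y = handwritten_layer(x)
+    layer_pullback(dy) = (NoTangent(), dy .* y)   # d/dx exp.(x) = exp.(x)
+    return y, layer_pullback
+end
+
+summary_net = Chain(Dense(8 => 4, relu), Dense(4 => 2))  # arbitrary Flux network
+x = randn(Float32, 8, 16)
+
+# Zygote differentiates the Flux part with AD and uses the hand-written rule
+# for the stand-in layer.
+loss(m) = sum(abs2, handwritten_layer(m(x)))
+grads = Zygote.gradient(loss, summary_net)
+```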
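+
+In the same spirit, the kind of check run in continuous integration can be summarized by the following illustrative test (a sketch, not the package's actual test suite; `fwd`, `rev`, and `obj` are made-up names): invertibility is verified by a round trip, and the gradient is compared against a finite-difference approximation.
+
+```julia
+using Test, Zygote
+
+# Toy invertible layer: elementwise scaling (illustrative only).
+s = [0.5, 2.0, 4.0]
+fwd(x) = s .* x
+rev(y) = y ./ s
+
+x, dx = randn(3, 8), randn(3, 8)
+obj(x) = sum(abs2, fwd(x))
+
+@testset "invertibility and gradient correctness (illustrative)" begin
+    # Round trip: applying the inverse recovers the input.
+    @test rev(fwd(x)) ≈ x
+
+    # Gradient check: the gradient agrees with a central finite difference
+    # along a random direction dx.
+    g  = only(Zygote.gradient(obj, x))
+    h  = 1e-6
+    fd = (obj(x .+ h .* dx) - obj(x .- h .* dx)) / (2h)
+    @test isapprox(fd, sum(g .* dx); rtol=1e-5, atol=1e-6)
+end
+```
+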
@@ -73,7 +75,6 @@ Many publications have used InvertibleNetworks.jl for diverse applications inclu
 
 The neural network primitives (convolutions, non-linearities, pooling etc) are implemented in NNlib.jl abstractions, thus support for AMD, Intel and Apple GPUs can be trivially extended. Also, while our package can currently handle 3D inputs and has been used on large volume-based medical imaging [@orozco2022memory], there are interesting avenues of research regarding the "channel explosion" seen in invertible down and upsampling used in invertible networks [@peters2019symmetric].
 
-
 # References
 
 ::: {#refs}