============================== Release Notes: v0.102 ==============================
Support for new training algorithms:
- LTFB is now a first-class training algorithm.
- LTFB now allows multiple metrics. The local algorithm is favored by
  each trainer and a partner model must win every metric to be
  declared the tournament winner (see the sketch after this list).
- The batched iterative optimizer (sgd_training_algorithm) was
  refactored for consistency.
- Improved documentation of training algorithm infrastructure.
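
A minimal sketch of the multi-metric tournament rule above, assuming
per-metric scores are already evaluated; the function name, score
lists, and lower_is_better flag are illustrative assumptions, not
LBANN's API:

    def partner_wins(local_scores, partner_scores, lower_is_better=True):
        """Return True only if the partner model beats the local model
        on every metric; any tie or loss keeps the local (favored) model."""
        if lower_is_better:
            return all(p < l for l, p in zip(local_scores, partner_scores))
        return all(p > l for l, p in zip(local_scores, partner_scores))

    # Example: the partner ties on one metric, so the local model is kept.
    assert not partner_wins([0.31, 0.12], [0.25, 0.12])
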
Support for new network structures:
- ATOM WAE model - character-based Wasserstein Autoencoder
- Community GAN model for graph data sets
Support for new layers:
- "DFTAbs" layer that computes the absolute value of the channel-wise
DFT of the input data - Adding support for 3D Matrix Multiplication
- Added scatter and gather neural network layers
- CPU-based GRU layers using oneDNN
- Added batch-wise reduce-sum
- ArcFace loss
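
A rough NumPy model of the DFTAbs semantics noted above; the
channel-first layout and axis handling are assumptions made for
illustration, not LBANN's implementation:

    import numpy as np

    def dft_abs(x):
        """Absolute value of the DFT applied to each channel independently,
        for a channel-first tensor of shape (channels, ...)."""
        spatial_axes = tuple(range(1, x.ndim))
        return np.abs(np.fft.fftn(x, axes=spatial_axes))

    out = dft_abs(np.random.rand(3, 16, 16))  # 3 channels of 16x16 data
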
Python front-end:
- Added 3D U-Net model
- Added CosmoFlow model
- Ported CANDLE Pilot1 models
- Added support for nvprof
- Added channel-wise fully connected layer
- Added support for non-square kernels, padding, stride, and dilation
  for the convolution module (see the sketch after this list)
- Added support for the OpenMPI launcher
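
To illustrate the non-square convolution support above, the sketch
below applies the standard per-dimension output-size formula; it is
plain arithmetic, not the Python front-end's convolution module:

    def conv_output_dims(in_dims, kernels, pads, strides, dilations):
        """Standard output-size arithmetic, applied per spatial dimension."""
        return [
            (i + 2 * p - d * (k - 1) - 1) // s + 1
            for i, k, p, s, d in zip(in_dims, kernels, pads, strides, dilations)
        ]

    # A non-square 3x5 kernel with 1x2 padding preserves a 32x64 input.
    assert conv_output_dims([32, 64], [3, 5], [1, 2], [1, 1], [1, 1]) == [32, 64]
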
Performance optimizations:
- Use cuDNN 8 RNN API and CUDA Graphs in GRU layer
- Cache CUDA Graphs for each active mini-batch size (see the sketch
  after this list)
- Tuned performance of slice, concatenate, and tessellate layers on
  ARM processors
- Parallelized computation of Gaussian random numbers
- Optimized tessellate, concatenate, and slice layers on CPU
Experiments & Applications:
- Added experiment scripts for ATOM cWAE Gordon Bell simulations
- LBANN-ATOM model inference and analysis
Internal features:
- Wrapper classes for CUDA Graphs API
- Elementary examples of using complex numbers
- cuDNN handles are now wrapped in RAII management classes
- Improved HWLOC compatibility for v1.11 and v2.x
- Added an enum type of visitor hooks that will eventually be used to
  allow callbacks or other visitors to operate at user-defined hook
  points
- Changed checkpoint logic to checkpoint at the start of epochs and
  changed the naming scheme to use the callback phase (visitor hook)
  in the name rather than the current execution context
- Added in-memory binary model exchange for LTFB
- Added support for ROCm and MIOpen
- Added support for oneDNN
- Updated the Bamboo test environment to use locally built executables
  rather than hard-coded executables
- Overhauled and refactored serialization throughout the code to use
  the Cereal serialization library
- Significant cleanup and refactoring of the code base to improve
  compile times. Moved toward a standard split of headers between
  declarations and implementations (for templated code), with a focus
  on the serialization functions and the comm class, and reduced
  dependencies by removing overly broad header inclusions.
- Clarified the relationship between execution_contexts and
  training_algorithms. There is still work to do here.
- Added DistConv tests for both convolution and pooling layers
- Support padding in distributed embedding layer
- Added dump model graph callback
- Added perturb learning rate callback
- Added batched inference algorithm
- Switched ATOM tests to use CPU embedding and tessellate layers to
minimize noise
I/O & data readers:
- Experimental data reader that generates graph random walks with
  HavoqGT (see the sketch after this list)
- Added explicit tournament execution mode
- Added support to split the training data reader into validation and
  tournament readers
- node2vec data reader
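
For context on the random-walk reader above, a generic walk over an
adjacency-list graph can be sketched as follows; this is a plain
Python toy, not the HavoqGT-backed implementation:

    import random

    def random_walk(adj, start, length):
        """One fixed-length walk over an adjacency-list graph."""
        walk = [start]
        while len(walk) < length and adj[walk[-1]]:
            walk.append(random.choice(adj[walk[-1]]))
        return walk

    adj = {0: [1, 2], 1: [0, 2], 2: [0]}
    print(random_walk(adj, start=0, length=5))
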
Build system:
- Hydrogen v1.5.0+
- Aluminum v0.5.0+
- DiHydrogen v0.2.0 is required
- C++14 or newer standard with CUDA (CMake: "-DCMAKE_CUDA_STANDARD=14")
- OpenCV is now an optional dependency via the CMake option
  "LBANN_WITH_VISION"
- CNPY is now an optional dependency via the CMake option
  "LBANN_WITH_CNPY"
- Added support in the build_lbann.sh script for concretizing extra
  packages with the primary LBANN installation
- Added features in the build script to set up and configure the
  build environment, then stop and allow the user to manually add
  extra packages
- Added a set of user-focused build scripts that use the main
  build_lbann.sh script to set up good defaults on known systems
- Added application-specific build scripts for users such as ATOM
- Added support for pulling from Spack mirrors and setting them up
- Split embedded Python support from the Python front-end
- Switched Spack-based build script to use Spack's clingo concretizer
Bug fixes:
- Fixed a bug where LBANN didn't set the Hydrogen RNG seed
- Fixed the Python front-end definitions of both the CosmoFlow and
  U-Net models, and addressed issues in the data reader and data
  coordinator
- Fixed the HDF5 data reader to properly specify the supported I/O
  types
- Fixed calculation of the linearized response size
- Fixed the data coordinator's interface to input_layer
- Fixed error with deterministic execution of dropout layers
Retired features:
- Removed deprecated JAG leader mode which was made obsolete when the
  data reader moved into the data coordinator
- Removed the deprecated partitioned data reader modes that were used
  to partition and overlap data sets for multiple models
- Removed deprecated ActivationDescriptor class