See the update below on selecting the NLMS order for LFZip.
LFZip (NLMS prediction mode) is now available on conda through the conda-forge channel. For the neural network prediction mode or to run from source, see the next section.
```
conda create --name lfzip_env
conda activate lfzip_env
conda config --add channels conda-forge
conda install lfzip
```
After the installation, LFZip (NLMS) can be run using the command `lfzip-nlms`. To install LFZip in a conda virtual environment, follow the instructions here.
Download:
```
git clone https://github.com/shubhamchandak94/LFZip.git
```
To set up the virtual environment and dependencies (on Linux):
```
cd LFZip/src/
python3 -m venv env
source env/bin/activate
./install.sh
```
On macOS, you need the gcc compiler to build BSC, the entropy coder used in LFZip. Install gcc@9 using brew as follows:
```
brew update
brew install gcc@9
```
and then replace the last statement of the Linux instructions with
```
./install_macos.sh
```
If you get an error related to the compilation flags, please look at issue #6, which might help you out.
For processors without AVX instructions (e.g., Intel Pentium/Celeron), which the latest TensorFlow package requires, do the following instead (requires a working conda installation):
```
cd LFZip/src/
conda create --name no_avx_env python=3.6
conda activate no_avx_env
./install_without_avx.sh
```
- Note that LFZip (NLMS), LFZip (NN) and CA (critical aperture) expect the input to be in numpy array (.npy) format and support only float32 arrays.
- LFZip (NLMS) additionally supports multivariate time series with at most 256 variables, where the input is a numpy array of shape `(k,T)`, where `k` is the number of variables and `T` is the length of the time series.
- During compression, the reconstructed time series is also generated as a byproduct and stored as `compressed_file.bsc.recon.npy`. This can be used to verify the correctness of the compression-decompression pipeline.
- Examples are shown after the usages below [link].
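Inputs of the expected format can be produced directly with numpy; for example (filenames here are illustrative):

```python
import numpy as np

# LFZip expects float32 numpy arrays saved in .npy format.
# Univariate series: a 1-D array of length T.
univariate = np.sin(np.linspace(0, 10, 1000)).astype(np.float32)
np.save("univariate.npy", univariate)

# Multivariate series (NLMS mode only): shape (k, T) with k <= 256 variables.
k, T = 3, 1000
multivariate = np.random.rand(k, T).astype(np.float32)
np.save("multivariate.npy", multivariate)

print(np.load("multivariate.npy").shape)  # (3, 1000)
```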
If installed using conda, replace `python3 nlms_compress.py` by `lfzip-nlms`.
```
python3 nlms_compress.py [-h] --mode MODE --infile INFILE --outfile OUTFILE
                         [--NLMS_order N [N ...]] [--mu MU [MU ...]]
                         [--absolute_error MAXERROR [MAXERROR ...]]
                         [--quantization_bytes QUANTIZATION_BYTES [QUANTIZATION_BYTES ...]]
```
with the parameters:
```
-h, --help            show this help message and exit
--mode MODE, -m MODE  c or d (compress/decompress)
--infile INFILE, -i INFILE
                      infile .npy/.bsc
--outfile OUTFILE, -o OUTFILE
                      outfile .bsc/.npy
--NLMS_order N [N ...], -n N [N ...]
                      order of NLMS filter for compression (default 32) -
                      single value or one per variable
--mu MU [MU ...]      learning rate of NLMS for compression (default 0.5) -
                      single value or one per variable
--absolute_error MAXERROR [MAXERROR ...], -a MAXERROR [MAXERROR ...]
                      max allowed error for compression - single value or
                      one per variable
--quantization_bytes QUANTIZATION_BYTES [QUANTIZATION_BYTES ...], -q QUANTIZATION_BYTES [QUANTIZATION_BYTES ...]
                      number of bytes used to encode quantized error -
                      decides number of quantization levels. Valid values
                      are 1, 2 (default: 2) - single value or one per variable
```
Note that `nlms_compress_python.py` is an older and slower version with a similar interface but with the core NLMS compression code written in Python instead of C++.
While the default NLMS order is 32, we have found that for certain datasets the optimal order is 0 (i.e., the prediction step is skipped). We recommend trying both values using the `-n` flag for a given data source before selecting the order. We are currently working on making this process automatic.
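For intuition, here is a generic one-step-ahead NLMS predictor written as a textbook sketch (this is not the repository's C++ implementation; the function and variable names are illustrative). With order 0 the prediction is identically zero, which is why order 0 amounts to skipping the prediction step:

```python
import numpy as np

def nlms_predict(x, order=32, mu=0.5, eps=1e-8):
    """One-step-ahead NLMS prediction (generic illustrative sketch)."""
    if order == 0:
        # order 0: the prediction step is skipped (prediction is always 0)
        return np.zeros_like(x)
    w = np.zeros(order)                 # adaptive filter weights
    pred = np.zeros_like(x)
    for t in range(order, len(x)):
        window = x[t - order:t][::-1]   # most recent sample first
        pred[t] = w @ window
        err = x[t] - pred[t]
        # normalized LMS update: step size scaled by the window energy
        w += mu * err * window / (eps + window @ window)
    return pred

# On a predictable signal, the prediction error shrinks after convergence.
x = np.sin(np.linspace(0, 50, 2000))
p = nlms_predict(x, order=8, mu=0.5)
print("mean abs prediction error:", np.abs(x[100:] - p[100:]).mean())
```

Smaller prediction errors entropy-code better, which is why the right order matters per dataset.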
First select the appropriate function from `models.py`, e.g., `FC` or `biGRU`. Then call
```
python3 nn_trainer.py -train training_data.npy -val validation_data.npy -model_file saved_model.h5 \
    -model_name model_name -model_params model_params [-lr lr -noise noise -epochs epochs]
```
with the parameters:
- `model_name`: (str) name of model (function name from `models.py`)
- `model_params`: space-separated list of parameters to the function `model_name`
- `lr`: (float) learning rate (default 1e-3 for Adam)
- `noise`: (float) noise added to input during training (uniform[-noise,noise]), default 0
- `epochs`: (int) number of epochs to train (0 means store a random model)
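The training and validation files are plain float32 .npy arrays; for example, a single series could be split into the two files as follows (filenames here are illustrative):

```python
import numpy as np

# Split a float32 series into training and validation portions
# for nn_trainer.py (90/10 split chosen arbitrarily here).
series = np.random.rand(10000).astype(np.float32)
split = int(0.9 * len(series))
np.save("training_data.npy", series[:split])
np.save("validation_data.npy", series[split:])
```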
```
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python3 nn_compress.py [-h] --mode MODE --infile INFILE --outfile OUTFILE
                         [--absolute_error MAXERROR] --model_file MODEL_FILE
                         [--quantization_bytes QUANTIZATION_BYTES]
                         [--model_update_period MODEL_UPDATE_PERIOD] [--lr LR]
                         [--epochs NUM_EPOCHS]
```
with the parameters:
```
-h, --help            show this help message and exit
--mode MODE, -m MODE  c or d (compress/decompress)
--infile INFILE, -i INFILE
                      infile .npy/.bsc
--outfile OUTFILE, -o OUTFILE
                      outfile .bsc/.npy
--absolute_error MAXERROR, -a MAXERROR
                      max allowed error for compression
--model_file MODEL_FILE
                      model file
--quantization_bytes QUANTIZATION_BYTES, -q QUANTIZATION_BYTES
                      number of bytes used to encode quantized error -
                      decides number of quantization levels. Valid values
                      are 1, 2 (default: 2)
--model_update_period MODEL_UPDATE_PERIOD
                      train model (both during compression & decompression)
                      after seeing these many symbols (default: never train)
--lr LR               learning rate for Adam when model update used
--epochs NUM_EPOCHS   number of epochs to train when model update used
```
The `CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0` environment variables are set to ensure that decompression works precisely the same as compression and generates the correct reconstruction.
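For intuition about `--quantization_bytes`, here is a hedged sketch of uniform error quantization: using bins of width 2*maxerror guarantees the dequantized error is within maxerror of the true error. This is a generic illustration, not the repository's encoder; a real implementation must also handle errors falling outside the representable range (e.g., by storing them verbatim), which the clamp below does not.

```python
import numpy as np

def quantize_error(err, maxerror, nbytes=2):
    """Uniformly quantize a prediction error (illustrative sketch).

    With nbytes bytes there are 2**(8*nbytes) quantization levels; a bin
    width of 2*maxerror keeps the dequantized error within maxerror.
    """
    step = 2 * maxerror                       # bin width
    index = int(np.round(err / step))
    levels = 2 ** (8 * nbytes)
    index = max(-(levels // 2), min(levels // 2 - 1, index))
    return index * step                       # dequantized error

maxerror = 0.01
err = 0.037
recon_err = quantize_error(err, maxerror)
print(abs(err - recon_err) <= maxerror)  # True
```

Fewer bytes mean fewer levels (and smaller compressed size) but a narrower range of representable errors.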
WARNING: in some cases, maxerror constraint can be slightly violated (~1e-5) due to numerical precision issues (only for the CA implementation).
```
python3 ca_compress.py [-h] --mode MODE --infile INFILE --outfile OUTFILE
                       [--absolute_error MAXERROR]
```
optional arguments:
```
-h, --help            show this help message and exit
--mode MODE, -m MODE  c or d (compress/decompress)
--infile INFILE, -i INFILE
                      infile .npy/.bsc
--outfile OUTFILE, -o OUTFILE
                      outfile .bsc/.npy
--absolute_error MAXERROR, -a MAXERROR
                      max allowed error for compression
```
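CA-style methods store only the samples needed to keep the reconstruction within the error bound. As a loose illustration of this family of bounded-error compaction methods (not the exact CA algorithm; all names here are mine), here is a last-value-hold sketch:

```python
import numpy as np

def hold_compact(x, maxerror):
    """Keep a sample only when it drifts more than maxerror from the last
    kept value; reconstruct by holding the last kept value. A loose
    illustration of bounded-error compaction, not the exact CA algorithm.
    """
    kept = [(0, x[0])]
    for t in range(1, len(x)):
        if abs(x[t] - kept[-1][1]) > maxerror:
            kept.append((t, x[t]))
    recon = np.empty_like(x)
    next_starts = [k[0] for k in kept[1:]] + [len(x)]
    for (t, v), nxt in zip(kept, next_starts):
        recon[t:nxt] = v          # hold each kept value until the next one
    return kept, recon

# A slow random walk compacts well: few kept samples, bounded error.
x = np.cumsum(np.random.randn(1000) * 0.001).astype(np.float32)
kept, recon = hold_compact(x, maxerror=0.01)
print(len(kept), "samples kept out of", len(x))
```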
- `data/dat_to_np.py`: convert a .dat file (with 1 time series value in plaintext per line) to a .npy file
- `data/npy_to_bin.py`: convert a .npy file to the binary file used as input to SZ
- `data/bin_to_npy.py`: convert a .bin file to a .npy file
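As a sketch of what the .dat to .npy conversion amounts to (this is illustrative and uses a toy file, not the repository's `data/dat_to_np.py`):

```python
import numpy as np

# Create a toy .dat input: one plaintext value per line.
with open("example.dat", "w") as f:
    f.write("\n".join(str(v) for v in [1.5, 2.0, -0.25]))

# Parse the lines as float32 and save in .npy format.
values = np.loadtxt("example.dat", dtype=np.float32)
np.save("example.npy", values)
print(np.load("example.npy"))
```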
If installed using conda, replace `python nlms_compress.py` by `lfzip-nlms`. See also the update above on selecting the NLMS order.
Compression:
```
python nlms_compress.py -m c -i ../data/evaluation_datasets/dna/nanopore_test.npy -o nanopore_test_compressed.bsc -a 0.01
```
Decompression:
```
python nlms_compress.py -m d -i nanopore_test_compressed.bsc -o nanopore_test.decompressed.npy
```
Verification:
```
cmp nanopore_test.decompressed.npy nanopore_test_compressed.bsc.recon.npy
```
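Beyond the byte-level `cmp` check, the lossy guarantee itself can be verified with numpy: every reconstructed value must be within the maxerror passed via `-a`. The arrays below are synthetic stand-ins for the files in the example; with real files you would `np.load(...)` them instead:

```python
import numpy as np

# Stand-ins for the original input and the .bsc.recon.npy byproduct.
maxerror = 0.01
original = np.random.rand(1000).astype(np.float32)
# Simulate a reconstruction satisfying the bound (noise kept below maxerror).
noise = np.random.uniform(-0.9 * maxerror, 0.9 * maxerror,
                          size=original.shape).astype(np.float32)
recon = original + noise

# The guarantee LFZip provides under -a maxerror:
print(np.max(np.abs(original - recon)) <= maxerror)  # True
```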
Training a fully connected model (`FC` in `models.py`) with `input_dim = 32`, `num_hidden_layers = 4`, `hidden_layer_size = 128` for 5 epochs with uniform noise in [-0.05,0.05] added to the input:
```
python nn_trainer.py -train ../data/evaluation_datasets/dna/nanopore_train.npy -val ../data/evaluation_datasets/dna/nanopore_val.npy -model_name FC -model_params 32 4 128 -model_file nanopore_trained.h5 -noise 0.05 -epochs 5
```
Compression:
```
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python nn_compress.py -m c -i ../data/evaluation_datasets/dna/nanopore_test.npy -o nanopore_test_compressed.bsc -a 0.01 --model_file nanopore_trained.h5
```
Decompression:
```
CUDA_VISIBLE_DEVICES="" PYTHONHASHSEED=0 python nn_compress.py -m d -i nanopore_test_compressed.bsc -o nanopore_test.decompressed.npy --model_file nanopore_trained.h5
```
Verification:
```
cmp nanopore_test.decompressed.npy nanopore_test_compressed.bsc.recon.npy
```
Compression:
```
python ca_compress.py -m c -i ../data/evaluation_datasets/dna/nanopore_test.npy -o nanopore_test_compressed.bsc -a 0.01
```
Decompression:
```
python ca_compress.py -m d -i nanopore_test_compressed.bsc -o nanopore_test.decompressed.npy
```
Verification:
```
cmp nanopore_test.decompressed.npy nanopore_test_compressed.bsc.recon.npy
```