Generative models for anomaly detection. This Julia package contains the code for the arXiv preprint "Comparison of Anomaly Detectors: Context Matters".
- Clone this repo somewhere.
- Run Julia in the cloned dir.
cd path/to/repo/GenerativeAD.jl
julia --project
- Install all the packages using ]instantiate and compile the package.
(@julia) pkg> instantiate
julia> using GenerativeAD
Some of the bash scripts call julia without the --project flag and use the @quickactivate macro to activate the environment. This fails unless DrWatson is installed in the base Julia environment, so install DrWatson in your base environment to avoid the problem.
cd ~
julia -e 'using Pkg; Pkg.add("DrWatson");'
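To confirm that the base environment really has DrWatson before launching the bash scripts, a small check can help. The following is a minimal sketch, not part of the repository: the function name base_env_has_pkg and its optional second argument (the julia binary to use) are hypothetical, and it assumes JULIA_PROJECT is unset so that julia without --project resolves to the base environment.

```shell
# Hypothetical helper, not part of the repository.
# Usage: base_env_has_pkg <package> [julia-binary]
base_env_has_pkg() {
    local pkg="$1"
    local julia_bin="${2:-julia}"
    if ! command -v "$julia_bin" >/dev/null 2>&1; then
        echo "julia not found"
        return 2
    fi
    # No --project flag here: this loads the package from the default
    # environment, which is what the scripts using @quickactivate rely on
    # (assuming JULIA_PROJECT is not set).
    if "$julia_bin" -e "using $pkg" >/dev/null 2>&1; then
        echo "$pkg available"
    else
        echo "$pkg missing"
        return 1
    fi
}
```

Calling base_env_has_pkg DrWatson from any directory outside the repository should report whether the install above succeeded.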
Some models (PIDforest, scikit-learn, PyOD) are available only through PyCall with the appropriate Python environment active. With the upcoming Bayesian optimisation from scikit-optimize, every model will require an active environment, which can be set up as follows using Python's venv module. (Most of the scripts have a hardcoded path to this environment, though this can be easily changed.)
cd ~
python -m venv sklearn-env
source ${HOME}/sklearn-env/bin/activate
export PYTHON="${HOME}/sklearn-env/bin/python"
Then install the requirements from inside this repository:
cd path/to/repo/GenerativeAD.jl
pip install -r requirements.txt
pip install git+https://github.com/janfrancu/pidforest.git # not registered anywhere
julia --project -e 'using Pkg; Pkg.build("PyCall");' # rebuilds PyCall.jl to point to the current environment
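Because PyCall is built against whatever interpreter PYTHON points to, it can be worth checking that variable before rebuilding. The sketch below is an illustration only: check_python_env is a hypothetical helper, not something shipped with the repository, and it merely verifies that the path is set and executable.

```shell
# Hypothetical helper, not part of the repository.
# Usage: check_python_env [path-to-python]   (defaults to $PYTHON)
check_python_env() {
    local py="${1:-${PYTHON:-}}"
    if [ -z "$py" ]; then
        echo "PYTHON is not set"
        return 1
    fi
    if [ ! -x "$py" ]; then
        echo "not executable: $py"
        return 1
    fi
    echo "ok: $py"
}

# Possible use before rebuilding PyCall (left commented out on purpose):
# check_python_env "${HOME}/sklearn-env/bin/python" && \
#     julia --project -e 'using Pkg; Pkg.build("PyCall");'
```

If the check fails, re-run the venv activation and the export of PYTHON before building PyCall again.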
- First, load Julia and Python modules.
ml Julia
ml Python
- Install the package somewhere on the RCI cluster.
- Then the experiments can be run via slurm. The command below runs 20 experiments with the basic VAE model, each with 5 cross-validation repetitions, on all datasets listed in the text file, with 10 parallel processes for each dataset. All data will be saved in GenerativeAD.jl/data/experiments/tabular.
cd GenerativeAD.jl/scripts/experiments_tabular
./run_parallel.sh vae 20 5 10 datasets_tabular.txt
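The positional arguments of the command above are easy to mix up, so the following sketch spells them out. The variable names are illustrative and not taken from the script itself; only the order and the meaning described in the text are assumed. The run count additionally assumes that every cross-validation repetition is a separate training run.

```shell
# Illustrative names; only the positional order comes from the command above.
model="vae"                          # model to train
num_experiments=20                   # experiments per dataset
num_folds=5                          # cross-validation repetitions per experiment
num_parallel=10                      # parallel processes per dataset
dataset_file="datasets_tabular.txt"  # text file listing the datasets

# Assumption: each repetition of each experiment is one training run.
runs_per_dataset=$((num_experiments * num_folds))
echo "training runs per dataset: ${runs_per_dataset}"

# Dry run: print the command instead of submitting anything.
echo "./run_parallel.sh ${model} ${num_experiments} ${num_folds} ${num_parallel} ${dataset_file}"
```

Printing the command first is a cheap way to double-check the arguments before anything is submitted to the cluster.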
Only UCI datasets are available upon installation via the UCI package. The remaining tabular and image datasets are downloaded upon first request (e.g. via the GenerativeAD.Datasets.load_data(dataset) function). The first download requires user input to accept the download terms of the individual datasets. If you want to avoid this, run
export DATADEPS_ALWAYS_ACCEPT=true
before running Julia.
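If accepting the terms globally for the whole shell session is not desired, the variable can also be set for a single command only. The sketch below demonstrates the scoping with a plain sh child process standing in for julia; it is an illustration of standard shell behaviour, not something prescribed by the package.

```shell
# Start from a clean state so the demonstration is deterministic.
unset DATADEPS_ALWAYS_ACCEPT

# Per-command form: the variable is visible only to the child process.
# In practice the child would be e.g. `julia --project` instead of `sh -c`.
child=$(DATADEPS_ALWAYS_ACCEPT=true sh -c 'echo "${DATADEPS_ALWAYS_ACCEPT:-unset}"')
parent="${DATADEPS_ALWAYS_ACCEPT:-unset}"

echo "child sees: ${child}"
echo "parent sees: ${parent}"
```

With export, by contrast, every Julia process started later from the same shell inherits the setting.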