Anomalies are out-of-distribution (OOD) data that need to be recognized and removed before they affect the system. Modern neural networks can classify data with high accuracy based on the knowledge gained from training data, but they are unable to tell whether a new sample is in-distribution or simply an anomaly. To address this issue, various state-of-the-art methods exploit the softmax distribution that a classifier produces for a sample to determine whether the sample is an anomaly.
In our work, to improve anomaly detection performance, we exploit the strength of ensemble learning and combine the knowledge gained by classifiers trained on different samples of the data, leaving one class out at a time. We propose two methods. The first method is a statistical comparison that builds In-Distribution (IN) and Out-of-Distribution (OOD) reference vectors and uses them to identify anomalies. The second method trains a binary classifier on a dataset generated by combining the softmax distributions that each classifier in the ensemble outputs for a sample, and subsequently classifies a given instance as an anomaly or not. Our work includes experiments and an evaluation of the ensemble method with leave-out classes for detecting anomalies on image datasets (MNIST and CIFAR-10) and a text dataset (20 Newsgroups). Our implementation outperforms the baseline method on the MNIST dataset.
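The following is a minimal sketch of the leave-one-class-out ensemble construction described above. The base classifier (`LogisticRegression`) and the helper names are illustrative placeholders, not the architectures used in our experiments.

```python
# Sketch: train one classifier per class, leaving that class out of its
# training data, then concatenate the members' softmax-style outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_leave_out_ensemble(X, y, n_classes):
    """Train one classifier per class, with that class left out of training."""
    ensemble = []
    for left_out in range(n_classes):
        mask = y != left_out                     # drop all samples of one class
        clf = LogisticRegression(max_iter=1000)  # placeholder base model
        clf.fit(X[mask], y[mask])
        ensemble.append(clf)
    return ensemble

def ensemble_softmax(ensemble, x):
    """Concatenate each member's probability output for a single sample x."""
    return np.concatenate([clf.predict_proba(x.reshape(1, -1))[0]
                           for clf in ensemble])
```

For MNIST, each of the ten members is trained on nine classes, so the concatenated vector for a sample has 10 x 9 = 90 entries.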
In the context of anomaly detection using ensembles, we try to answer the following questions:
- RQ-1: Can ensemble learning, aided by different data distributions, help distinguish anomalous data from the data the model was trained on?
- RQ-2: Can the knowledge gained from leaving out classes in the classifiers of an ensemble be exploited to learn more about the anomalous data?
Method 1 is a statistical comparison that uses decision rules or similarity measures over the IN and OOD reference vectors to determine whether an input is IN data or OOD data.
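A minimal sketch of such a comparison follows, assuming mean ensemble-softmax vectors as the IN/OOD references and cosine similarity as the measure; both are illustrative choices, not the only options.

```python
# Sketch of method 1: compare a sample's ensemble-softmax vector against
# IN and OOD reference vectors and pick the closer one.
import numpy as np

def reference_vector(vectors):
    """Mean ensemble-softmax vector over a set of samples (the reference)."""
    return np.mean(vectors, axis=0)

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_anomaly(sample_vec, in_ref, ood_ref):
    """Flag a sample as OOD if it is more similar to the OOD reference."""
    return cosine(sample_vec, ood_ref) > cosine(sample_vec, in_ref)
```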
The second method that we propose involves training a binary classifier on a dataset built from the softmax probability distributions. The motivation behind this method is the ability of classifiers to learn the underlying patterns in a given dataset: given enough training data, a classifier can recognize the underlying pattern and classify or predict test data correctly.
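A hedged sketch of this second method is given below, reusing the hypothetical `ensemble_softmax` helper from the earlier sketch and an illustrative `LogisticRegression` as the binary detector.

```python
# Sketch of method 2: a binary IN-vs-OOD classifier trained on the
# concatenated ensemble softmax outputs of labeled IN and OOD samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_feature_matrix(ensemble, X):
    """One row of concatenated ensemble softmax outputs per sample."""
    return np.stack([ensemble_softmax(ensemble, x) for x in X])

def train_ood_detector(ensemble, X_in, X_ood):
    """Fit a binary detector on IN (label 0) vs OOD (label 1) features."""
    feats = np.vstack([build_feature_matrix(ensemble, X_in),
                       build_feature_matrix(ensemble, X_ood)])
    labels = np.concatenate([np.zeros(len(X_in)),    # 0 = in-distribution
                             np.ones(len(X_ood))])   # 1 = anomaly
    detector = LogisticRegression(max_iter=1000)     # illustrative choice
    detector.fit(feats, labels)
    return detector
```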
Though our experiments are not exhaustive, mainly due to limited computational resources, we attempt to answer the two questions we started with using the results obtained.
From our experiments, we observe that
- only on the MNIST dataset do we outperform the baseline scores. This leads us to conclude that training an ensemble of classifiers with leave-out classes need not work well for all datasets.
- all three datasets nevertheless show a common trend of yielding good AUROC scores when we exploit the entropy of the softmax outputs for IN and OOD data (see the entropy-scoring sketch after this list).
- decision rules based on hardcoded threshold counts do not work well. Instead, a binary classifier trained on the softmax distributions is better suited to distinguish instances the model was trained on from those it was not.
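A minimal sketch of the entropy-based scoring behind the AUROC trend noted above, assuming per-sample softmax vectors for IN and OOD data are available; `roc_auc_score` from scikit-learn computes the score directly, with no hardcoded threshold.

```python
# Sketch: Shannon entropy of a softmax distribution as an anomaly score,
# evaluated with AUROC (OOD labeled 1).
import numpy as np
from sklearn.metrics import roc_auc_score

def softmax_entropy(p, eps=1e-12):
    """Shannon entropy of one softmax distribution."""
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def entropy_auroc(in_softmax, ood_softmax):
    """AUROC of entropy as the anomaly score; higher entropy means more OOD."""
    scores = [softmax_entropy(p) for p in list(in_softmax) + list(ood_softmax)]
    labels = [0] * len(in_softmax) + [1] * len(ood_softmax)
    return roc_auc_score(labels, scores)
```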
Nevertheless, different distributions of IN and OOD data, with different numbers of leave-out classes, trained on better architectures, should be experimented with in order to arrive at a concrete conclusion on the questions this topic poses.