Anomalies are out-of-distribution (OOD) data that need to be recognized and removed before they affect the system. Modern neural networks can classify data with high accuracy based on the knowledge gained from training data, but they are unable to tell whether a new sample is in-distribution or simply an anomaly. To address this issue, various state-of-the-art methods exploit the softmax distribution that a classifier produces for a sample to determine whether the sample is an anomaly.
In our work, to improve anomaly detection performance, we exploit the strength of ensemble learning and combine the knowledge gained by classifiers trained on different samples of the data, leaving one class out at a time. We propose two methods. The first method is a statistical comparison that builds In-Distribution (IN) and Out-of-Distribution (OOD) reference vectors and uses them to identify anomalies. The second method trains a binary classifier on a dataset generated by combining the softmax distributions that each classifier in the ensemble outputs for a sample, and subsequently classifies a given instance as an anomaly or not. Our work includes experiments and an evaluation of the ensemble method with leave-out classes for detecting anomalies on image datasets (MNIST and CIFAR-10) and a text dataset (20 Newsgroups). Our implementation outperforms the baseline method on the MNIST dataset.
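The following is a minimal sketch of the leave-one-class-out ensemble construction described above. The base classifier (`LogisticRegression`) and the helper names are illustrative placeholders, not the architectures used in our experiments.

```python
# Sketch: train one classifier per class, leaving that class out of its
# training data, then concatenate the members' softmax-style outputs.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_leave_out_ensemble(X, y, n_classes):
    """Train one classifier per class, with that class left out of training."""
    ensemble = []
    for left_out in range(n_classes):
        mask = y != left_out                     # drop all samples of one class
        clf = LogisticRegression(max_iter=1000)  # placeholder base model
        clf.fit(X[mask], y[mask])
        ensemble.append(clf)
    return ensemble

def ensemble_softmax(ensemble, x):
    """Concatenate each member's probability output for a single sample x."""
    return np.concatenate([clf.predict_proba(x.reshape(1, -1))[0]
                           for clf in ensemble])
```

For MNIST, each of the ten members is trained on nine classes, so the concatenated vector for a sample has 10 x 9 = 90 entries.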
In the context of anomaly detection using ensembles, we try to answer the following questions:
- RQ-1: Can ensemble learning, aided by different data distributions, help distinguish anomalous data from the data the model was trained on?
- RQ-2: Can the knowledge gained from leaving out classes in the classifiers of an ensemble be exploited to learn more about the anomalous data?
Method 1 is a statistical comparison that uses decision rules or similarity measures over the IN and OOD reference vectors to determine whether an input is IN data or OOD data.
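A minimal sketch of such a comparison follows, assuming mean ensemble-softmax vectors as the IN/OOD references and cosine similarity as the measure; both are illustrative choices, not the only options.

```python
# Sketch of method 1: compare a sample's ensemble-softmax vector against
# IN and OOD reference vectors and pick the closer one.
import numpy as np

def reference_vector(vectors):
    """Mean ensemble-softmax vector over a set of samples (the reference)."""
    return np.mean(vectors, axis=0)

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def is_anomaly(sample_vec, in_ref, ood_ref):
    """Flag a sample as OOD if it is more similar to the OOD reference."""
    return cosine(sample_vec, ood_ref) > cosine(sample_vec, in_ref)
```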
The second method that we propose involves training a binary classifier on a dataset built from the softmax probability distributions. The motivation behind this method is the ability of classifiers to learn the underlying patterns in a given dataset: given enough training data, a classifier can recognize the underlying pattern and classify or predict test data correctly.
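A hedged sketch of this second method is given below, reusing the hypothetical `ensemble_softmax` helper from the earlier sketch and an illustrative `LogisticRegression` as the binary detector.

```python
# Sketch of method 2: a binary IN-vs-OOD classifier trained on the
# concatenated ensemble softmax outputs of labeled IN and OOD samples.
import numpy as np
from sklearn.linear_model import LogisticRegression

def build_feature_matrix(ensemble, X):
    """One row of concatenated ensemble softmax outputs per sample."""
    return np.stack([ensemble_softmax(ensemble, x) for x in X])

def train_ood_detector(ensemble, X_in, X_ood):
    """Fit a binary detector on IN (label 0) vs OOD (label 1) features."""
    feats = np.vstack([build_feature_matrix(ensemble, X_in),
                       build_feature_matrix(ensemble, X_ood)])
    labels = np.concatenate([np.zeros(len(X_in)),    # 0 = in-distribution
                             np.ones(len(X_ood))])   # 1 = anomaly
    detector = LogisticRegression(max_iter=1000)     # illustrative choice
    detector.fit(feats, labels)
    return detector
```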
Though our experiments are not exhaustive, mainly due to limited computational resources, we attempt to answer the two questions we started with using the results obtained.
From our experiments, we observe that
- only on the MNIST dataset do we outperform the baseline scores. This leads us to conclude that training an ensemble of classifiers with leave-out classes need not work well for all datasets.
- all three datasets nevertheless show a common trend of yielding good AUROC scores when we exploit the entropy of the softmax outputs for IN and OOD data (see the entropy-scoring sketch after this list).
- decision rules based on hardcoded threshold counts do not work well. Instead, a binary classifier trained on the softmax distributions is better suited to distinguish instances the model was trained on from those it was not.
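A minimal sketch of the entropy-based scoring behind the AUROC trend noted above, assuming per-sample softmax vectors for IN and OOD data are available; `roc_auc_score` from scikit-learn computes the score directly, with no hardcoded threshold.

```python
# Sketch: Shannon entropy of a softmax distribution as an anomaly score,
# evaluated with AUROC (OOD labeled 1).
import numpy as np
from sklearn.metrics import roc_auc_score

def softmax_entropy(p, eps=1e-12):
    """Shannon entropy of one softmax distribution."""
    p = np.clip(p, eps, 1.0)
    return float(-(p * np.log(p)).sum())

def entropy_auroc(in_softmax, ood_softmax):
    """AUROC of entropy as the anomaly score; higher entropy means more OOD."""
    scores = [softmax_entropy(p) for p in list(in_softmax) + list(ood_softmax)]
    labels = [0] * len(in_softmax) + [1] * len(ood_softmax)
    return roc_auc_score(labels, scores)
```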
Nevertheless, different distributions of IN and OOD data, with different numbers of leave-out classes, trained on better architectures, should be experimented with in order to arrive at a concrete conclusion on the questions this topic poses.