Abstract
Autoencoders have recently been applied to outlier detection. However, neural networks are known to be vulnerable to overfitting, and therefore have limited potential in the unsupervised outlier detection setting. Moreover, the majority of existing deep learning methods for anomaly detection are sensitive to contamination of the training data by anomalous instances. To overcome these limitations, we develop a Boosting-based Autoencoder Ensemble approach (BAE). BAE is an unsupervised ensemble method that, similarly to boosting, builds an adaptive cascade of autoencoders to achieve improved and robust results. BAE trains the autoencoder components sequentially by performing a weighted sampling of the data, aimed at reducing the number of outliers used during training and at injecting diversity into the ensemble. We perform extensive experiments and show that the proposed methodology outperforms state-of-the-art approaches under a variety of conditions.
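The sequential scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (their code is at the GitLab link in the notes). Several choices here are assumptions made only for illustration: a linear autoencoder fitted by SVD stands in for a neural autoencoder, sampling weights are a smoothed inverse of the reconstruction error, the final score is a plain average of the per-component errors, and all function names are hypothetical.

```python
import numpy as np


def fit_linear_autoencoder(X, n_components):
    """Stand-in for a neural autoencoder (assumption): a PCA projection."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal directions of the centered sample.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def reconstruction_error(model, X):
    """Squared reconstruction error of every row of X under the model."""
    mean, components = model
    centered = X - mean
    reconstructed = centered @ components.T @ components
    return np.sum((centered - reconstructed) ** 2, axis=1)


def boosted_ensemble_scores(X, n_models=5, n_components=1, seed=0):
    """Train components one after another on reweighted samples of X.

    Returns one outlier score per row (higher means more anomalous).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    sample = X  # the first component is trained on the full data set
    all_errors = []
    for _ in range(n_models):
        model = fit_linear_autoencoder(sample, n_components)
        errors = reconstruction_error(model, X)
        all_errors.append(errors)
        # Well-reconstructed points are more likely to be drawn again, so
        # likely outliers become rarer in later training samples. Adding the
        # mean error is a smoothing choice of this sketch; it keeps a single
        # near-zero error from dominating the sampling distribution.
        weights = 1.0 / (errors + errors.mean())
        probabilities = weights / weights.sum()
        idx = rng.choice(n, size=n, replace=True, p=probabilities)
        sample = X[idx]
    # Plain averaging is a simplification assumed here.
    return np.mean(all_errors, axis=0)
```

Calling `boosted_ensemble_scores(X)` on an array with one row per instance returns one score per row; ranking the rows by decreasing score gives an outlier ranking. Resampling with error-dependent weights is also what makes the components differ from one another, since each is fitted on a different sample.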
Giovanni Stilo: His work is partially supported by Territori Aperti, a project funded by Fondo Territori Lavoro e Conoscenza CGIL CISL UIL, and by the SoBigData-PlusPlus H2020-INFRAIA-2019-1 EU project, contract number 871042.
Notes
- 1. We use two colors (black and white) to highlight the sampling of outliers. The inliers are represented without a color variation.
- 2. The code of BAE is available at https://gitlab.com/bardhp95/bae.
- 3. By \(2.5\%\) on average.
- 9. The data used are available at https://figshare.com/articles/dataset/MNIST_dataset_for_Outliers_Detection_-_MNIST4OD_/9954986.
- 10. The ELKI project implementation of HiCS produces a warning message, and the execution does not reach a converging state.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Sarvari, H., Domeniconi, C., Prenkaj, B., Stilo, G. (2021). Unsupervised Boosting-Based Autoencoder Ensembles for Outlier Detection. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_8
DOI: https://doi.org/10.1007/978-3-030-75762-5_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75761-8
Online ISBN: 978-3-030-75762-5
eBook Packages: Computer Science, Computer Science (R0)