
Unsupervised Boosting-Based Autoencoder Ensembles for Outlier Detection

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2021)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12712)

Abstract

Autoencoders have recently been applied to outlier detection. However, neural networks are known to be vulnerable to overfitting and therefore have limited potential in the unsupervised outlier detection setting. Moreover, the majority of existing deep learning methods for anomaly detection are sensitive to contamination of the training data with anomalous instances. To overcome these limitations we develop a Boosting-based Autoencoder Ensemble approach (BAE). BAE is an unsupervised ensemble method that, similarly to boosting, builds an adaptive cascade of autoencoders to achieve improved and robust results. BAE trains the autoencoder components sequentially by performing a weighted sampling of the data, aimed at reducing the number of outliers used during training and at injecting diversity into the ensemble. We perform extensive experiments and show that the proposed methodology outperforms state-of-the-art approaches under a variety of conditions.
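The cascade described above lends itself to a compact illustration. Below is a minimal Python sketch of the general boosting-style idea, not the authors' implementation (the official code is at the GitLab link in the Notes). It assumes scikit-learn's MLPRegressor as a stand-in autoencoder, sampling weights inversely proportional to the current reconstruction error, and an average of standardized per-component errors as the final outlier score; the actual weighting and aggregation schemes of BAE are specified in the paper.

```python
# Minimal sketch of a boosting-style autoencoder ensemble (NOT the official
# BAE implementation; see https://gitlab.com/bardhp95/bae for that).
# Assumptions: MLPRegressor as a stand-in autoencoder, inverse-error sampling
# weights, mean of standardized reconstruction errors as the outlier score.
import numpy as np
from sklearn.neural_network import MLPRegressor

def bae_sketch(X, n_components=5, hidden_units=8, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    weights = np.full(n, 1.0 / n)  # start with uniform sampling weights
    errors = []
    for _ in range(n_components):
        # Weighted bootstrap: points that currently look anomalous are
        # sampled less often, reducing outlier contamination in training
        # and injecting diversity across the ensemble components.
        idx = rng.choice(n, size=n, replace=True, p=weights)
        ae = MLPRegressor(hidden_layer_sizes=(hidden_units,),
                          max_iter=1000,
                          random_state=int(rng.integers(1 << 31)))
        ae.fit(X[idx], X[idx])  # autoencoder: learn to reconstruct the input
        err = np.mean((ae.predict(X) - X) ** 2, axis=1)  # per-point error
        errors.append(err)
        inv = 1.0 / (err + 1e-12)  # downweight high-error (outlier-like) points
        weights = inv / inv.sum()
    # Standardize each component's errors, then average across the ensemble.
    E = np.asarray(errors)
    E = (E - E.mean(axis=1, keepdims=True)) / (E.std(axis=1, keepdims=True) + 1e-12)
    return E.mean(axis=0)  # higher score = more anomalous

```

For example, `scores = bae_sketch(X)` on a standardized NumPy feature matrix `X` yields one score per row; ranking rows by descending score gives a candidate outlier ordering.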

Giovanni Stilo: his work is partially supported by Territori Aperti, a project funded by Fondo Territori Lavoro e Conoscenza CGIL CISL UIL, and by the SoBigData-PlusPlus H2020-INFRAIA-2019-1 EU project, contract number 871042.


Notes

  1. We use two colors (black and white) to highlight the sampling of outliers. The inliers are represented without a color variation.

  2. The code of BAE is available at https://gitlab.com/bardhp95/bae.

  3. By \(2.5\%\) on average.

  4. https://elki-project.github.io/.

  5. https://pyod.readthedocs.io/.

  6. https://archive.ics.uci.edu/ml/datasets.php.

  7. Data from: https://www.dbs.ifi.lmu.de/research/outlier-evaluation/DAMI/.

  8. https://sci2s.ugr.es/keel/imbalanced.php.

  9. Used data available at: https://figshare.com/articles/dataset/MNIST_dataset_for_Outliers_Detection_-_MNIST4OD_/9954986.

  10. The ELKI implementation of HiCS produces a warning message, and its execution does not converge.


Author information


Correspondence to Giovanni Stilo.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Sarvari, H., Domeniconi, C., Prenkaj, B., Stilo, G. (2021). Unsupervised Boosting-Based Autoencoder Ensembles for Outlier Detection. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_8


  • DOI: https://doi.org/10.1007/978-3-030-75762-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75761-8

  • Online ISBN: 978-3-030-75762-5

  • eBook Packages: Computer Science, Computer Science (R0)
