
Unsupervised Boosting-Based Autoencoder Ensembles for Outlier Detection

  • Conference paper
  • In: Advances in Knowledge Discovery and Data Mining (PAKDD 2021)
  • Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12712)

Abstract

Autoencoders have recently been applied to outlier detection. However, neural networks are known to be vulnerable to overfitting and therefore have limited potential in the unsupervised outlier detection setting. Moreover, the majority of existing deep learning methods for anomaly detection are sensitive to contamination of the training data with anomalous instances. To overcome these limitations we develop a Boosting-based Autoencoder Ensemble approach (BAE). BAE is an unsupervised ensemble method that, similarly to boosting, builds an adaptive cascade of autoencoders to achieve improved and robust results. BAE trains the autoencoder components sequentially by performing a weighted sampling of the data, aimed at reducing the number of outliers used during training and at injecting diversity into the ensemble. We perform extensive experiments and show that the proposed methodology outperforms state-of-the-art approaches under a variety of conditions.
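The cascade described above lends itself to a compact illustration. Below is a minimal Python sketch of the general boosting-style idea, not the authors' implementation (the official code is at the GitLab link in the Notes). It assumes scikit-learn's MLPRegressor as a stand-in autoencoder, sampling weights inversely proportional to the current reconstruction error, and an average of standardized per-component errors as the final outlier score; the actual weighting and aggregation schemes of BAE are specified in the paper.

```python
# Minimal sketch of a boosting-style autoencoder ensemble (NOT the official
# BAE implementation; see https://gitlab.com/bardhp95/bae for that).
# Assumptions: MLPRegressor as a stand-in autoencoder, inverse-error sampling
# weights, mean of standardized reconstruction errors as the outlier score.
import numpy as np
from sklearn.neural_network import MLPRegressor

def bae_sketch(X, n_components=5, hidden_units=8, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    weights = np.full(n, 1.0 / n)  # start with uniform sampling weights
    errors = []
    for _ in range(n_components):
        # Weighted bootstrap: points that currently look anomalous are
        # sampled less often, reducing outlier contamination in training
        # and injecting diversity across the ensemble components.
        idx = rng.choice(n, size=n, replace=True, p=weights)
        ae = MLPRegressor(hidden_layer_sizes=(hidden_units,),
                          max_iter=1000,
                          random_state=int(rng.integers(1 << 31)))
        ae.fit(X[idx], X[idx])  # autoencoder: learn to reconstruct the input
        err = np.mean((ae.predict(X) - X) ** 2, axis=1)  # per-point error
        errors.append(err)
        inv = 1.0 / (err + 1e-12)  # downweight high-error (outlier-like) points
        weights = inv / inv.sum()
    # Standardize each component's errors, then average across the ensemble.
    E = np.asarray(errors)
    E = (E - E.mean(axis=1, keepdims=True)) / (E.std(axis=1, keepdims=True) + 1e-12)
    return E.mean(axis=0)  # higher score = more anomalous

```

For example, `scores = bae_sketch(X)` on a standardized NumPy feature matrix `X` yields one score per row; ranking rows by descending score gives a candidate outlier ordering.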

Giovanni Stilo: his work is partially supported by Territori Aperti, a project funded by Fondo Territori Lavoro e Conoscenza CGIL CISL UIL, and by the SoBigData-PlusPlus H2020-INFRAIA-2019-1 EU project, contract number 871042.


Notes

  1. We use two colors (black and white) to highlight the sampling of outliers. The inliers are represented without a color variation.

  2. The code of BAE is available at https://gitlab.com/bardhp95/bae.

  3. By \(2.5\%\) on average.

  4. https://elki-project.github.io/.

  5. https://pyod.readthedocs.io/.

  6. https://archive.ics.uci.edu/ml/datasets.php.

  7. Data from: https://www.dbs.ifi.lmu.de/research/outlier-evaluation/DAMI/.

  8. https://sci2s.ugr.es/keel/imbalanced.php.

  9. Used data available at: https://figshare.com/articles/dataset/MNIST_dataset_for_Outliers_Detection_-_MNIST4OD_/9954986.

  10. The ELKI implementation of HiCS produces a warning message, and its execution does not converge.


Author information


Correspondence to Giovanni Stilo.



Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Sarvari, H., Domeniconi, C., Prenkaj, B., Stilo, G. (2021). Unsupervised Boosting-Based Autoencoder Ensembles for Outlier Detection. In: Karlapalem, K., et al. Advances in Knowledge Discovery and Data Mining. PAKDD 2021. Lecture Notes in Computer Science, vol 12712. Springer, Cham. https://doi.org/10.1007/978-3-030-75762-5_8


  • DOI: https://doi.org/10.1007/978-3-030-75762-5_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-75761-8

  • Online ISBN: 978-3-030-75762-5

  • eBook Packages: Computer Science, Computer Science (R0)
