Bias Reduction in Outlier Ensembles: The Guessing Game

  • Chapter
  • In: Outlier Ensembles

Abstract

Bias reduction is a difficult problem in unsupervised settings such as outlier detection. The main reason is that bias-reduction algorithms typically require a quantification of the error at intermediate steps of the algorithm, and no ground truth is available in unsupervised problems with which to compute such errors. A well-known bias-reduction algorithm from classification is boosting. In boosting, the outputs of highly biased detectors are used to identify the portions of the decision space in which bias degrades accuracy, so that subsequent rounds can focus on those portions.

Informed decision-making comes from a long tradition of guessing and then blaming others for inadequate results.

Scott Adams
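
The bias-reduction mechanism sketched in the abstract is easiest to see in its original supervised form. The following is a minimal illustration of the classical discrete AdaBoost procedure of Freund and Schapire, written in Python; it is not code from this chapter, and the toy dataset, round count, and function names are illustrative assumptions. Each round fits a highly biased weak learner (a depth-1 decision stump), measures its weighted error, and re-weights the training points it misclassified, so that later rounds concentrate on the regions of the decision space where the ensemble is still wrong.

    import numpy as np
    from sklearn.datasets import make_moons
    from sklearn.tree import DecisionTreeClassifier

    def adaboost(X, y, n_rounds=50):
        """Discrete AdaBoost for labels in {-1, +1}; returns (learners, alphas)."""
        n = len(y)
        w = np.full(n, 1.0 / n)                # start with uniform weights
        learners, alphas = [], []
        for _ in range(n_rounds):
            stump = DecisionTreeClassifier(max_depth=1)   # highly biased weak learner
            stump.fit(X, y, sample_weight=w)
            pred = stump.predict(X)
            err = np.sum(w * (pred != y)) / np.sum(w)     # weighted training error
            if err >= 0.5:                     # no better than chance: stop
                break
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))
            w *= np.exp(-alpha * y * pred)     # up-weight mistakes, down-weight hits
            w /= w.sum()
            learners.append(stump)
            alphas.append(alpha)
        return learners, alphas

    def predict(learners, alphas, X):
        """Weighted majority vote of the weak learners."""
        scores = sum(a * h.predict(X) for h, a in zip(learners, alphas))
        return np.sign(scores)

    X, y01 = make_moons(n_samples=400, noise=0.25, random_state=0)
    y = 2 * y01 - 1                            # map {0, 1} labels to {-1, +1}
    learners, alphas = adaboost(X, y)
    print("training accuracy:", np.mean(predict(learners, alphas, X) == y))

Note where this recipe depends on labels: the weighted error err, and hence every re-weighting step, requires knowing which points were misclassified. In unsupervised outlier detection no such ground truth exists, which is precisely the difficulty in quantifying intermediate error that the abstract points to.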



Author information

Correspondence to Charu C. Aggarwal.


Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Aggarwal, C.C., Sathe, S. (2017). Bias Reduction in Outlier Ensembles: The Guessing Game. In: Outlier Ensembles. Springer, Cham. https://doi.org/10.1007/978-3-319-54765-7_4

  • DOI: https://doi.org/10.1007/978-3-319-54765-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54764-0

  • Online ISBN: 978-3-319-54765-7

  • eBook Packages: Computer Science, Computer Science (R0)
