A Fast Outlier Detection Algorithm for Big Datasets

  • Conference paper
Recent Advances in Information and Communication Technology 2016

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 463)

Abstract

This paper proposes a fast outlier detection algorithm for big datasets that combines a cell-based method with a rank-difference outlier detection method and a new weighted distance definition. First, the cell-based method transforms a dataset with a very large number of objects into a significantly smaller set of weighted cells, based on predefined lower-bound and upper-bound sizes. A weighted distance function measures the distance between two cells from their coordinates and weights. Then, a rank-based outlier detection method with different depths calculates an outlier score for each cell. Finally, the cells are ranked by score, and outlier objects are identified from the ranked cells and eliminated from the provided dataset. Experimental results indicate that the proposed method is appropriate for datasets with a very large number of objects.
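
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the weighted distance formula, the cell-sizing rule, or the exact rank-difference score, so everything below is an assumption. In particular, the fixed grid width (instead of lower-bound and upper-bound sizes), the `weighted_distance` formula, the mean-rank score in `rank_scores`, and all function and parameter names are hypothetical stand-ins chosen only to show how the stages fit together.

```python
import math
from collections import defaultdict


def build_cells(points, width):
    """Group points into grid cells of side `width` (assumed fixed here).

    Returns a list of (centroid, weight, member_indices) tuples, where the
    weight is the number of points that fell into the cell.
    """
    buckets = defaultdict(list)
    for idx, p in enumerate(points):
        buckets[tuple(math.floor(x / width) for x in p)].append(idx)
    cells = []
    for members in buckets.values():
        dim = len(points[members[0]])
        centroid = tuple(
            sum(points[i][d] for i in members) / len(members) for d in range(dim)
        )
        cells.append((centroid, len(members), members))
    return cells


def weighted_distance(a, b):
    """HYPOTHETICAL weighting, not the paper's definition: Euclidean distance
    between centroids, shrunk when both cells hold many points."""
    return math.dist(a[0], b[0]) / math.sqrt(a[1] * b[1])


def rank_scores(cells, k, dist=weighted_distance):
    """Score each cell by the mean rank it holds in the neighbour lists of
    its own k nearest cells (higher means more outlying). This is a generic
    rank-based score used as a stand-in for the paper's method."""
    n = len(cells)
    if n < 2:
        return [0.0] * n
    # order[i] = all other cells sorted by increasing distance from cell i
    order = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist(cells[i], cells[j]))
        for i in range(n)
    ]
    # rank[q][p] = 1-based position of cell p in q's sorted neighbour list
    rank = [{p: r for r, p in enumerate(lst, start=1)} for lst in order]
    scores = []
    for i in range(n):
        neighbours = order[i][:k]
        scores.append(sum(rank[q][i] for q in neighbours) / len(neighbours))
    return scores


def detect_outliers(points, width, k, top_cells):
    """Return sorted indices of the points in the `top_cells` highest-scoring cells."""
    cells = build_cells(points, width)
    scores = rank_scores(cells, k)
    ranked = sorted(range(len(cells)), key=lambda i: scores[i], reverse=True)
    return sorted(idx for c in ranked[:top_cells] for idx in cells[c][2])
```

Calling `detect_outliers(points, width=1.0, k=3, top_cells=1)` on a dense cluster plus one distant point flags the distant point, because its cell sits last in every other cell's neighbour list. Ranking operates on cells rather than objects, so the quadratic neighbour search costs time in the number of cells, not the number of objects, which is the source of the speed-up the abstract claims.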

Author information

Corresponding author

Correspondence to Duong van Hieu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

van Hieu, D., Meesad, P. (2016). A Fast Outlier Detection Algorithm for Big Datasets. In: Meesad, P., Boonkrong, S., Unger, H. (eds) Recent Advances in Information and Communication Technology 2016. Advances in Intelligent Systems and Computing, vol 463. Springer, Cham. https://doi.org/10.1007/978-3-319-40415-8_16

  • DOI: https://doi.org/10.1007/978-3-319-40415-8_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-40414-1

  • Online ISBN: 978-3-319-40415-8

  • eBook Packages: Engineering, Engineering (R0)
