A Fast Outlier Detection Method for Big Data

  • Boyuan Liu
  • Wenhui Fan
  • Tianyuan Xiao
Part of the Communications in Computer and Information Science book series (CCIS, volume 402)

Abstract

Outliers in simulation data can help reveal defects in a simulation system. With the rapid growth of data scale, conventional outlier detection methods have trouble handling large datasets. In this paper, we propose an Entropy-based Fast Detection (EFD) algorithm that incorporates new ideas for handling big data. The algorithm uses an information entropy measure as its core criterion, with attribute value frequencies as an auxiliary measure. By rapidly computing the decrease in entropy, outliers can be identified quickly. The results show that the EFD algorithm detects outliers efficiently without an obvious loss of accuracy.
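
The abstract does not spell out the EFD procedure, so the following is only an illustrative sketch of the general idea it names: scoring each record by how much the dataset's entropy decreases when that record is removed. It assumes categorical attributes treated as independent (so the dataset entropy is the sum of per-attribute entropies), and the function names `entropy_decrease_scores` and `top_k_outliers` are hypothetical, not taken from the paper. The per-attribute update uses the identity H = log2(n) - (1/n) Σ c·log2(c), which lets the entropy after a removal be computed in constant time from the value counts.

```python
from collections import Counter
from math import log2


def _xlog2x(c):
    """Return c * log2(c), with the convention 0 * log2(0) = 0."""
    return c * log2(c) if c > 0 else 0.0


def entropy_decrease_scores(records):
    """Score each record by the entropy drop caused by removing it.

    Assumption (not from the paper): attributes are independent, so the
    dataset entropy is the sum of the per-attribute Shannon entropies.
    """
    n = len(records)
    if n < 2:
        raise ValueError("need at least two records")
    num_attrs = len(records[0])

    # Value counts per attribute, and S_j = sum over values of c * log2(c).
    counts = [Counter(r[j] for r in records) for j in range(num_attrs)]
    sums = [sum(_xlog2x(c) for c in cnt.values()) for cnt in counts]

    scores = []
    for r in records:
        decrease = 0.0
        for j, value in enumerate(r):
            c = counts[j][value]
            before = log2(n) - sums[j] / n
            # Removing the record lowers this value's count from c to c - 1.
            new_sum = sums[j] - _xlog2x(c) + _xlog2x(c - 1)
            after = log2(n - 1) - new_sum / (n - 1)
            decrease += before - after
        scores.append(decrease)
    return scores


def top_k_outliers(records, k):
    """Return indices of the k records with the largest entropy decrease."""
    scores = entropy_decrease_scores(records)
    return sorted(range(len(records)), key=lambda i: scores[i], reverse=True)[:k]


if __name__ == "__main__":
    data = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "x"), ("b", "y")]
    print(top_k_outliers(data, 1))
```

In this toy dataset the last record is the only one carrying the rare values, so removing it drives both attribute entropies to zero and gives it the largest score. The sketch scores all records in a single pass; it does not model the auxiliary use of attribute frequencies mentioned in the abstract.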

Keywords

Outlier, Information entropy, Big data

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Boyuan Liu (1)
  • Wenhui Fan (1)
  • Tianyuan Xiao (1)
  1. State CIMS Engineering Research Center, Tsinghua University, Beijing, China
