A Fast Outlier Detection Method for Big Data
Outliers in simulation data can help reveal defects in a simulation system. With the rapid growth of data scale, conventional outlier detection methods struggle to handle large datasets. In this paper, we propose an Entropy-based Fast Detection (EFD) algorithm that incorporates new ideas for handling big data. The algorithm uses an information entropy measure as its core, with attribute frequency values as an auxiliary measure. By rapidly computing the decrease in entropy, outliers can be identified quickly. The results show that the EFD algorithm detects outliers with high efficiency and without obvious loss of accuracy.
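The idea of scoring records by the entropy decrease they cause can be illustrated with a minimal sketch. This is not the EFD algorithm itself, whose details the abstract does not give. It assumes categorical records stored as tuples, sums the Shannon entropy over attributes, and ranks each record by how much that sum drops when the record is removed. The function names (`attribute_entropy`, `entropy_decrease_scores`, `top_k_outliers`) are hypothetical.

```python
import math
from collections import Counter

def attribute_entropy(counts, n):
    """Shannon entropy (bits) of one attribute, given value counts over n records."""
    return -sum((c / n) * math.log2(c / n) for c in counts.values() if c > 0)

def entropy_decrease_scores(records):
    """For each record, the drop in summed attribute entropy if it is removed.

    Assumes at least two records, all with the same number of attributes.
    """
    n = len(records)
    m = len(records[0])
    # Per-attribute value frequencies, computed once over the whole dataset.
    counts = [Counter(r[j] for r in records) for j in range(m)]
    base = sum(attribute_entropy(c, n) for c in counts)
    scores = []
    for r in records:
        remaining = 0.0
        for j in range(m):
            c = counts[j].copy()
            c[r[j]] -= 1  # frequencies with this record taken out
            remaining += attribute_entropy(c, n - 1)
        scores.append(base - remaining)
    return scores

def top_k_outliers(records, k):
    """Indices of the k records whose removal decreases entropy the most."""
    scores = entropy_decrease_scores(records)
    return sorted(range(len(records)), key=lambda i: scores[i], reverse=True)[:k]

# Nine identical records and one rare record (illustrative data only).
records = [("a", "x")] * 9 + [("b", "y")]
print(top_k_outliers(records, 1))  # → [9]
```

Copying the frequency table for every record keeps the sketch short but is wasteful. A faster variant would update only the terms of the entropy sum that involve the removed value, which is the kind of rapid computation the abstract alludes to.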
Keywords: Outlier, Information Entropy, Big data