Abstract
With the rapid development of information technology, data resources are becoming increasingly complex in structure, and outlier mining is attracting growing attention. This paper proposes a local outlier detection algorithm based on the Gaussian kernel function that considers three kinds of neighbors: k nearest neighbors, reverse k nearest neighbors, and shared nearest neighbors. First, the algorithm stores the neighbors of each data object in kNN maps, covering its k nearest neighbors, reverse k nearest neighbors, and shared nearest neighbors, which together form a kernel neighbor set S. Second, it estimates the density of each data object using kernel density estimation (KDE). Finally, it uses the relative density outlier factor (RDOF) to measure how far a data object deviates from its neighborhood and thereby decides whether the object is an outlier. The validity of the algorithm is demonstrated on real and synthetic data sets.
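The three steps in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption: shared nearest neighbors are taken to be objects whose k-nearest-neighbor lists overlap, the density is a Gaussian kernel average over S plus the object itself, and the outlier factor is the ratio of the mean neighbor density to the object's own density. The function name `rdof_scores` and the fixed bandwidth `h` are hypothetical.

```python
import numpy as np

def rdof_scores(X, k=3, h=1.0):
    """Sketch of a relative-density outlier score (assumed definitions, see lead-in)."""
    X = np.asarray(X, dtype=float)
    n, dim = X.shape
    # Pairwise Euclidean distances; the masked copy excludes self-matches.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    masked = dist.copy()
    np.fill_diagonal(masked, np.inf)

    # Step 1: build the neighbor set S = kNN | reverse kNN | shared NN.
    knn = [set(row.tolist()) for row in np.argsort(masked, axis=1)[:, :k]]
    rnn = [set() for _ in range(n)]
    for i, neighbors in enumerate(knn):
        for j in neighbors:
            rnn[j].add(i)  # i lists j as a neighbor, so i is a reverse neighbor of j
    # Assumption: j is a shared nearest neighbor of i if their kNN sets overlap.
    snn = [{j for j in range(n) if j != i and knn[i] & knn[j]} for i in range(n)]
    S = [knn[i] | rnn[i] | snn[i] for i in range(n)]

    # Step 2: Gaussian kernel density estimate over S plus the object itself.
    norm = (2.0 * np.pi) ** (dim / 2.0) * h ** dim
    density = np.empty(n)
    for i in range(n):
        idx = sorted(S[i] | {i})
        density[i] = np.mean(np.exp(-dist[i, idx] ** 2 / (2.0 * h ** 2))) / norm

    # Step 3: outlier factor = mean density of the neighbors / own density.
    return np.array([np.mean(density[sorted(S[i])]) / density[i] for i in range(n)])

# Toy data: a tight 3x3 grid of points plus one isolated point (index 9).
grid = [(0.1 * i, 0.1 * j) for i in range(3) for j in range(3)]
points = np.array(grid + [(10.0, 10.0)])
scores = rdof_scores(points, k=3, h=1.0)
print(int(np.argmax(scores)))  # index of the most outlying point
```

On this toy data the isolated point receives the largest score, because its own density is far lower than that of the neighbors it is compared against. In practice the bandwidth and the decision threshold on the score would need to be chosen for the data set at hand.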
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Zhang, Z., Liu, J., Miao, C. (2019). Local Outlier Detection Algorithm Based on Gaussian Kernel Density Function. In: Peng, H., Deng, C., Wu, Z., Liu, Y. (eds) Computational Intelligence and Intelligent Systems. ISICA 2018. Communications in Computer and Information Science, vol 986. Springer, Singapore. https://doi.org/10.1007/978-981-13-6473-0_29
DOI: https://doi.org/10.1007/978-981-13-6473-0_29
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-6472-3
Online ISBN: 978-981-13-6473-0
eBook Packages: Computer Science, Computer Science (R0)