A Novel Density-Based Clustering Approach for Outlier Detection in High-Dimensional Data

Messaoud, Thouraya Aouled; Smiti, Abir; Louati, Aymen

doi:10.1007/978-3-030-29859-3_28

Conference paper
First Online: 26 August 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11734)

Abstract

Outlier detection, also known as outlier mining, is a primary aspect of data mining and machine learning applications. Its importance in medical data comes from the fact that outliers may carry precious information; however, outlier detection can perform very poorly on high-dimensional data. In this paper, a new outlier detection technique named Infinite Feature Selection DBSCAN is proposed, based on a feature selection strategy that avoids the curse of dimensionality. The main purpose of the proposed method is to reduce the dimensionality of a high-dimensional data set so that outliers can be identified efficiently using clustering techniques. Simulations on real databases demonstrated the effectiveness of our method in terms of accuracy, error rate, F-score, and retrieval time.
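
The two-stage idea in the abstract (reduce the feature set first, then let a density-based clustering flag unclustered points as outliers) can be illustrated with a minimal sketch. This is not the authors' implementation: the Infinite Feature Selection step is not reproduced here, and a simple variance ranking is swapped in as a hypothetical stand-in. The function names, the toy data, and the values of k, eps and min_pts are all assumptions chosen for illustration.

```python
import math

def select_top_variance_features(rows, k):
    """Keep the k highest-variance columns (hypothetical stand-in for Inf-FS)."""
    def variance(j):
        col = [r[j] for r in rows]
        mean = sum(col) / len(col)
        return sum((v - mean) ** 2 for v in col) / len(col)
    ranked = sorted(range(len(rows[0])), key=variance, reverse=True)
    keep = sorted(ranked[:k])
    return [[r[j] for j in keep] for r in rows], keep

def dbscan(points, eps, min_pts):
    """Plain DBSCAN: returns one cluster label per point, -1 meaning noise."""
    n = len(points)
    # Brute-force eps-neighborhoods (each point is its own neighbor).
    neighbors = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
                 for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                queue.extend(neighbors[j])  # only core points expand the cluster
    return labels

def make_row(i, x, y):
    """Toy 8-dimensional row: columns 3 and 4 carry structure, the rest is filler."""
    filler = lambda j: 0.01 * ((7 * i + 3 * j) % 5)
    return [filler(0), filler(1), filler(2), x, y, filler(3), filler(4), filler(5)]

# Two dense groups plus one isolated point (row 20) in the informative columns.
rows = [make_row(i, 0.1 * i, 0.1 * (i % 3)) for i in range(10)]
rows += [make_row(10 + i, 10 + 0.1 * i, 10 + 0.1 * (i % 3)) for i in range(10)]
rows.append(make_row(20, 5.0, -5.0))

reduced, keep = select_top_variance_features(rows, k=2)
labels = dbscan(reduced, eps=0.5, min_pts=3)
outliers = [i for i, lab in enumerate(labels) if lab == -1]
print(keep, outliers)  # prints [3, 4] [20]
```

In this toy setup the points DBSCAN labels as noise are treated as outliers. How the paper scores or ranks outliers after clustering is not stated in the abstract, so that choice is an assumption of the sketch.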

Author information

Authors and Affiliations

Institut Supérieur d’Informatique du Kef, Université de Jendouba, Jendouba, Tunisie
Thouraya Aouled Messaoud & Aymen Louati
LARODEC, Institut Supérieur de Gestion de Tunis, Tunis, Tunisie
Abir Smiti

Corresponding authors

Correspondence to Thouraya Aouled Messaoud or Abir Smiti.

Editor information

Editors and Affiliations

University of León, León, Spain
Hilde Pérez García
University of León, León, Spain
Lidia Sánchez González
University of León, León, Spain
Manuel Castejón Limas
University of A Coruña, Ferrol, Spain
Héctor Quintián Pardo
University of Salamanca, Salamanca, Spain
Emilio Corchado Rodríguez

About this paper

Cite this paper

Messaoud, T.A., Smiti, A., Louati, A. (2019). A Novel Density-Based Clustering Approach for Outlier Detection in High-Dimensional Data. In: Pérez García, H., Sánchez González, L., Castejón Limas, M., Quintián Pardo, H., Corchado Rodríguez, E. (eds) Hybrid Artificial Intelligent Systems. HAIS 2019. Lecture Notes in Computer Science, vol 11734. Springer, Cham. https://doi.org/10.1007/978-3-030-29859-3_28

DOI: https://doi.org/10.1007/978-3-030-29859-3_28
Published: 26 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29858-6
Online ISBN: 978-3-030-29859-3
eBook Packages: Computer Science, Computer Science (R0)
