Initial Centroid Selection Optimization for K-Means with Genetic Algorithm to Enhance Clustering of Transcribed Arabic Broadcast News Documents

Maghawry, Ahmed Mohamed; Omar, Yasser; Badr, Amr

doi:10.1007/978-3-319-67621-0_8

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 662)

Included in the following conference series:

Proceedings of the Computational Methods in Systems and Software

Abstract

This research combines several artificial intelligence techniques to optimize the clustering of textual transcripts obtained from audio sources. Clustering algorithms have drawbacks that, if left unaddressed, produce suboptimal clustering solutions, so the algorithms themselves need to be optimized. The main objectives of this research are to optimize automatic topic clustering of transcribed speech documents and to investigate how genetic algorithm optimization and initial centroid selection optimization (ICSO), combined with the K-means clustering algorithm using the Chi-square similarity measure, affect the accuracy and the sum of square distances (SSD) of the clustering. The evaluation showed that using ICSO with the genetic algorithm and K-means clustering with the Chi-square similarity measure achieved the highest accuracy with the lowest SSD.
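
To make the terms in the abstract concrete, the following is a minimal Python sketch of K-means driven by a Chi-square distance, with the SSD of the result used as a score for a given choice of initial centroids. It is an illustration under stated assumptions, not the authors' implementation: the preview does not give the exact formulas, so the Chi-square form used here (the sum of (a - b)^2 / (a + b) over terms), the squared-Euclidean SSD, and all function names (`chi_square_distance`, `k_means`, `ssd`, `fitness`) are hypothetical choices. Documents are assumed to be non-negative term-weight vectors.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def chi_square_distance(x: Vector, y: Vector) -> float:
    """One common Chi-square distance for non-negative vectors (assumed form)."""
    total = 0.0
    for a, b in zip(x, y):
        denom = a + b
        if denom > 0:  # a term absent from both vectors contributes nothing
            total += (a - b) ** 2 / denom
    return total


def assign(docs: Sequence[Vector], centroids: Sequence[Vector]) -> List[int]:
    """Label each document with the index of its nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda k: chi_square_distance(doc, centroids[k]))
        for doc in docs
    ]


def update(docs: Sequence[Vector], labels: Sequence[int],
           centroids: Sequence[Vector]) -> List[List[float]]:
    """Recompute each centroid as the mean of its members."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [d for d, label in zip(docs, labels) if label == k]
        if members:
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))  # keep an empty cluster in place
    return new_centroids


def k_means(docs: Sequence[Vector], initial_centroids: Sequence[Vector],
            max_iter: int = 100) -> Tuple[List[int], List[List[float]]]:
    """Plain Lloyd-style iteration from the given initial centroids."""
    centroids = [list(c) for c in initial_centroids]
    labels = assign(docs, centroids)
    for _ in range(max_iter):
        centroids = update(docs, labels, centroids)
        new_labels = assign(docs, centroids)
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centroids


def ssd(docs: Sequence[Vector], labels: Sequence[int],
        centroids: Sequence[Vector]) -> float:
    """Sum of squared Euclidean distances to the assigned centroid (assumed)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(doc, centroids[label]))
        for doc, label in zip(docs, labels)
    )


def fitness(docs: Sequence[Vector], init_indices: Sequence[int]) -> float:
    """Score an initialization (documents used as seeds) by final SSD."""
    labels, centroids = k_means(docs, [docs[i] for i in init_indices])
    return ssd(docs, labels, centroids)


if __name__ == "__main__":
    toy_docs = [[4.0, 0.0], [5.0, 1.0], [0.0, 4.0], [1.0, 5.0]]
    print(fitness(toy_docs, (0, 2)))  # 2.0
```

In this sketch, a genetic algorithm would treat each tuple of seed indices as an individual and use `fitness` (lower is better) to rank candidates; the selection, crossover, and mutation steps are omitted because the preview does not describe them.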

Author information

Authors and Affiliations

Department of Computer Science, College of Computers and Information Systems, Arab Academy for Science and Technology (AAST), Cairo, Egypt
Ahmed Mohamed Maghawry & Yasser Omar
Department of Computer Science, Faculty of Computers and Information, Cairo University, Cairo, 12613, Egypt
Amr Badr

Corresponding author

Correspondence to Ahmed Mohamed Maghawry.

Editor information

Editors and Affiliations

Faculty of Applied Informatics, Tomas Bata University in Zlín, Zlín, Czech Republic
Radek Silhavy, Petr Silhavy & Zdenka Prokopova

About this paper

Cite this paper

Maghawry, A.M., Omar, Y., Badr, A. (2018). Initial Centroid Selection Optimization for K-Means with Genetic Algorithm to Enhance Clustering of Transcribed Arabic Broadcast News Documents. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Applied Computational Intelligence and Mathematical Methods. CoMeSySo 2017. Advances in Intelligent Systems and Computing, vol 662. Springer, Cham. https://doi.org/10.1007/978-3-319-67621-0_8

DOI: https://doi.org/10.1007/978-3-319-67621-0_8
Published: 05 September 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67620-3
Online ISBN: 978-3-319-67621-0
eBook Packages: Engineering, Engineering (R0)
