Abstract
This research combines several artificial intelligence techniques to optimize the clustering of textual transcripts obtained from audio sources. Clustering techniques have drawbacks that, if left unaddressed, produce suboptimal clustering solutions, so the clustering algorithms themselves need to be optimized. To overcome this problem, different artificial intelligence techniques are applied. The main objectives of this research are to optimize automatic topic clustering of transcribed speech documents and to investigate the impact of applying genetic algorithm optimization and initial centroid selection optimization (ICSO), in combination with the K-means clustering algorithm using the Chi-square similarity measure, on the accuracy and the sum of square distances (SSD) of the selected clustering algorithm. The evaluation showed that using ICSO with the genetic algorithm and the K-means clustering algorithm with the Chi-square similarity measure achieved the highest accuracy with the lowest SSD.
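To make the quantities named in the abstract concrete, the following is a minimal sketch of K-means assignment under a chi-square distance, together with an SSD computation. It is not the authors' implementation: the chi-square form used here (sum of squared differences divided by the sum of the two components), the use of squared Euclidean distance for the SSD, and all function names are assumptions made for illustration. The ICSO and genetic algorithm steps are not reproduced, because the abstract does not specify them; the initial centroids are simply passed in by the caller.

```python
def chi_square_distance(x, y):
    """Chi-square distance between two non-negative vectors (assumed form)."""
    total = 0.0
    for a, b in zip(x, y):
        s = a + b
        if s > 0:  # skip components where both values are zero
            total += (a - b) ** 2 / s
    return total


def assign(docs, centroids):
    """Label each document with the index of its nearest centroid."""
    return [
        min(range(len(centroids)),
            key=lambda k: chi_square_distance(doc, centroids[k]))
        for doc in docs
    ]


def update(docs, labels, centroids):
    """Recompute each centroid as the mean of its members."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [doc for doc, lab in zip(docs, labels) if lab == k]
        if members:
            new_centroids.append(
                [sum(col) / len(members) for col in zip(*members)])
        else:
            new_centroids.append(list(old))  # keep an empty cluster's centroid
    return new_centroids


def ssd(docs, labels, centroids):
    """Sum of squared Euclidean distances to assigned centroids (assumed)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(doc, centroids[lab]))
        for doc, lab in zip(docs, labels)
    )


def kmeans(docs, initial_centroids, max_iter=100):
    """Run K-means from the given initial centroids until labels stabilize."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = assign(docs, centroids)
        if new_labels == labels:
            break
        labels = new_labels
        centroids = update(docs, labels, centroids)
    return labels, centroids, ssd(docs, labels, centroids)
```

Because `kmeans` takes its starting centroids as an argument, any initial centroid selection strategy can be plugged in and compared by the SSD it yields, which is the kind of comparison the abstract reports.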
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Maghawry, A.M., Omar, Y., Badr, A. (2018). Initial Centroid Selection Optimization for K-Means with Genetic Algorithm to Enhance Clustering of Transcribed Arabic Broadcast News Documents. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Applied Computational Intelligence and Mathematical Methods. CoMeSySo 2017. Advances in Intelligent Systems and Computing, vol 662. Springer, Cham. https://doi.org/10.1007/978-3-319-67621-0_8
DOI: https://doi.org/10.1007/978-3-319-67621-0_8
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67620-3
Online ISBN: 978-3-319-67621-0
eBook Packages: Engineering, Engineering (R0)