Initial Centroid Selection Optimization for K-Means with Genetic Algorithm to Enhance Clustering of Transcribed Arabic Broadcast News Documents

  • Conference paper
Applied Computational Intelligence and Mathematical Methods (CoMeSySo 2017)

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 662))

Abstract

This research combines several artificial intelligence techniques to optimize the clustering of textual transcripts obtained from audio sources. Clustering algorithms have known drawbacks that, if left unaddressed, produce suboptimal clustering solutions, so different artificial intelligence techniques are applied here to avoid them. The main objectives of this research are to optimize automatic topic clustering of transcribed speech documents, and to investigate how genetic algorithm optimization and initial centroid selection optimization (ICSO), combined with the K-means clustering algorithm using the Chi-square similarity measure, affect the accuracy and the sum of square distances (SSD) of the clustering. The evaluation showed that ICSO combined with the genetic algorithm and K-means clustering with the Chi-square similarity measure achieved the highest accuracy with the lowest SSD.
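The combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the ICSO procedure, the chromosome encoding, or the genetic operators, so everything below is an assumption made for illustration. Chromosomes are assumed to be sets of document indices used as initial centroids, fitness is assumed to be the sum of chi-square distances to the assigned centroid (used here as a stand-in for SSD), and the names `chi_square`, `kmeans`, `ga_initial_centroids`, and the toy `docs` vectors are hypothetical.

```python
import random

def chi_square(a, b):
    """One common form of the chi-square distance between non-negative vectors."""
    return sum((x - y) ** 2 / (x + y) for x, y in zip(a, b) if x + y > 0)

def assign(points, centroids):
    """Index of the nearest centroid (by chi-square distance) for each point."""
    return [min(range(len(centroids)), key=lambda j: chi_square(p, centroids[j]))
            for p in points]

def kmeans(points, centroids, iters=20):
    """Lloyd-style K-means; returns (labels, centroids, ssd)."""
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for j, old in enumerate(centroids):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centroid if a cluster becomes empty.
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else old)
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    # Assumption: "SSD" is approximated by summed chi-square distances.
    ssd = sum(chi_square(p, centroids[l]) for p, l in zip(points, labels))
    return labels, centroids, ssd

def ga_initial_centroids(points, k, pop_size=8, generations=10,
                         mutation_rate=0.2, seed=0):
    """Toy genetic search for initial centroids (hypothetical operators)."""
    rng = random.Random(seed)
    n = len(points)

    def fitness(chrom):  # lower is better
        return kmeans(points, [list(points[i]) for i in chrom])[2]

    population = [tuple(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]        # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            pool = sorted(set(a) | set(b))           # crossover: merge parent genes
            child = rng.sample(pool, k)
            if rng.random() < mutation_rate:         # mutation: swap in an unused index
                unused = [i for i in range(n) if i not in child]
                if unused:
                    child[rng.randrange(k)] = rng.choice(unused)
            children.append(tuple(child))
        population = parents + children
    best = min(population, key=fitness)
    return [list(points[i]) for i in best]

# Toy term-frequency vectors standing in for transcribed documents.
docs = [
    [5, 1, 0], [4, 2, 0], [6, 1, 1],
    [0, 5, 1], [1, 6, 0], [0, 4, 2],
    [1, 0, 5], [0, 1, 6], [2, 0, 4],
]
seeds = ga_initial_centroids(docs, k=3)
labels, centroids, ssd = kmeans(docs, seeds)
print(labels, round(ssd, 3))
```

Because the fittest half of each generation is carried over unchanged, the best fitness in the population never gets worse from one generation to the next. The centroid update uses the arithmetic mean, which is the standard K-means step rather than a minimizer of chi-square distance, so the sketch should be read as an approximation.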

Author information

Corresponding author

Correspondence to Ahmed Mohamed Maghawry.

Copyright information

© 2018 Springer International Publishing AG

About this paper

Cite this paper

Maghawry, A.M., Omar, Y., Badr, A. (2018). Initial Centroid Selection Optimization for K-Means with Genetic Algorithm to Enhance Clustering of Transcribed Arabic Broadcast News Documents. In: Silhavy, R., Silhavy, P., Prokopova, Z. (eds) Applied Computational Intelligence and Mathematical Methods. CoMeSySo 2017. Advances in Intelligent Systems and Computing, vol 662. Springer, Cham. https://doi.org/10.1007/978-3-319-67621-0_8

  • DOI: https://doi.org/10.1007/978-3-319-67621-0_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67620-3

  • Online ISBN: 978-3-319-67621-0

  • eBook Packages: Engineering, Engineering (R0)
