A Novel Weighting Scheme Applied to Improve the Text Document Clustering Techniques

  • Laith Mohammad Abualigah
  • Ahamad Tajudin Khader
  • Essam Said Hanandeh
Part of the Studies in Computational Intelligence book series (SCI, volume 741)


Text clustering is an efficient analysis technique used in text mining to arrange a large number of unorganized text documents into a set of coherent clusters, so that similar documents fall in the same cluster. In this paper, we propose a novel term weighting scheme, length feature weight (LFW), that improves text document clustering algorithms based on new factors. The proposed scheme assigns a favorable weight to each term according to information obtained from the document collection. It identifies the terms that are particular to each cluster and enhances their weights based on the proposed factors at the document level. The β-hill climbing technique is used to validate the proposed scheme in text clustering, and the proposed scheme is compared with the existing weighting scheme (TF-IDF). Experiments are conducted on eight standard benchmark text datasets taken from the Laboratory of Computational Intelligence (LABIC). The results show that LFW outperforms the existing weighting scheme and improves the results of the text document clustering technique in terms of F-measure, precision, and recall.
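For orientation, the baseline and the evaluation measures named above can be sketched in a few lines. This is a minimal illustration, assuming the common TF-IDF form w = tf × log(n / df) and the usual per-cluster definitions of precision, recall, and F-measure; the chapter may use different variants. The LFW formula is not given in this abstract, so it is not attempted here. The function names `tfidf_weights` and `precision_recall_f` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_weights(documents):
    """Return one {term: weight} dict per document using tf * log(n / df).

    Assumption: whitespace tokenization and the plain (unsmoothed) IDF form.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return weights


def precision_recall_f(cluster, gold_class):
    """Precision, recall and F-measure of one cluster against one gold class."""
    cluster, gold_class = set(cluster), set(gold_class)
    overlap = len(cluster & gold_class)
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(gold_class)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical toy collection, used only to exercise the functions.
docs = [
    "text clustering groups similar documents",
    "term weighting improves text clustering",
    "hill climbing is a local search",
]
weights = tfidf_weights(docs)
```

Under this form, a term that occurs in every document receives weight zero, while a term confined to a single document receives the largest IDF factor. That behaviour is the reference point against which a cluster-aware scheme such as LFW is compared.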


Text document clustering; β-hill climbing technique; Length feature weight scheme



The authors would like to thank the editors and reviewers for their helpful comments, and EAI COMPSE 2016.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  • Laith Mohammad Abualigah (1, corresponding author)
  • Ahamad Tajudin Khader (1)
  • Essam Said Hanandeh (2)
  1. School of Computer Science, Universiti Sains Malaysia, George Town, Malaysia
  2. Department of Computer Information System, Zarqa University, Zarqa, Jordan
