An Analytical Approach to Document Clustering Techniques

  • Conference paper
In: ICT Systems and Sustainability

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1077)

Abstract

Clustering is a technique that groups data items together based on their similarity and separates them based on their dissimilarity. When this technique is applied to documents and the terms within them, retrieval of similar documents becomes easy and efficient. Document clustering has been researched and used for many years but is still far from optimal. To study and analyze different document clustering algorithms, a theoretical literature review and analysis was performed, and the results are presented in this paper. Of the 95 papers identified, 30 were selected. The techniques, algorithms, and modifications to earlier algorithms that various researchers have proposed for document clustering are compiled and presented, with the intent of helping researchers identify the current and future scope of research in information retrieval systems and document clustering technologies.
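
As a rough illustration of the idea stated in the abstract (grouping documents by the similarity of the terms they contain), the following minimal sketch may help. It is not a method taken from the paper: the bag-of-words representation, the cosine similarity measure, the greedy single-pass grouping, the threshold of 0.3, the sample documents, and all function names are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term counts for one document (lowercased, whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering (illustrative assumption, not the paper's method).

    Each document joins the first cluster whose seed document is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    """
    vectors = [term_vector(d) for d in docs]
    clusters = []  # lists of document indices; the first index is the seed
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


# Hypothetical sample documents, invented for this sketch.
docs = [
    "genetic algorithm for text clustering",
    "stock market prices fell sharply",
    "text clustering with a genetic algorithm",
    "market prices and stock trading",
]
clusters = cluster_documents(docs)
print(clusters)  # [[0, 2], [1, 3]]
```

Documents that share many terms end up in the same group, while documents with no terms in common are kept apart. Practical systems usually replace raw term counts with weighted features such as TF-IDF and use an established algorithm such as K-means in place of this greedy pass.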

Author information

Corresponding author

Correspondence to Vikas Choubey.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Choubey, V., Dubey, S.K. (2020). An Analytical Approach to Document Clustering Techniques. In: Tuba, M., Akashe, S., Joshi, A. (eds) ICT Systems and Sustainability. Advances in Intelligent Systems and Computing, vol 1077. Springer, Singapore. https://doi.org/10.1007/978-981-15-0936-0_3
