
Determining Number of Clusters Using Firefly Algorithm with Cluster Merging for Text Clustering

  • Athraa Jasim Mohammed
  • Yuhanis Yusof
  • Husniza Husni
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9429)

Abstract

Text mining, and clustering in particular, is widely used by search engines to improve the recall and precision of search queries. The content of online websites (text, blogs, chats, news, etc.) is updated dynamically, yet information about the changes made is not available. Such a scenario requires a dynamic text clustering method that operates without prior knowledge of the data collection. In this paper, a dynamic text clustering method based on the Firefly algorithm is introduced. The proposed clustering algorithm, aFAmerge, automatically groups text documents into an appropriate number of clusters based on the behavior of fireflies and a cluster merging process. Experiments with aFAmerge were conducted on two datasets: 20Newsgroups and the Reuters news collection. Results indicate that aFAmerge generates more robust and compact clusters than those produced by Bisect K-means and the practical General Stochastic Clustering Method (pGSCM).
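The abstract describes aFAmerge only at a high level, so the following is not the authors' algorithm. It is a minimal, firefly-inspired sketch under stated assumptions: documents are sparse term-weight vectors (e.g. TF-IDF, assumed precomputed), a document's brightness is its total cosine similarity to the other documents, and a bright document attracts another when the standard firefly attractiveness beta(r) = beta0 * exp(-gamma * r^2), with r taken as cosine distance, reaches a threshold `beta_min`. The random-walk movement step of the full Firefly algorithm is deliberately omitted. A merging pass then repeatedly joins the two clusters with the most similar centroids until no pair exceeds `sim_threshold`, so the number of clusters is not fixed in advance. All function names, parameters and thresholds here are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]  # sparse term -> weight (assumed precomputed, e.g. TF-IDF)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attractiveness(r: float, beta0: float = 1.0, gamma: float = 1.0) -> float:
    """Standard firefly attractiveness beta(r) = beta0 * exp(-gamma * r^2)."""
    return beta0 * math.exp(-gamma * r * r)


def centroid(docs: List[Vector]) -> Vector:
    """Mean of a non-empty group of sparse vectors."""
    total: Vector = {}
    for d in docs:
        for t, w in d.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(docs) for t, w in total.items()}


def firefly_group(docs: List[Vector], beta_min: float = 0.5,
                  gamma: float = 1.0) -> List[List[int]]:
    """Hypothetical grouping: the brightest unassigned document attracts the rest."""
    n = len(docs)
    # Brightness (assumption): total similarity of a document to all others.
    bright = [sum(cosine(docs[i], docs[j]) for j in range(n) if j != i)
              for i in range(n)]
    order = sorted(range(n), key=lambda i: bright[i], reverse=True)
    unassigned = set(range(n))
    clusters: List[List[int]] = []
    for leader in order:
        if leader not in unassigned:
            continue  # already attracted by a brighter firefly
        unassigned.discard(leader)
        members = [leader]
        for j in sorted(unassigned):
            r = 1.0 - cosine(docs[leader], docs[j])  # cosine distance
            if attractiveness(r, gamma=gamma) >= beta_min:
                members.append(j)
        unassigned.difference_update(members)
        clusters.append(members)
    return clusters


def merge_clusters(docs: List[Vector], clusters: List[List[int]],
                   sim_threshold: float = 0.5) -> List[List[int]]:
    """Merge the most similar pair of clusters until no pair reaches sim_threshold."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        cents = [centroid([docs[i] for i in c]) for c in clusters]
        best, pair = -1.0, (0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cosine(cents[a], cents[b])
                if s > best:
                    best, pair = s, (a, b)
        if best < sim_threshold:
            break
        a, b = pair
        clusters[a].extend(clusters[b])
        del clusters[b]
    return clusters


# Toy corpus (hypothetical weights): three sports-like and two finance-like documents.
docs: List[Vector] = [
    {"game": 1.0, "team": 1.0},
    {"game": 1.0, "team": 0.8},
    {"team": 1.0, "score": 1.0},
    {"stock": 1.0, "market": 1.0},
    {"stock": 1.0, "price": 1.0},
]
groups = firefly_group(docs, gamma=4.0)
merged = merge_clusters(docs, groups, sim_threshold=0.3)
print(groups)  # [[0, 1], [2], [3], [4]]
print(merged)  # [[0, 1, 2], [3, 4]]
```

With a large `gamma` the attraction is short-ranged and the grouping step over-splits the toy corpus into four clusters; the merging pass then brings the count down to two without the number being specified up front. This only illustrates the general idea of combining firefly-style attraction with cluster merging; the paper's own brightness definition, movement rule and merging criterion may differ.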

Keywords

Firefly algorithm · Text clustering · Text mining · Agglomerative clustering

Notes

Acknowledgments

The authors would like to thank the Malaysian Ministry of Higher Education for providing financial support under the Fundamental Research Grant Scheme (s/o: 12894). Gratitude also goes to Universiti Utara Malaysia for helping to manage the study.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Athraa Jasim Mohammed (1, 2)
  • Yuhanis Yusof (1)
  • Husniza Husni (1)

  1. School of Computing, College of Arts and Sciences, Universiti Utara Malaysia, Sintok, Malaysia
  2. University of Technology, Baghdad, Iraq
