Part of the book series: Studies in Computational Intelligence ((SCI,volume 270))

Abstract

In recent years, the Harmony Search (HS) algorithm has been applied to many problems in computer science and engineering. This chapter reviews the application of the HS method to web document clustering. Clustering is a problem of great practical importance that has been the focus of substantial research in several domains for decades. It is defined as the problem of partitioning data objects into groups such that objects in the same group are similar, while objects in different groups are dissimilar. Because documents are high-dimensional and sparse, clustering becomes more challenging when applied to web documents. This chapter reviews two algorithms proposed in the literature for clustering web documents with HS, along with three hybridizations of HS-based clustering with the K-means algorithm. It is shown that the HS method can outperform other methods in terms of solution quality and computational time.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Forsati, R., Mahdavi, M. (2010). Web Text Mining Using Harmony Search. In: Geem, Z.W. (eds) Recent Advances In Harmony Search Algorithm. Studies in Computational Intelligence, vol 270. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04317-8_5

  • DOI: https://doi.org/10.1007/978-3-642-04317-8_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04316-1

  • Online ISBN: 978-3-642-04317-8

  • eBook Packages: Engineering, Engineering (R0)
