Abstract
In recent years, the Harmony Search (HS) algorithm has been applied to many problems in computer science and engineering. This chapter reviews the application of HS to web document clustering. Clustering is a problem of great practical importance that has been the focus of substantial research in several domains for decades. It is defined as the problem of partitioning data objects into groups such that objects in the same group are similar, while objects in different groups are dissimilar. Because documents are high-dimensional and sparse, clustering becomes more challenging when applied to web documents. Two algorithms proposed in the literature for clustering web documents with HS are reviewed, together with three hybridizations of HS-based clustering with the K-means algorithm. The reviewed results show that HS-based methods can outperform other methods in terms of both solution quality and computational time.
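To make the ideas in the abstract concrete, here is a minimal sketch of HS-based document clustering with an optional K-means refinement step. It is not the exact algorithm of the reviewed papers; several choices are assumptions made for illustration: documents are already vectorized (toy term-frequency vectors here; TF-IDF weighting is common in practice), similarity is cosine similarity, each harmony encodes one cluster label per document, fitness is the average document-to-centroid cosine similarity, and pitch adjustment simply draws a random label. HMS, HMCR and PAR are the standard HS parameters, but the values and all function names (`harmony_search`, `kmeans_refine`, etc.) are hypothetical.

```python
import math
import random

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def centroids(docs, labels, k):
    """Mean vector of each cluster; an empty cluster gets a zero vector."""
    dim = len(docs[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for doc, c in zip(docs, labels):
        counts[c] += 1
        for j, x in enumerate(doc):
            sums[c][j] += x
    return [[s / counts[c] if counts[c] else 0.0 for s in sums[c]] for c in range(k)]

def fitness(docs, labels, k):
    """Average cosine similarity between each document and its cluster centroid (higher is better)."""
    cents = centroids(docs, labels, k)
    return sum(cosine(d, cents[c]) for d, c in zip(docs, labels)) / len(docs)

def kmeans_refine(docs, labels, k, max_iters=5):
    """A few K-means passes: move every document to its most similar centroid."""
    for _ in range(max_iters):
        cents = centroids(docs, labels, k)
        updated = [max(range(k), key=lambda c: cosine(d, cents[c])) for d in docs]
        if updated == labels:
            break
        labels = updated
    return labels

def harmony_search(docs, k, hms=10, hmcr=0.9, par=0.3, iters=200, hybrid=True, seed=0):
    """HS over cluster-assignment vectors; returns (best_fitness, best_labels)."""
    rng = random.Random(seed)
    n = len(docs)
    # Harmony memory: HMS random assignment vectors, each stored with its fitness.
    memory = []
    for _ in range(hms):
        h = [rng.randrange(k) for _ in range(n)]
        memory.append((fitness(docs, h, k), h))
    for _ in range(iters):
        harmony = []
        for i in range(n):
            if rng.random() < hmcr:
                # Memory consideration: reuse document i's label from a stored harmony.
                label = rng.choice(memory)[1][i]
                if rng.random() < par:
                    # Pitch adjustment (assumed discrete form): draw a random label.
                    label = rng.randrange(k)
            else:
                # Random selection.
                label = rng.randrange(k)
            harmony.append(label)
        if hybrid:
            # Hybrid step (one possible variant): polish the new harmony with K-means.
            harmony = kmeans_refine(docs, harmony, k)
        score = fitness(docs, harmony, k)
        # Replace the worst stored harmony if the new one is better.
        worst = min(range(hms), key=lambda j: memory[j][0])
        if score > memory[worst][0]:
            memory[worst] = (score, harmony)
    return max(memory, key=lambda entry: entry[0])

# Toy term-frequency vectors over a 4-term vocabulary (illustrative data only).
docs = [
    [3, 2, 0, 0],
    [4, 1, 0, 0],
    [0, 0, 2, 3],
    [0, 1, 3, 4],
]
score, labels = harmony_search(docs, k=2)
print(round(score, 3), labels)
```

Setting `hybrid=False` gives plain HS clustering. With `hybrid=True`, each improvised harmony is refined by K-means before evaluation, which is only one simple way of combining the two methods; the chapter itself reviews three different hybridizations.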
References
Húsek, D., Pokorný, J., Řezanková, H., et al.: Web data clustering. In: Foundations of Computational Intelligence, vol. 4. Springer, Berlin (2009)
van Rijsbergen, C.J.: Information retrieval. Butterworths, London (1979)
Aslam, J., Pelekhov, K., Rus, D.: Using star clusters for filtering. In: Proceedings of the Ninth International Conference on Information and Knowledge Management, USA (2000)
Zhong, S., Ghosh, J.: A comparative study of generative models for document clustering. In: Proceedings of SDM Workshop on Clustering High Dimensional Data and Its Applications (2003)
Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: A Review. ACM Computing Surveys 31, 264–323 (1999)
Grira, N., Crucianu, M., Boujemaa, N.: Unsupervised and semi-supervised clustering: a brief survey. In: Proceedings of 7th ACM SIGMM International Workshop on Multimedia Information Retrieval, pp. 9–16 (2005)
Zhong, S., Ghosh, J.: Generative model-based clustering of documents: a comparative study. Knowledge and Information Systems (KAIS) 8, 374–384 (2005)
Zhong, S.: Semi-supervised model-based document clustering: A Comparative Study. Machine Learning 65, 3–29 (2006)
Guha, S., Rastogi, R., Shim, K.: CURE: an efficient clustering algorithm for large databases. In: Proceedings of ACM-SIGMOD Int. Conf. Management of Data (SIGMOD 1998), pp. 73–84 (1998)
Karypis, G., Han, E.H., Kumar, V.: CHAMELEON: A hierarchical clustering algorithm using dynamic modeling. IEEE Computer 32, 68–75 (1999)
Olson, C.F.: Parallel algorithms for hierarchical clustering. Parallel Comput. 21, 1313–1325 (1995)
Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: An efficient data clustering method for very large databases. In: Proceedings of ACM-SIGMOD Int. Conf. Management of Data (SIGMOD 1996), pp. 103–114 (1996)
Zhao, Y., Karypis, G.: Empirical and theoretical comparisons of selected criterion functions for document clustering. Machine Learning 55, 311–331 (2004)
Xu, S., Zhang, J.: A parallel hybrid web document clustering algorithm and its performance study. Journal of Supercomputing 30, 117–131 (2004)
Cutting, D.R., Pedersen, J.O., Karger, D.R., et al.: Scatter/gather: A cluster-based approach to browsing large document collections. In: Proceedings of the 15th Annual International ACM SIGIR Conference, Copenhagen, pp. 318–329 (1992)
Larsen, B., Aone, C.: Fast and effective text mining using linear-time document clustering. In: Proceedings of the Fifth ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, pp. 16–22 (1999)
Aggarwal, C.C., Gates, S.C., Yu, P.S.: On the merits of building categorization systems by supervised clustering. In: Proceedings of the Fifth ACM SIGKDD Int’l Conference on Knowledge Discovery and Data Mining, pp. 352–356 (1999)
Steinbach, M., Karypis, G., Kumar, V.: A comparison of document clustering techniques. In: KDD 2000, Technical Report, University of Minnesota (2000)
Dhillon, I.S.: Co-clustering documents and words using bipartite spectral graph partitioning. In: Knowledge Discovery and Data Mining, pp. 269–274 (2001)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, pp. 281–297 (1967)
Anderberg, M.R.: Cluster analysis for applications. Academic Press Inc., New York (1973)
Stumme, G., Hotho, A., Berendt, B.: Semantic web mining. In: Proceedings of 12th Europ. Conf. on Machine Learning (ECML2001)/5th Europ. Conf. on Principles and Practice of Knowledge Discovery in Databases, PKDD 2001 (2001)
Stumme, G., Hotho, A., Berendt, B.: Semantic web mining: state of the art and future directions. Journal of Web Semantics: Science, Services and Agents on the World Wide Web 4, 124–143 (2006)
Beil, F., Ester, M., Xu, X.: Frequent term-based text clustering. In: Proceedings of 8th Int. Conf. on Knowledge Discovery and Data Mining (KDD 2002), Edmonton, Alberta, Canada (2002)
Raghavan, V.V., Birchard, K.: A clustering strategy based on a formalism of the reproductive process in a natural system. In: Proceedings of the Second International Conference on Information Storage and Retrieval, pp. 10–22 (1979)
Cui, X., Potok, T.E., Palathingal, P.: Document clustering using particle swarm optimization. In: Proceedings of the IEEE swarm intelligence symposium, pp. 185–191.
Labroche, N., Monmarche, N., Venturini, G.: AntClust: ant clustering and web usage mining. In: Proceedings of Genetic and Evolutionary Computation Conference, pp. 25–36 (2003)
Kennedy, J., Eberhart, R.C., Shi, Y.: Swarm intelligence. Morgan Kaufmann, New York (2001)
Omran, M., Salman, A., Engelbrecht, A.P.: Image classification using particle swarm optimization. In: Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and Learning (SEAL 2002), pp. 370–374 (2002)
van der Merwe, D.W., Engelbrecht, A.P.: Data clustering using particle swarm optimization. In: Proceedings of IEEE Congress on Evolutionary Computation (CEC 2003), pp. 215–220 (2003)
Cui, X., Potok, T.E.: Document clustering analysis based on hybrid PSO+K-means algorithm. Journal of Computer Sciences 4, 27–33 (2005)
Everitt, B.: Cluster analysis, 2nd edn. Halsted Press, New York (1980)
Salton, G.: Automatic text processing: the transformation, analysis, and retrieval of information by computer. Addison-Wesley, Reading (1989)
Cios, K., Pedrycz, W., Swiniarski, R.: Data mining methods for knowledge discovery. Kluwer Academic Publishers, Dordrecht (1998)
Salton, G., Buckley, C.: Term-weighting approaches in automatic text retrieval. Information Processing and Management 24, 513–523 (1988)
Jain, A.K., Dubes, R.C.: Algorithms for clustering data. Prentice Hall, Englewood Cliffs (1988)
Mahdavi, M., Chehreghani, M.H., Abolhassani, H., et al.: Novel meta-heuristic algorithms for clustering web documents. Applied Mathematics and Computation 201, 441–451 (2008)
Mahdavi, M., Abolhassani, H.: Harmony K-means algorithm for document clustering. Data Mining and Knowledge Discovery 18, 370–391 (2009)
Forsati, R., Meybodi, M.R., Mahdavi, M., et al.: Hybridization of K-means and harmony search methods for web page clustering. In: Proceedings of IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, pp. 329–335 (2008)
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Forsati, R., Mahdavi, M. (2010). Web Text Mining Using Harmony Search. In: Geem, Z.W. (eds) Recent Advances In Harmony Search Algorithm. Studies in Computational Intelligence, vol 270. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04317-8_5
DOI: https://doi.org/10.1007/978-3-642-04317-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04316-1
Online ISBN: 978-3-642-04317-8
eBook Packages: Engineering, Engineering (R0)