Web Text Mining Using Harmony Search

Forsati, Rana; Mahdavi, Mehrdad

doi:10.1007/978-3-642-04317-8_5

Part of the book series: Studies in Computational Intelligence (SCI, volume 270)

Abstract

The Harmony Search (HS) algorithm has in recent years been applied to many problems in computer science and engineering. This chapter reviews the application of HS to web document clustering. Clustering is a problem of great practical importance that has been the focus of substantial research in several domains for decades. It is defined as the problem of partitioning data objects into groups such that objects in the same group are similar, while objects in different groups are dissimilar. Because documents are high-dimensional and sparse, clustering becomes more challenging when it is applied to web documents. This chapter reviews two algorithms proposed in the literature for clustering web documents with HS, as well as three hybridizations of HS-based clustering with the K-means algorithm. It will be shown that the HS-based methods can outperform other methods in terms of solution quality and computational time.
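The abstract does not specify how a harmony encodes a clustering, so the following is only a minimal sketch under stated assumptions, not the chapter's method. It assumes each harmony is a set of k centroid vectors, that fitness is the average distance of documents to their cluster centroid (ADDC) under cosine distance, and that the best harmony is then refined by K-means, which is one possible way to hybridize HS with K-means and not necessarily one of the three reviewed in the chapter. The parameter names `hms`, `hmcr`, `par`, and `bw` follow common HS terminology (harmony memory size, harmony memory considering rate, pitch adjusting rate, bandwidth); their default values are arbitrary, and all function names are hypothetical.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; returns 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def assign(docs, centroids):
    """Index of the nearest centroid (by cosine distance) for each document."""
    return [min(range(len(centroids)), key=lambda j: cosine_distance(d, centroids[j]))
            for d in docs]


def addc(docs, centroids):
    """Assumed fitness: mean document-to-centroid distance, averaged over non-empty clusters."""
    labels = assign(docs, centroids)
    total, used = 0.0, 0
    for j, c in enumerate(centroids):
        members = [d for d, lab in zip(docs, labels) if lab == j]
        if members:
            total += sum(cosine_distance(d, c) for d in members) / len(members)
            used += 1
    return total / used


def harmony_search_clustering(docs, k, hms=10, hmcr=0.9, par=0.3, bw=0.05, iters=200, seed=0):
    """Minimise ADDC over k centroids with a basic harmony search (hypothetical helper)."""
    rng = random.Random(seed)
    dim = len(docs[0])
    lo = [min(d[i] for d in docs) for i in range(dim)]
    hi = [max(d[i] for d in docs) for i in range(dim)]
    # Initial harmony memory: each harmony uses k randomly chosen documents as centroids.
    memory = [[list(rng.choice(docs)) for _ in range(k)] for _ in range(hms)]
    scores = [addc(docs, h) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(k):
            centroid = []
            for i in range(dim):
                if rng.random() < hmcr:
                    # Memory consideration: copy this component from a stored harmony.
                    value = rng.choice(memory)[j][i]
                    if rng.random() < par:
                        # Pitch adjustment: small random step, clamped to the data range.
                        value = min(max(value + rng.uniform(-bw, bw), lo[i]), hi[i])
                else:
                    # Random selection from the observed range of this term weight.
                    value = rng.uniform(lo[i], hi[i])
                centroid.append(value)
            new.append(centroid)
        score = addc(docs, new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:
            # Replace the worst harmony in memory if the new one is better.
            memory[worst], scores[worst] = new, score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]


def kmeans_refine(docs, centroids, iters=20):
    """K-means refinement starting from the HS result (one assumed hybridization)."""
    for _ in range(iters):
        labels = assign(docs, centroids)
        updated = []
        for j, c in enumerate(centroids):
            members = [d for d, lab in zip(docs, labels) if lab == j]
            # Arithmetic mean of member vectors; keep the old centroid if the cluster is empty.
            updated.append([sum(col) / len(members) for col in zip(*members)] if members else c)
        if updated == centroids:
            break
        centroids = updated
    return centroids


# Toy term-weight vectors: documents 0-1 and 2-3 share their dominant terms.
docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
centroids, score = harmony_search_clustering(docs, k=2)
labels = assign(docs, kmeans_refine(docs, centroids))
print(labels, round(score, 4))
```

The toy documents are short dense vectors for readability; real web documents would typically be sparse TF-IDF vectors with thousands of dimensions, which is what makes the centroid search space large. Running K-means only after HS lets the global search choose a good region first and the local method converge within it.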

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, Shahid Beheshti University, G. C. Tehran, Iran
Rana Forsati
Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
Mehrdad Mahdavi

Editor information

Editors and Affiliations

iGlobal University, 7700 Little River Tpke. #600, Annandale, 22003, Virginia, USA
Zong Woo Geem

About this chapter

Cite this chapter

Forsati, R., Mahdavi, M. (2010). Web Text Mining Using Harmony Search. In: Geem, Z.W. (eds) Recent Advances In Harmony Search Algorithm. Studies in Computational Intelligence, vol 270. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04317-8_5

DOI: https://doi.org/10.1007/978-3-642-04317-8_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-04316-1
Online ISBN: 978-3-642-04317-8
eBook Packages: Engineering, Engineering (R0)
