A New Algorithm for Incremental Web Page Clustering Based on k-Means and Ant Colony Optimization

  • Conference paper
Recent Advances on Soft Computing and Data Mining

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 287)

Abstract

The Internet serves as a source of information. Clustering web pages is needed to identify the topics in a page. However, dynamism is one of the challenges of web clustering: web pages change very frequently, and pages are constantly added and removed. Processing a new page should not require repeating the whole clustering. For these reasons, incremental algorithms are an appropriate alternative for web page clustering.

In this paper we propose a new hybrid technique, which we call Incremental K Ant Colony Clustering (IKACC). It is based on Ant Colony Optimization and the k-means algorithm. We adapt this approach to classify new pages in an online manner, and we compare it with the incremental k-means algorithm. The results show that this approach is more efficient and produces better results than incremental k-means.
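For orientation, the incremental k-means baseline mentioned above can be sketched as follows. This is a minimal illustration and not the IKACC method, whose details are not given in the abstract. The class name `IncrementalKMeans`, the use of Euclidean distance, the running-mean update, and the choice to count each seed centroid as one observation are all assumptions made for the example.

```python
import math


class IncrementalKMeans:
    """Hypothetical online k-means sketch: a new vector updates only its nearest centroid."""

    def __init__(self, centroids):
        # Copy the seed centroids; each seed is counted as one observation (an assumption).
        self.centroids = [list(c) for c in centroids]
        self.counts = [1] * len(self.centroids)

    def add(self, x):
        """Assign vector x to the nearest centroid, update that centroid, return its index."""
        # Nearest centroid by Euclidean distance (an assumed similarity measure).
        j = min(range(len(self.centroids)),
                key=lambda i: math.dist(x, self.centroids[i]))
        self.counts[j] += 1
        n = self.counts[j]
        # Running-mean update: no pass over previously clustered pages is needed.
        self.centroids[j] = [c + (v - c) / n for c, v in zip(self.centroids[j], x)]
        return j


model = IncrementalKMeans([[0.0, 0.0], [10.0, 10.0]])
first = model.add([2.0, 0.0])     # joins cluster 0
second = model.add([10.0, 12.0])  # joins cluster 1
```

Each call to `add` costs one distance computation per centroid, which is the property that makes an incremental scheme attractive when pages arrive continuously. In practice the vectors would be page feature vectors (for example term weights) rather than the two-dimensional toy points used here.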

Author information

Corresponding author

Correspondence to Yasmina Boughachiche.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Boughachiche, Y., Kamel, N. (2014). A New Algorithm for Incremental Web Page Clustering Based on k-Means and Ant Colony Optimization. In: Herawan, T., Ghazali, R., Deris, M. (eds) Recent Advances on Soft Computing and Data Mining. Advances in Intelligent Systems and Computing, vol 287. Springer, Cham. https://doi.org/10.1007/978-3-319-07692-8_33

  • DOI: https://doi.org/10.1007/978-3-319-07692-8_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07691-1

  • Online ISBN: 978-3-319-07692-8

  • eBook Packages: Engineering, Engineering (R0)
