On-Line Single-Pass Clustering Based on Diffusion Maps

  • Conference paper
Natural Language Processing and Information Systems (NLDB 2007)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4592)

Abstract

Due to recent advances in technology, online clustering has emerged as a challenging and interesting problem, with applications such as peer-to-peer information retrieval and topic detection and tracking. Single-pass clustering is one of the popular methods used in this field. While significant work has been done on improving this clustering algorithm, it has not been studied in a reduced-dimension space, particularly in online processing scenarios. In this paper, we discuss previous work on improving single-pass clustering, and then present a new single-pass clustering algorithm, called OSPDM (On-line Single-Pass clustering based on Diffusion Map), which maps the data into a low-dimensional feature space.
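
For readers unfamiliar with the baseline technique, the following is a minimal sketch of generic single-pass clustering, not the OSPDM algorithm itself. It assumes each incoming document has already been mapped to a low-dimensional vector (the diffusion-map step is not shown), and the cosine similarity measure, the running-mean centroid update, the `threshold` parameter, and the class name `SinglePassClusterer` are all illustrative assumptions rather than details taken from the paper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class SinglePassClusterer:
    """Hypothetical helper: assigns each vector once, in arrival order."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold               # assumed similarity cut-off
        self.centroids: List[List[float]] = []   # one running mean per cluster
        self.sizes: List[int] = []               # number of members per cluster

    def add(self, x: Sequence[float]) -> int:
        """Assign x to the most similar cluster, or open a new one."""
        best, best_sim = -1, -1.0
        for i, c in enumerate(self.centroids):
            s = cosine(x, c)
            if s > best_sim:
                best, best_sim = i, s
        if best >= 0 and best_sim >= self.threshold:
            # Update the centroid as a running mean of its members.
            n = self.sizes[best]
            self.centroids[best] = [
                (n * c + a) / (n + 1) for c, a in zip(self.centroids[best], x)
            ]
            self.sizes[best] = n + 1
            return best
        # No existing cluster is similar enough: start a new cluster.
        self.centroids.append(list(x))
        self.sizes.append(1)
        return len(self.centroids) - 1


# Toy stream of already-reduced 2-D vectors (made-up values).
stream = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusterer = SinglePassClusterer(threshold=0.8)
labels = [clusterer.add(x) for x in stream]
print(labels)
```

Because each vector is seen once and never reassigned, the result depends on arrival order and on the chosen threshold; both would need tuning for real data.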

Editor information

Zoubida Kedad, Nadira Lammari, Elisabeth Métais, Farid Meziane, Yacine Rezgui

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ataa Allah, F., Grosky, W.I., Aboutajdine, D. (2007). On-Line Single-Pass Clustering Based on Diffusion Maps. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_10

  • DOI: https://doi.org/10.1007/978-3-540-73351-5_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73350-8

  • Online ISBN: 978-3-540-73351-5

  • eBook Packages: Computer Science, Computer Science (R0)
