Abstract
Due to recent advances in technology, online clustering has emerged as a challenging and interesting problem, with applications such as peer-to-peer information retrieval, and topic detection and tracking. Single-pass clustering is one of the most popular methods used in this field. While significant work has been done on this clustering algorithm, it has not been studied in a reduced-dimension space, especially in online processing scenarios. In this paper, we discuss previous work on improving single-pass clustering, and then present a new single-pass clustering algorithm, called OSPDM (On-line Single-Pass clustering based on Diffusion Map), which maps the data into a low-dimensional feature space.
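The abstract does not spell out how OSPDM works. What follows is only a minimal, batch-mode sketch of the two ingredients it combines, written under stated assumptions.

The first ingredient is a diffusion-map embedding. The sketch uses a Gaussian kernel with an assumed bandwidth `eps`, normalizes it into a Markov matrix, and keeps the leading non-trivial eigenvectors scaled by λ^t. The second ingredient is threshold-based single-pass clustering over the embedded points. Each item joins the nearest centroid if its distance is within a threshold; otherwise it opens a new cluster.

The following are all illustrative assumptions, not the authors' method:

- the function names (`gaussian_kernel`, `diffusion_map`, `single_pass`);
- the hand-written power-iteration eigensolver;
- the threshold value;
- the use of Euclidean distance rather than the cosine similarity common in text clustering.

In particular, this sketch embeds all points at once. A truly on-line method would have to embed new documents as they arrive, and that step is not reproduced here.

```python
import math


def gaussian_kernel(points, eps):
    """Affinity matrix K[i][j] = exp(-||x_i - x_j||^2 / eps)."""
    return [[math.exp(-math.dist(p, q) ** 2 / eps) for q in points] for p in points]


def diffusion_map(points, eps=1.0, dims=2, t=1, iters=500):
    """Return `dims` diffusion coordinates per point (batch sketch, assumed API)."""
    K = gaussian_kernel(points, eps)
    n = len(K)
    d = [sum(row) for row in K]  # degree of each point
    # Symmetric conjugate A = D^-1/2 K D^-1/2 has the same eigenvalues as P = D^-1 K.
    A = [[K[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    # The trivial eigenvector (eigenvalue 1) is proportional to sqrt(d); skip it.
    norm = math.sqrt(sum(d))
    basis = [[math.sqrt(di) / norm for di in d]]
    coords = [[] for _ in range(n)]
    for _ in range(dims):
        v = [1.0 + 0.1 * i for i in range(n)]  # deterministic start vector
        for _ in range(iters):
            # Power iteration, deflating against eigenvectors already found.
            w = matvec(v)
            for b in basis:
                dot = sum(x * y for x, y in zip(w, b))
                w = [x - dot * y for x, y in zip(w, b)]
            length = math.sqrt(sum(x * x for x in w))
            if length == 0.0:
                raise ValueError("too many dimensions requested for this data")
            v = [x / length for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(v)))  # Rayleigh quotient
        basis.append(v)
        # Right eigenvector of P is D^-1/2 v; scale by lam^t for diffusion time t.
        for i in range(n):
            coords[i].append(lam ** t * v[i] / math.sqrt(d[i]))
    return coords


def single_pass(vectors, threshold):
    """Cluster vectors in arrival order: join the nearest centroid if it lies
    within `threshold` (Euclidean distance), otherwise open a new cluster."""
    centroids, counts, labels = [], [], []
    for x in vectors:
        best, best_dist = None, math.inf
        for k, c in enumerate(centroids):
            dist = math.dist(x, c)
            if dist < best_dist:
                best, best_dist = k, dist
        if best is not None and best_dist <= threshold:
            counts[best] += 1
            # Incremental mean update of the winning centroid.
            centroids[best] = [c + (xi - c) / counts[best]
                               for c, xi in zip(centroids[best], x)]
            labels.append(best)
        else:
            centroids.append(list(x))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    # Two well-separated groups arriving interleaved (toy data).
    points = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.0), (5.1, 5.0),
              (0.0, 0.1), (5.0, 5.1), (0.1, 0.1), (5.1, 5.1)]
    embedded = diffusion_map(points, eps=1.0, dims=2)
    print(single_pass(embedded, threshold=0.1))
```

Pure Python is used only so the sketch has no dependencies. In practice, a library eigensolver would replace the hand-written power iteration, for example NumPy's `numpy.linalg.eigh` applied to the symmetric matrix `A`.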
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ataa Allah, F., Grosky, W.I., Aboutajdine, D. (2007). On-Line Single-Pass Clustering Based on Diffusion Maps. In: Kedad, Z., Lammari, N., Métais, E., Meziane, F., Rezgui, Y. (eds) Natural Language Processing and Information Systems. NLDB 2007. Lecture Notes in Computer Science, vol 4592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73351-5_10
DOI: https://doi.org/10.1007/978-3-540-73351-5_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73350-8
Online ISBN: 978-3-540-73351-5
eBook Packages: Computer Science, Computer Science (R0)