Abstract
In this paper, we present a three-tier clustering method for data objects that are described by a number of feature dimensions. In this approach, the similarity between objects along each feature dimension is computed first. The inter-object similarity is then computed from the per-dimension similarities using a Bayesian multi-causal model. Finally, objects are clustered based on the computed inter-object similarity. An online citation entry clustering system was built using this approach. It accepts user queries in the form of author names and forwards them to citation/bibliography search engines. The returned entries are clustered on feature dimensions such as authors, title, and place of publication. After clustering, entries from different authors with similar names form different clusters, which are presented to the user. Preliminary experimental results indicate the effectiveness of the proposed clustering approach. The architecture of the three-tier clustering framework, the feature representation of a citation entry, a belief network model for inter-object similarity computation, and a special cluster evaluation technique are discussed in detail.
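The three tiers described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the per-feature similarity measure, the exact form of the Bayesian multi-causal model, or the clustering algorithm. The sketch assumes Jaccard token overlap for tier 1, a noisy-OR gate (one common multi-causal model) for tier 2, and single-link threshold clustering for tier 3. The names `WEIGHTS`, `ENTRIES`, `jaccard`, `noisy_or`, `entry_similarity`, and `cluster`, as well as the weights, the threshold, and the sample entries, are all hypothetical.

```python
from itertools import combinations

# Hypothetical causal strengths per feature dimension (not from the paper).
WEIGHTS = {"authors": 0.6, "title": 0.5, "venue": 0.3}

# Made-up citation entries sharing the author name "j smith".
ENTRIES = [
    {"authors": "j smith a lee", "title": "clustering web documents",
     "venue": "waim"},
    {"authors": "j smith a lee b chen",
     "title": "clustering web documents by links", "venue": "waim"},
    {"authors": "j smith r patel", "title": "protein folding kinetics",
     "venue": "biophysics letters"},
]


def jaccard(a: str, b: str) -> float:
    """Tier 1 (assumed measure): token overlap along one feature dimension."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def noisy_or(feature_sims: dict) -> float:
    """Tier 2 (assumed model): noisy-OR over per-feature similarities."""
    p_not_same = 1.0
    for name, sim in feature_sims.items():
        # Each feature independently "causes" a match with probability w * sim.
        p_not_same *= 1.0 - WEIGHTS[name] * sim
    return 1.0 - p_not_same


def entry_similarity(x: dict, y: dict) -> float:
    """Inter-object similarity built from the per-dimension similarities."""
    return noisy_or({f: jaccard(x[f], y[f]) for f in WEIGHTS})


def cluster(entries: list, threshold: float = 0.5) -> list:
    """Tier 3 (assumed algorithm): single-link clustering via union-find."""
    parent = list(range(len(entries)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Merge every pair whose combined similarity reaches the threshold.
    for i, j in combinations(range(len(entries)), 2):
        if entry_similarity(entries[i], entries[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(entries)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    print(cluster(ENTRIES))
```

With these sample entries, the first two share coauthors, title words, and venue, so they end up in one cluster, while the third entry, which shares only the queried name, stays separate. A noisy-OR gate is used here only because it lets any single strongly matching feature raise the overall similarity; the paper's actual model may differ.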
This work is partially supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region, China (AOE97/98.EG05) and a grant from the National 973 project of China (No. G1998030414).
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Jiang, H., Lou, W., Wang, W. (2001). Three-Tier Clustering: An Online Citation Clustering System. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_22
DOI: https://doi.org/10.1007/3-540-47714-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42298-3
Online ISBN: 978-3-540-47714-3
eBook Packages: Springer Book Archive