Three-Tier Clustering: An Online Citation Clustering System

Jiang, Haifeng; Lou, Wenwu; Wang, Wei

doi:10.1007/3-540-47714-4_22

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2118)

Included in the following conference series:

International Conference on Web-Age Information Management

Abstract

In this paper, we present a three-tier clustering method for data objects that are described by a number of feature dimensions. In this approach, the similarity between objects along each feature dimension is computed first. Inter-object similarity is then computed from these per-dimension similarities using a Bayesian multi-causal model. Finally, objects are clustered based on the computed inter-object similarity. Using this approach, we built an online citation entry clustering system. It accepts user queries in the form of author names and sends them to citation/bibliography search engines. The returned entries are clustered on feature dimensions such as authors, title, and place of publication. After clustering, entries by different authors who share a similar name fall into different clusters, which are then presented to the user. Preliminary experimental results indicate the effectiveness of the proposed clustering approach. The architecture of the three-tier clustering framework, the feature representation of a citation entry, a belief network model for inter-object similarity computation, and a special cluster evaluation technique are discussed in detail.
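The three tiers described above can be illustrated with a small sketch. It is not the authors' implementation; every name in it (`Entry`, `jaccard`, `dimension_similarities`, `noisy_or`, `cluster`) is hypothetical. It assumes Jaccard overlap for the co-author and title dimensions, exact matching for the venue, a noisy-OR combination as a stand-in for the Bayesian multi-causal model (each dimension treated as an independent cause of "same author"), and single-link grouping of pairs whose combined score reaches a threshold. The weights and threshold are illustrative values, not parameters from the paper.

```python
# Hypothetical sketch of the three tiers; not the authors' implementation.
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    coauthors: frozenset[str]    # author names other than the queried one
    title_words: frozenset[str]  # normalized title terms
    venue: str                   # place of publication


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    """Set overlap in [0, 1]; two empty sets count as no evidence (0.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dimension_similarities(x: Entry, y: Entry) -> list[float]:
    """Tier 1: one similarity score per feature dimension."""
    return [
        jaccard(x.coauthors, y.coauthors),
        jaccard(x.title_words, y.title_words),
        1.0 if x.venue and x.venue == y.venue else 0.0,
    ]


def noisy_or(sims: list[float], weights: tuple[float, ...], leak: float = 0.0) -> float:
    """Tier 2 (assumed noisy-OR): each dimension is an independent cause."""
    p_none = 1.0 - leak
    for s, w in zip(sims, weights):
        p_none *= 1.0 - w * s  # chance this dimension fails to signal a match
    return 1.0 - p_none


def cluster(entries: list[Entry], weights: tuple[float, ...], threshold: float) -> list[list[int]]:
    """Tier 3: single-link clustering of pairs whose combined score >= threshold."""
    parent = list(range(len(entries)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(entries)):
        for j in range(i + 1, len(entries)):
            score = noisy_or(dimension_similarities(entries[i], entries[j]), weights)
            if score >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(entries)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    entries = [
        Entry(frozenset({"h lu"}), frozenset({"xml", "indexing"}), "VLDB"),
        Entry(frozenset({"h lu"}), frozenset({"xml", "query"}), "VLDB"),
        Entry(frozenset({"j smith"}), frozenset({"protein", "folding"}), "ISMB"),
    ]
    print(cluster(entries, weights=(0.8, 0.5, 0.3), threshold=0.6))  # [[0, 1], [2]]
```

In the toy run, the first two entries share a co-author and venue, so their combined score (about 0.88) exceeds the threshold and they form one cluster, while the third entry, with no overlapping features, stays on its own. Noisy-OR is used here only because it is a simple causal-independence model; the paper's own parameterization may differ.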

This work is partially supported by a grant from the Research Grant Council of the Hong Kong Special Administrative Region, China (AOE97/98.EG05) and a grant from the National 973 project of China (No. G1998030414).

Author information

Authors and Affiliations

Dept. of Computer Science, Hong Kong University of Science and Technology, Hong Kong, China
Haifeng Jiang, Wenwu Lou & Wei Wang

Editor information

Editors and Affiliations

Department of Information and Software Engineering, George Mason University, Fairfax, VA, 22030-4444, USA
X. Sean Wang
Department of Computer Science and Engineering, Northeastern University, Shenyang, 110004, China
Ge Yu
Department of Computer Science, Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong, China
Hongjun Lu

About this paper

Cite this paper

Jiang, H., Lou, W., Wang, W. (2001). Three-Tier Clustering: An Online Citation Clustering System. In: Wang, X.S., Yu, G., Lu, H. (eds) Advances in Web-Age Information Management. WAIM 2001. Lecture Notes in Computer Science, vol 2118. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47714-4_22

DOI: https://doi.org/10.1007/3-540-47714-4_22
Published: 28 June 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42298-3
Online ISBN: 978-3-540-47714-3
eBook Packages: Springer Book Archive