Utilization of Global Ranking Information in Graph-Based Biomedical Literature Clustering

Zhang, Xiaodan; Hu, Xiaohua; Xia, Jiali; Zhou, Xiaohua; Achananuparp, Palakorn

doi:10.1007/978-3-540-74553-2_29


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4654)

Included in the following conference series:

International Conference on Data Warehousing and Knowledge Discovery


Abstract

In this paper, we explore how a global ranking method, used in conjunction with a local density method, helps identify meaningful term clusters from an ontology-enriched graph representation of a biomedical literature corpus. One major problem in document clustering is how to discount the effect of class-unspecific general terms and strengthen the effect of class-specific core terms. We claim that a well-constructed term graph can improve the global ranking of class-specific core terms. We first apply PageRank and HITS to a directed abstract-title term graph to target class-specific core terms. Then k dense term clusters (graphs) are identified from these terms. Finally, each document is assigned to its closest core term graph. A series of experiments is conducted on a document corpus collected from PubMed. Experimental results show that our approach is very effective at identifying class-specific core terms and thus helps document clustering.
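The abstract names PageRank and HITS as the global ranking step but does not give the graph construction, damping factor, density method, or distance measure. The sketch below is therefore only an illustration under stated assumptions. It uses plain power-iteration PageRank and HITS over a small hypothetical directed term graph (`EXAMPLE_EDGES`) and keeps the top-ranked terms as "core" terms. It then assigns documents to hand-specified term clusters by simple term overlap. That overlap score is a hypothetical stand-in for the paper's "closest core term graph" and is not the authors' method. All function names are hypothetical.

```python
import math
from collections import defaultdict

# Hypothetical directed term graph as (source_term, target_term) pairs.
# The paper's actual abstract-title edge construction is not reproduced here.
EXAMPLE_EDGES = [
    ("study", "gene"), ("study", "protein"), ("expression", "gene"),
    ("protein", "gene"), ("cell", "protein"), ("gene", "expression"),
]


def pagerank(edges, damping=0.85, iters=200, tol=1e-12):
    """Power-iteration PageRank; dangling nodes spread their score uniformly."""
    nodes = sorted({n for e in edges for n in e})
    out = defaultdict(list)
    for u, v in edges:
        out[u].append(v)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        diff = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if diff < tol:
            break
    return rank


def hits(edges, iters=100):
    """Kleinberg's HITS by power iteration; returns (hub, authority) scores."""
    nodes = sorted({n for e in edges for n in e})
    hub = {u: 1.0 for u in nodes}
    for _ in range(iters):
        auth = {u: 0.0 for u in nodes}
        for u, v in edges:
            auth[v] += hub[u]  # a good authority is pointed to by good hubs
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {u: a / norm for u, a in auth.items()}
        hub = {u: 0.0 for u in nodes}
        for u, v in edges:
            hub[u] += auth[v]  # a good hub points to good authorities
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {u: h / norm for u, h in hub.items()}
    return hub, auth


def top_terms(scores, k):
    """Keep the k highest-scoring terms as candidate core terms."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def assign_documents(docs, clusters):
    """Assign each document (a list of terms) to the cluster (a set of terms)
    sharing the most terms with it; ties go to the lowest cluster index.
    Hypothetical stand-in for the paper's closeness measure."""
    labels = []
    for terms in docs:
        overlaps = [len(set(terms) & c) for c in clusters]
        labels.append(max(range(len(clusters)), key=lambda i: overlaps[i]))
    return labels


if __name__ == "__main__":
    pr = pagerank(EXAMPLE_EDGES)
    print(top_terms(pr, 2))  # ['gene', 'expression']
    hub, auth = hits(EXAMPLE_EDGES)
    print(top_terms(auth, 1), top_terms(hub, 1))
    docs = [["gene", "expression"], ["cell", "protein", "study"]]
    clusters = [{"gene", "expression"}, {"protein", "cell"}]
    print(assign_documents(docs, clusters))
```

In this toy graph, "study" has only out-links. It ends up as the strongest hub but gets the minimum PageRank and zero authority. That loosely mirrors the abstract's idea that general terms should rank below core terms. Whether this holds on a real PubMed term graph depends on the edge construction, which is not shown here.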



Author information

Authors and Affiliations

College of Information Science and Technology, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA
Xiaodan Zhang, Xiaohua Hu, Xiaohua Zhou & Palakorn Achananuparp
UFSoft School of Software, Jiangxi University of Finance and Economics, Nanchang, Jiangxi, China
Xiaohua Hu & Jiali Xia


Editor information

Il Yeal Song, Johann Eder, Tho Manh Nguyen


About this paper

Cite this paper

Zhang, X., Hu, X., Xia, J., Zhou, X., Achananuparp, P. (2007). Utilization of Global Ranking Information in Graph-Based Biomedical Literature Clustering. In: Song, I.Y., Eder, J., Nguyen, T.M. (eds) Data Warehousing and Knowledge Discovery. DaWaK 2007. Lecture Notes in Computer Science, vol 4654. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74553-2_29


DOI: https://doi.org/10.1007/978-3-540-74553-2_29
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74552-5
Online ISBN: 978-3-540-74553-2
eBook Packages: Computer Science, Computer Science (R0)
