Corpus Construction for Extracting Disease-Gene Relations

Chun, Hong-Woo; Song, Sa-Kwang; Choi, Sung-Pil; Jung, Hanmin

doi:10.1007/978-3-642-34624-8_33


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7661)

Abstract

Many corpus-based statistical methods have been used to extract disease-gene relations (DGRs) from the literature. The corpus-based approach has two limitations: the corpora available for training a system are insufficient, and most previous research has dealt only with a binary relation rather than with various types of DGRs. In other words, the common question has been whether a relation is present at all. However, a binary relation is not enough to explain DGRs in practice. One solution is to construct a corpus that supports the analysis of various types of relations between diseases and their related genes.

This article describes a corpus construction process for DGRs. Eleven relation topics were defined by biologists. Four annotators participated in the corpus annotation task, and their inter-annotator agreement was calculated to show the reliability of the annotation results.
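The abstract does not say which agreement measure was used. As an illustration only, the sketch below computes Fleiss' kappa, one common choice when more than two annotators assign a single category to each item. The function name, the topic labels, and the toy annotations are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of annotators.

    ratings: list of items; each item is a list with one label per annotator.
    categories: list of all labels the annotators could choose from.
    NOTE: an assumed measure for illustration; the paper may have used another.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of labels")

    # How many annotators chose each category, per item.
    counts = [Counter(item) for item in ratings]

    # Observed agreement: fraction of agreeing annotator pairs, averaged over items.
    pairs = n_raters * (n_raters - 1)
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters) / pairs
        for c in counts
    ]
    p_observed = sum(p_items) / n_items

    # Expected agreement from the overall share of each category.
    total = n_items * n_raters
    p_cats = [sum(c[cat] for c in counts) / total for cat in categories]
    p_expected = sum(p * p for p in p_cats)

    if p_expected == 1.0:
        # Only one category was ever used, so chance agreement is already total.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical toy data: four annotators, two placeholder topic labels.
TOPICS = ["topic_a", "topic_b"]
perfect = [["topic_a"] * 4, ["topic_b"] * 4]
split = [["topic_a", "topic_a", "topic_b", "topic_b"]] * 2

kappa_perfect = fleiss_kappa(perfect, TOPICS)
kappa_split = fleiss_kappa(split, TOPICS)
```

With eleven topics, `categories` would simply list all eleven labels. A value near 1 indicates agreement well above chance, while values at or below 0 indicate agreement no better than chance.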

The gold standard data produced by the proposed approach can be used to improve performance on many research tasks. Examples include recognition of gene and disease names and extraction of fine-grained DGRs. The corpus will be released through the GENIA project home page.



Author information

Authors and Affiliations

Korea Institute of Science and Technology Information (KISTI), 245 Daehak-ro, Yuseong-gu, Daejeon, South Korea
Hong-Woo Chun, Sa-Kwang Song, Sung-Pil Choi & Hanmin Jung


Editor information

Editors and Affiliations

Hong Kong Baptist University, 224 Waterloo Road, Kowloon, Hong Kong
Li Chen & Jiming Liu
Institute for Software Technology, Graz University of Technology, Inffeldgasse 16b, 8010, Graz, Austria
Alexander Felfernig
University of North Carolina, Charlotte, NC 28223, USA and Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Zbigniew W. Raś


About this paper

Cite this paper

Chun, HW., Song, SK., Choi, SP., Jung, H. (2012). Corpus Construction for Extracting Disease-Gene Relations. In: Chen, L., Felfernig, A., Liu, J., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2012. Lecture Notes in Computer Science, vol 7661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34624-8_33


DOI: https://doi.org/10.1007/978-3-642-34624-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34623-1
Online ISBN: 978-3-642-34624-8
eBook Packages: Computer Science, Computer Science (R0)
