Abstract
Many corpus-based statistical methods have been used to extract disease-gene relations (DGRs) from the literature. The corpus-based approach has two limitations: the corpora available for training a system are insufficient, and most previous research has dealt only with a binary relation rather than with various types of DGRs. In other words, the common question has been whether a relation is present at all. However, a binary relation is not enough to explain DGRs in practice. One solution is to construct a corpus that supports the analysis of various types of relations between diseases and their related genes.
This article describes a corpus construction process for DGRs. Eleven relation topics were defined by biologists. Four annotators participated in the corpus annotation task, and their inter-annotator agreement was calculated to show the reliability of the annotation results.
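The abstract does not say which agreement measure was used. As an illustration only, the sketch below computes Fleiss' kappa, a common choice when more than two annotators each assign one category per item. The function name `fleiss_kappa`, the topic labels, and the toy ratings are all hypothetical and are not taken from the corpus.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for items that were each labeled by the same number of annotators.

    ratings: list of items, each a list with one label per annotator.
    categories: every label an annotator may assign.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number of annotations")

    # counts[i][c] = number of annotators who assigned category c to item i
    counts = [Counter(item) for item in ratings]

    # Observed agreement: mean over items of the share of agreeing annotator pairs
    p_items = [
        (sum(c[cat] ** 2 for cat in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_observed = sum(p_items) / n_items

    # Chance agreement from the overall proportion of each category
    p_category = [
        sum(c[cat] for c in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p ** 2 for p in p_category)

    if p_expected == 1.0:
        # Only one category was ever used, so agreement is trivially perfect
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical placeholder labels; the eleven real topic names are not given here.
topics = [f"topic_{i}" for i in range(1, 12)]

# Toy data: three sentences, four annotators each (invented for illustration).
toy_ratings = [
    ["topic_1", "topic_1", "topic_1", "topic_1"],
    ["topic_2", "topic_2", "topic_3", "topic_2"],
    ["topic_5", "topic_5", "topic_5", "topic_4"],
]
print(fleiss_kappa(toy_ratings, topics))
```

A value near 1 indicates agreement well above chance, and a value near 0 indicates agreement no better than chance. Pairwise measures such as Cohen's kappa averaged over annotator pairs are an alternative when the annotators are fixed.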
The gold standard data produced by the proposed approach can be used to improve performance in many research tasks, such as the recognition of gene and disease names and the extraction of fine-grained DGRs. The corpus will be released through the GENIA project home page.
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chun, HW., Song, SK., Choi, SP., Jung, H. (2012). Corpus Construction for Extracting Disease-Gene Relations. In: Chen, L., Felfernig, A., Liu, J., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2012. Lecture Notes in Computer Science, vol 7661. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34624-8_33
DOI: https://doi.org/10.1007/978-3-642-34624-8_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34623-1
Online ISBN: 978-3-642-34624-8
eBook Packages: Computer Science, Computer Science (R0)