Abstract
Protein Sub-cellular Localization (PSL) prediction is an important task for predicting protein functions. Because the sequence-based approach used in most previous work focuses on predicting locations for given proteins, it fails to provide useful information when a single protein is localized in several different sub-cellular locations depending on its state. While this case is difficult for the sequence-based approach, it can be tackled by a text-based approach.
The proposed approach extracts PSL from literature using Natural Language Processing techniques. We conducted experiments to see how well our system identifies evidence sentences and which linguistic features of sentences contribute significantly to the task. This article presents a novel text-based approach to extracting PSL relations together with their evidence sentences. Evidence sentences provide indispensable information that the sequence-based approach cannot supply.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Chun, HW., Kim, JD., Choi, YS., Sung, WK. (2010). Extracting Protein Sub-cellular Localizations from Literature. In: An, A., Lingras, P., Petty, S., Huang, R. (eds) Active Media Technology. AMT 2010. Lecture Notes in Computer Science, vol 6335. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15470-6_39
DOI: https://doi.org/10.1007/978-3-642-15470-6_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15469-0
Online ISBN: 978-3-642-15470-6
eBook Packages: Computer Science, Computer Science (R0)