Abstract
Proteins are important macromolecules in living systems and serve various functions in almost all biological processes. Information about protein function is reported in many scientific articles, and extracting it is useful for drug discovery, for understanding biological phenomena, and for related tasks. However, extracting this information manually from a large number of articles is infeasible. In this paper, we propose a method for extracting sentences that contain protein function information by iterative learning with feature update. The method uses a classifier to distinguish sentences containing function information from other sentences, and introduces a semi-automatic procedure in which a new classifier is rebuilt from the user's feedback on the previous classification results. In an experiment with twelve articles as feedback data, we confirmed that the F-measure improved as learning was iterated, without the feedback having a negative effect.
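The loop described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, since the abstract does not specify the details: the bag-of-words features, the document-frequency ranking used for the feature update, the naive Bayes stand-in classifier, and the names `select_features`, `train`, `predict`, `iterate`, and `oracle` (a function standing in for the user who corrects each predicted label). It is not the authors' implementation.

```python
import math
from collections import Counter


def tokenize(sentence):
    return sentence.lower().split()


def select_features(labelled, top_k=20):
    """Feature update (assumed scheme): keep the words whose document
    frequency differs most between positive and negative sentences."""
    pos, neg = Counter(), Counter()
    for text, label in labelled:
        (pos if label else neg).update(set(tokenize(text)))
    words = set(pos) | set(neg)
    ranked = sorted(words, key=lambda w: (-abs(pos[w] - neg[w]), w))
    return ranked[:top_k]


def train(labelled, features):
    """Bernoulli naive Bayes counts; a stand-in for the paper's classifier."""
    counts = {True: Counter(), False: Counter()}
    n = {True: 0, False: 0}
    for text, label in labelled:
        y = bool(label)
        n[y] += 1
        tokens = set(tokenize(text))
        for f in features:
            if f in tokens:
                counts[y][f] += 1
    return {"features": features, "n": n, "counts": counts}


def predict(model, text):
    """Return True if the sentence is classified as a function sentence."""
    tokens = set(tokenize(text))
    total = model["n"][True] + model["n"][False]
    scores = {}
    for y in (True, False):
        n_y = model["n"][y]
        score = math.log((n_y + 1) / (total + 2))  # smoothed class prior
        for f in model["features"]:
            p = (model["counts"][y][f] + 1) / (n_y + 2)  # add-one smoothing
            score += math.log(p if f in tokens else 1 - p)
        scores[y] = score
    return scores[True] > scores[False]


def f_measure(predicted, gold):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if not p and g)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def iterate(seed, feedback_batches, oracle, top_k=20):
    """Classify each batch, collect corrected labels from the user (oracle),
    then reselect features and rebuild the classifier."""
    labelled = list(seed)
    model = train(labelled, select_features(labelled, top_k))
    history = [model]
    for batch in feedback_batches:
        for sentence in batch:
            guess = predict(model, sentence)
            labelled.append((sentence, oracle(sentence, guess)))
        model = train(labelled, select_features(labelled, top_k))
        history.append(model)
    return history
```

In this sketch each feedback batch plays the role of one article reviewed by the user, so `iterate` returns one model per round plus the initial one. Scoring every model in that list with `f_measure` on a held-out labelled set is one way to check whether iteration helps.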
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Miyanishi, K., Ohkawa, T. (2013). A Method of Extracting Sentences Containing Protein Function Information from Articles by Iterative Learning with Feature Update. In: Peterson, L.E., Masulli, F., Russo, G. (eds) Computational Intelligence Methods for Bioinformatics and Biostatistics. CIBB 2012. Lecture Notes in Computer Science, vol 7845. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38342-7_8
DOI: https://doi.org/10.1007/978-3-642-38342-7_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38341-0
Online ISBN: 978-3-642-38342-7
eBook Packages: Computer Science (R0)