Various Features with Integrated Strategies for Protein Name Classification
Classification is an integral part of a named entity recognition system: it assigns each recognized named entity to its corresponding class. This task has received little attention in the biomedical domain, because previous studies did not clearly differentiate feature sources and strategies. In this research, we analyze different sources and strategies for protein name classification and develop integrated strategies that combine the advantages of rule-based, dictionary-based, and statistical methods. The rule-based method uses terms and knowledge of protein nomenclature that provide strong cues for protein names. The dictionary-based method uses a set of rules for curating a protein name dictionary. These terms and dictionaries are combined with our own features in a statistical classifier. Our features comprise word shape features and unigram and bigram features. With these information sources and integrated strategies, we achieve state-of-the-art performance in classifying protein and non-protein names.
Keywords: Classification Task; Integrate Strategy; Regular Expression; External Information; Name Entity Recognition
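The feature types named in the abstract (word shape, unigram and bigram features, nomenclature cue terms, and dictionary membership) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the cue-term list, the tiny dictionary, the shape encoding, and the function names `word_shape` and `extract_features` are hypothetical and are not taken from the paper.

```python
import re

# Hypothetical cue terms and dictionary entries; the abstract does not list
# the actual terms or dictionary used by the authors.
CUE_TERMS = {"protein", "kinase", "receptor", "factor"}
PROTEIN_DICTIONARY = {"il-2 receptor", "p53"}


def word_shape(token):
    """Encode uppercase as 'A', lowercase as 'a', digits as '0',
    then collapse runs of the same symbol (e.g. 'Aaaa' -> 'Aa')."""
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def extract_features(name):
    """Build a sparse binary feature dictionary for one candidate name."""
    tokens = name.split()
    lowered = [t.lower() for t in tokens]
    features = {}
    # Word shape feature for each token
    for t in tokens:
        features["shape=" + word_shape(t)] = 1
    # Unigram features
    for t in lowered:
        features["uni=" + t] = 1
    # Bigram features over adjacent tokens
    for a, b in zip(lowered, lowered[1:]):
        features["bi=" + a + "_" + b] = 1
    # Rule-style cue: does any token match a nomenclature cue term?
    features["has_cue_term"] = int(any(t in CUE_TERMS for t in lowered))
    # Dictionary-style cue: is the whole name a dictionary entry?
    features["in_dictionary"] = int(" ".join(lowered) in PROTEIN_DICTIONARY)
    return features


if __name__ == "__main__":
    print(extract_features("IL-2 receptor"))
```

A feature dictionary of this kind could be passed to any statistical classifier that accepts sparse binary features; the abstract does not say which classifier the authors used.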