Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus

Liang, Tyne; Shih, Ping-Ke

doi:10.1007/11428817_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3513)

Included in the following conference series:

International Conference on Application of Natural Language to Information Systems

Abstract

Named entity recognition (NER) in biomedical literature is crucial to automating biomedical knowledge bases. This paper presents both an empirical rule-based approach and a statistical approach to protein entity recognition and evaluates them on a general corpus, GENIA 3.02p, and a new domain-specific corpus, SRC. Experimental results show that the rules derived from SRC are useful even though they are simpler and more general than those used by other rule-based approaches. A concise HMM-based model with a rich set of features is also presented and shown to be robust and competitive with other successful hybrid models. In addition, the paper addresses the resolution of coordination variants, which are common in entity recognition. The presented resolver, which applies heuristic rules and a clustering strategy, is shown to be feasible.
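To make the idea of HMM-based tagging concrete, the sketch below decodes a BIO tag sequence for a short token list with the standard Viterbi algorithm. It is only an illustration under stated assumptions and not the model described in the paper: the tag set, the single surface feature, and every probability are invented for the example, and the names `viterbi` and `features` are hypothetical.

```python
import math

# Toy BIO tag set: B = beginning of a protein mention, I = inside, O = outside.
TAGS = ["B", "I", "O"]

# All probabilities below are invented for illustration; none come from the paper.
START = {"B": 0.2, "I": 0.0, "O": 0.8}
TRANS = {
    "B": {"B": 0.05, "I": 0.55, "O": 0.40},
    "I": {"B": 0.05, "I": 0.45, "O": 0.50},
    "O": {"B": 0.20, "I": 0.00, "O": 0.80},
}
EMIT = {
    "B": {"alnum": 0.60, "cap": 0.30, "lower": 0.10},
    "I": {"alnum": 0.30, "cap": 0.20, "lower": 0.50},
    "O": {"alnum": 0.05, "cap": 0.15, "lower": 0.80},
}


def features(token):
    """Map a token to one coarse surface class (a stand-in for a richer feature set)."""
    if any(c.isdigit() for c in token) and any(c.isalpha() for c in token):
        return "alnum"  # mixed letters and digits, e.g. "IL-2" or "p53"
    if token[:1].isupper():
        return "cap"
    return "lower"


def log(p):
    """Log-probability that maps zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the most probable tag sequence for tokens under the toy HMM."""
    if not tokens:
        return []
    obs = [features(t) for t in tokens]
    # score[i][tag] is the best log-probability of any path ending in tag at i.
    score = [{t: log(START[t]) + log(EMIT[t][obs[0]]) for t in TAGS}]
    back = []
    for o in obs[1:]:
        prev = score[-1]
        row, ptr = {}, {}
        for t in TAGS:
            best = max(TAGS, key=lambda p: prev[p] + log(TRANS[p][t]))
            row[t] = prev[best] + log(TRANS[best][t]) + log(EMIT[t][o])
            ptr[t] = best
        score.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final tag to recover the path.
    path = [max(TAGS, key=lambda t: score[-1][t])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    sentence = ["the", "IL-2", "receptor", "binds"]
    print(list(zip(sentence, viterbi(sentence))))
```

Working in log space avoids numerical underflow on longer sentences, and setting the O-to-I transition and the I start probability to zero keeps the decoder from producing an inside tag without a preceding beginning tag.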

Author information

Authors and Affiliations

Department of Computer and Information Science, National Chiao Tung University, Hsinchu, Taiwan
Tyne Liang & Ping-Ke Shih

Editor information

Editors and Affiliations

Department of Software and Computing Systems, University of Alicante, Spain
Andrés Montoyo
Grupo de investigación del Procesamiento del Lenguaje y Sistemas de Información, Departamento de Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
Rafael Muñoz
Lab. CEDRIC, CNAM, Paris, France
Elisabeth Métais

About this paper

Cite this paper

Liang, T., Shih, PK. (2005). Empirical Textual Mining to Protein Entities Recognition from PubMed Corpus. In: Montoyo, A., Muñoz, R., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2005. Lecture Notes in Computer Science, vol 3513. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11428817_6

DOI: https://doi.org/10.1007/11428817_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-26031-8
Online ISBN: 978-3-540-32110-1
eBook Packages: Computer Science; Computer Science (R0)