Statistical Recognition of Noun Phrases in Unrestricted Text

Serrano, José I.; Araujo, Lourdes

doi:10.1007/11552253_36

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Included in the following conference series:

International Symposium on Intelligent Data Analysis

Abstract

This paper presents a new model for flexible noun phrase detection that can recognize noun phrases sufficiently similar to those generated by the inferred noun phrase grammar. To allow this flexibility, we use a very accurate set of probabilities for the transitions between the part-of-speech tags in the sequence that defines a noun phrase. These probabilities are obtained by means of an evolutionary algorithm that works with both positive and negative examples of the language, thus improving the system's coverage while maintaining its precision. We have tested the system on different corpora and compared the results with those of other systems, which revealed a clear improvement in performance.
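
The idea of accepting tag sequences that are "similar enough" to the grammar can be illustrated with a minimal Python sketch. It is not the authors' implementation: the transition table, the smoothing floor, the geometric-mean score, and the threshold are all assumptions made for illustration, and the evolutionary tuning of the probabilities is not shown.

```python
import math

# Hypothetical transition probabilities between part-of-speech tags inside a
# noun phrase. "<s>" and "</s>" mark the phrase boundaries. All values are
# invented for illustration; the paper learns them with an evolutionary algorithm.
TRANSITIONS = {
    ("<s>", "DT"): 0.5,
    ("<s>", "NN"): 0.3,
    ("<s>", "JJ"): 0.2,
    ("DT", "JJ"): 0.4,
    ("DT", "NN"): 0.6,
    ("JJ", "JJ"): 0.2,
    ("JJ", "NN"): 0.8,
    ("NN", "NN"): 0.3,
    ("NN", "</s>"): 0.7,
}

# Assumed floor probability for transitions absent from the table, so that a
# single unseen transition lowers the score instead of zeroing it.
SMOOTHING = 1e-3


def np_score(tags):
    """Return the geometric mean of the transition probabilities along `tags`."""
    path = ["<s>", *tags, "</s>"]
    log_sum = sum(
        math.log(TRANSITIONS.get(pair, SMOOTHING))
        for pair in zip(path, path[1:])
    )
    return math.exp(log_sum / (len(path) - 1))


def is_noun_phrase(tags, threshold=0.1):
    """Accept a tag sequence whose score reaches the (assumed) threshold."""
    return np_score(tags) >= threshold


if __name__ == "__main__":
    for candidate in (["DT", "JJ", "NN"], ["DT", "NN", "JJ", "NN"], ["DT", "VB"]):
        print(candidate, round(np_score(candidate), 3), is_noun_phrase(candidate))
```

In this sketch, a sequence such as DT NN JJ NN contains one transition missing from the table yet still passes the threshold, while DT VB, which has two unseen transitions out of three, is rejected. Normalizing by sequence length is a design choice of the sketch that keeps long phrases from being penalized merely for their length.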

Supported by projects TIC2003-09481-C04 and FIT150500-2003-373.

Author information

Authors and Affiliations

Instituto de Automática Industrial, CSIC, Spain
José I. Serrano
Departamento de Sistemas Informáticos y Programación, Universidad Complutense de Madrid, Spain
Lourdes Araujo

Editor information

Editors and Affiliations

Institute for Information Technology, National Research Council Canada, Ottawa, Canada
A. Fazel Famili
LIACS, Leiden University, The Netherlands
Joost N. Kok
IFM, Linköping University, SE-58183, Linköping, Sweden
José M. Peña
Department of Computer Science, Universiteit Utrecht
Arno Siebes
Utrecht University, P.O. Box 80 089, NL-3508 TB Utrecht, the Netherlands
Ad Feelders

About this paper

Cite this paper

Serrano, J.I., Araujo, L. (2005). Statistical Recognition of Noun Phrases in Unrestricted Text. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_36

DOI: https://doi.org/10.1007/11552253_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)