Abstract
This paper presents a new model for flexible noun phrase detection that can recognize noun phrases sufficiently similar to those generated by an inferred noun phrase grammar. To allow this flexibility, we use a highly accurate set of probabilities for the transitions between the part-of-speech tags in the sequences that define a noun phrase. These probabilities are obtained by means of an evolutionary algorithm that works with both positive and negative examples of the language, thus improving the system's coverage while maintaining its precision. We have tested the system on different corpora and compared the results with those of other systems, which revealed a clear improvement in performance.
Supported by projects TIC2003-09481-C04 and FIT150500-2003-373.
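The idea summarized in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation, since the full paper is not reproduced here. It assumes that a tag sequence is accepted as a noun phrase when the geometric mean of its transition probabilities reaches a threshold, and it uses a simple (1+1) evolutionary search as a stand-in for the paper's evolutionary algorithm. All names (`sequence_score`, `fitness`, `evolve`, `TOY_PROBS`), the threshold, the probability floor, and the toy data are hypothetical, and the mutated values are not renormalized into a proper distribution.

```python
import math
import random

START, END = "<s>", "</s>"  # hypothetical boundary markers

def sequence_score(tags, probs, floor=1e-6):
    """Geometric mean of the transition probabilities along a tag sequence.
    Unseen transitions get a small floor instead of zero (an assumption)."""
    path = [START] + list(tags) + [END]
    log_sum = 0.0
    for prev, cur in zip(path, path[1:]):
        log_sum += math.log(probs.get((prev, cur), floor))
    return math.exp(log_sum / (len(path) - 1))

def is_noun_phrase(tags, probs, threshold=0.1):
    """Accept sequences that score at least the (assumed) threshold."""
    return sequence_score(tags, probs) >= threshold

def fitness(probs, positives, negatives, threshold=0.1):
    """Fraction of positive and negative examples classified correctly."""
    correct = sum(is_noun_phrase(t, probs, threshold) for t in positives)
    correct += sum(not is_noun_phrase(t, probs, threshold) for t in negatives)
    return correct / (len(positives) + len(negatives))

def mutate(probs, rng, scale=0.2):
    """Perturb one randomly chosen transition value, clamped to [1e-6, 1]."""
    child = dict(probs)
    key = rng.choice(sorted(child))
    child[key] = min(1.0, max(1e-6, child[key] + rng.uniform(-scale, scale)))
    return child

def evolve(probs, positives, negatives, generations=200, seed=0):
    """(1+1) evolutionary search: keep the child only if it is no worse."""
    rng = random.Random(seed)
    best, best_fit = probs, fitness(probs, positives, negatives)
    for _ in range(generations):
        child = mutate(best, rng)
        child_fit = fitness(child, positives, negatives)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best, best_fit

# Toy data, invented for illustration only (Penn Treebank style tags).
TOY_PROBS = {
    (START, "DT"): 0.6, (START, "NN"): 0.4,
    ("DT", "JJ"): 0.4, ("DT", "NN"): 0.6, ("DT", END): 0.5,
    ("JJ", "NN"): 0.9, ("JJ", "JJ"): 0.1,
    ("NN", "NN"): 0.3, ("NN", END): 0.7,
}
POSITIVES = [["DT", "NN"], ["DT", "JJ", "NN"], ["NN", "NN"],
             ["DT", "JJ", "JJ", "NN"]]
NEGATIVES = [["DT", "VB"], ["VB", "NN"], ["DT"]]

if __name__ == "__main__":
    print("initial fitness:", fitness(TOY_PROBS, POSITIVES, NEGATIVES))
    tuned, tuned_fit = evolve(TOY_PROBS, POSITIVES, NEGATIVES)
    print("fitness after search:", tuned_fit)
```

In this toy setup the negative example `["DT"]` is initially accepted because of the deliberately poor `("DT", END)` value, so the search has something to correct. Because a child replaces its parent only when its fitness is no worse, the fitness over the examples never decreases during the search.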
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Serrano, J.I., Araujo, L. (2005). Statistical Recognition of Noun Phrases in Unrestricted Text. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_36
DOI: https://doi.org/10.1007/11552253_36
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28795-7
Online ISBN: 978-3-540-31926-9
eBook Packages: Computer Science, Computer Science (R0)