Statistical Recognition of Noun Phrases in Unrestricted Text

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3646)

Abstract

This paper presents a new model for flexible noun phrase detection that can recognize noun phrases which are similar enough to those generated by the inferred noun phrase grammar. To allow this flexibility, we use a very accurate set of probabilities for the transitions between the part-of-speech tags in the sequence that defines a noun phrase. These probabilities are obtained by means of an evolutionary algorithm that works with both positive and negative examples of the language, thus improving the coverage of the system while maintaining its precision. We have tested the system on different corpora and compared the results with those of other systems, which has revealed a clear improvement in performance.
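The idea of scoring a part-of-speech tag sequence by its transition probabilities can be illustrated with a minimal Python sketch. It is not the authors' system: the bigram form of the model, the probability values, the `SMOOTHING` constant, and the acceptance `threshold` are all assumptions made for illustration, and the evolutionary tuning of the probabilities is not reproduced here.

```python
import math

# Hypothetical transition probabilities between part-of-speech tags.
# The values are invented for this example; the paper obtains its
# probabilities with an evolutionary algorithm.
TRANSITIONS = {
    ("<s>", "DT"): 0.5,
    ("<s>", "JJ"): 0.2,
    ("<s>", "NN"): 0.3,
    ("DT", "JJ"): 0.4,
    ("DT", "NN"): 0.6,
    ("JJ", "JJ"): 0.2,
    ("JJ", "NN"): 0.8,
    ("NN", "NN"): 0.3,
    ("NN", "</s>"): 0.7,
}

# Assumed small probability for transitions absent from the table, so that
# unseen sequences get a low score instead of being rejected outright.
SMOOTHING = 1e-3


def sequence_log_prob(tags):
    """Sum the log transition probabilities along the tag sequence."""
    path = ["<s>", *tags, "</s>"]
    return sum(
        math.log(TRANSITIONS.get(pair, SMOOTHING))
        for pair in zip(path, path[1:])
    )


def is_noun_phrase(tags, threshold=-2.0):
    """Accept a sequence whose average log probability per transition
    reaches the (hypothetical) threshold."""
    if not tags:
        return False
    return sequence_log_prob(tags) / (len(tags) + 1) >= threshold


print(is_noun_phrase(["DT", "JJ", "NN"]))  # determiner, adjective, noun
print(is_noun_phrase(["DT", "VB", "NN"]))  # contains two unseen transitions
```

Normalizing by the number of transitions keeps long noun phrases from being penalized only for their length; lowering the threshold makes the recognizer more permissive toward sequences the table does not cover.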

Supported by projects TIC2003-09481-C04 and FIT150500-2003-373.

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Serrano, J.I., Araujo, L. (2005). Statistical Recognition of Noun Phrases in Unrestricted Text. In: Famili, A.F., Kok, J.N., Peña, J.M., Siebes, A., Feelders, A. (eds) Advances in Intelligent Data Analysis VI. IDA 2005. Lecture Notes in Computer Science, vol 3646. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11552253_36

  • DOI: https://doi.org/10.1007/11552253_36

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28795-7

  • Online ISBN: 978-3-540-31926-9

  • eBook Packages: Computer Science, Computer Science (R0)
