Natural Language Processing

  • Stefano Ferilli
Part of the Advances in Pattern Recognition book series (ACVPR)


Text processing is a preliminary phase of many document content handling tasks that aim to extract and organize the information a document contains. The computer science disciplines devoted to understanding language, and hence relevant to these objectives, are Computational Linguistics and Natural Language Processing. They rely on suitable linguistic resources (corpora, computational lexica, etc.) and on standard representation models of linguistic information to build tools that can analyze sentences at several levels of complexity: morphological, lexical, syntactic, and semantic. This chapter surveys the main Natural Language Processing tasks (tokenization, language recognition, stemming, stopword removal, Part-of-Speech tagging, Word Sense Disambiguation, and parsing) and presents some related techniques, along with lexical resources of interest to the research community.


Keywords: Noun Phrase; Natural Language Processing; Regular Expression; Word Sense Disambiguation; Subject Code



Copyright information

© Springer-Verlag London Limited 2011

Authors and Affiliations

  1. Dipartimento di Informatica, Università di Bari, Bari, Italy
