
Two Algorithms for Probabilistic Stemming

  • Massimo Melucci
  • Nicola Orio
Chapter
Part of The Information Retrieval Series book series (INRE, volume 22)

Abstract

This chapter describes two algorithms for probabilistic stemming. A probabilistic stemmer aims to detect word stems by means of a probabilistic or statistical model, using little or no knowledge of the language for which the stemmer is built. Alongside the two probabilistic stemming models, the chapter reflects on and analyses the potential of this approach to stemming in the context of information retrieval.
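The chapter's two models are not reproduced on this page. As a rough illustration of the underlying idea (finding stem boundaries from word statistics alone, with no language-specific rules), the sketch below implements the letter successor variety heuristic of Hafer and Weiss, which is a statistical rather than probabilistic precursor of such stemmers. The choice to cut at the longest local peak, the `min_stem` threshold, and the `#` end-of-word marker are simplifying assumptions made here for illustration only.

```python
from collections import defaultdict

END = "#"  # assumed end-of-word marker; words are assumed never to contain "#"


def successor_varieties(vocabulary):
    """Map each prefix seen in the vocabulary to its successor variety:
    the number of distinct letters (or END) that follow it in some word."""
    successors = defaultdict(set)
    for word in vocabulary:
        for i in range(1, len(word) + 1):
            # The letter after word[:i], or END when word[:i] is the whole word.
            successors[word[:i]].add(word[i] if i < len(word) else END)
    return {prefix: len(letters) for prefix, letters in successors.items()}


def stem(word, variety, min_stem=3):
    """Cut `word` after the longest prefix (at least `min_stem` letters long)
    whose successor variety is a strict local peak; return `word` unchanged
    if no such prefix exists. Unseen prefixes get variety 0."""
    counts = [variety.get(word[:i], 0) for i in range(1, len(word) + 1)]
    best = word
    for i in range(min_stem, len(word) + 1):
        here = counts[i - 1]                          # variety of word[:i]
        left = counts[i - 2] if i >= 2 else 0         # variety of word[:i-1]
        right = counts[i] if i < len(word) else 0     # variety of word[:i+1]
        if here > left and here > right:
            best = word[:i]  # later (longer) peaks overwrite earlier ones
    return best


if __name__ == "__main__":
    vocabulary = ["connect", "connected", "connecting", "connection",
                  "connects", "contact", "contacted"]
    variety = successor_varieties(vocabulary)
    print([stem(w, variety) for w in vocabulary])
    # ['connect', 'connect', 'connect', 'connect', 'connect', 'contact', 'contact']
```

On this toy vocabulary the prefix "connect" is followed by four different continuations (end of word, "e", "i", "s"), so its variety peaks there and the cut falls after it. A probabilistic stemmer replaces such a hand-set cutting rule with split probabilities estimated from the same kind of corpus statistics.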

Keywords

stemming, multilingual information retrieval, statistical models, hidden Markov models



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Massimo Melucci (1)
  • Nicola Orio (1)
  1. Department of Information Engineering, University of Padua, Via Gradenigo 6/a, Italy
