Generating Variant Keyword Forms for a Morphologically Complex Language Leads to Successful Information Retrieval with Finnish

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7356)

Abstract

This paper discusses Finnish information retrieval and the management of keyword variation by generating inflected variant forms of keywords. Finnish is a highly inflectional language, so managing keyword variation in queries and indexes is of utmost importance for successful Finnish full-text retrieval. We show that generating quite a small number of variant keyword forms leads to good retrieval performance in a probabilistic best-match retrieval system (Lemur). Generating almost the full paradigm of inflected nominal forms improves the results slightly. We also report interesting results regarding different index types: our evaluation shows that generated inflected queries perform extremely well in a lemmatized index, which is supposedly not suitable for this query type. We also show that, in a research environment, even inexact generation that produces many incorrect inflected forms achieves high precision-recall performance without considerable loss in query throughput. We use two different word form generators and their variants, and compare the results with two commonly used reductive methods of word form variation management, stemming and lemmatization. The paper also includes a short discussion of using the variant keyword method with Web search engines.
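The idea of expanding each query keyword into a handful of inflected forms can be illustrated with a minimal sketch. It is not the generators evaluated in the paper: the function names, the choice of seven forms, and the naive suffix rules are assumptions made for illustration only. The rules handle vowel harmony but ignore consonant gradation and stem changes, so they produce some incorrect forms (for example `katun` instead of `kadun`), which loosely mirrors the "inexact generation" case mentioned in the abstract. The `#combine` and `#syn` operators follow the Indri query language; whether the paper's Lemur setup used exactly these operators is also an assumption.

```python
# Hypothetical sketch: naive generation of a few frequent Finnish case forms.
# Not the paper's generators; rules ignore consonant gradation and stem changes.

BACK_VOWELS = set("aou")
VOWELS = set("aeiouyäö")


def harmonize(suffix: str, lemma: str) -> str:
    """Return the front-vowel variant of a suffix unless the lemma has back vowels."""
    if any(ch in BACK_VOWELS for ch in lemma.lower()):
        return suffix
    return suffix.replace("a", "ä")


def generate_variants(lemma: str) -> list[str]:
    """Generate the base form plus a small set of singular case forms and plural nominative."""
    forms = [lemma]
    forms.append(lemma + "n")                       # genitive
    forms.append(lemma + harmonize("a", lemma))     # partitive
    forms.append(lemma + harmonize("ssa", lemma))   # inessive
    forms.append(lemma + harmonize("sta", lemma))   # elative
    if lemma and lemma[-1].lower() in VOWELS:
        forms.append(lemma + lemma[-1] + "n")       # illative (final vowel lengthened)
    forms.append(lemma + "t")                       # nominative plural
    return forms


def expand_query(keywords: list[str]) -> str:
    """Wrap each keyword's variants in a synonym group and combine the groups."""
    groups = ["#syn(" + " ".join(generate_variants(k)) + ")" for k in keywords]
    return "#combine(" + " ".join(groups) + ")"


if __name__ == "__main__":
    print(generate_variants("talo"))
    # ['talo', 'talon', 'taloa', 'talossa', 'talosta', 'taloon', 'talot']
    print(expand_query(["talo", "kylä"]))
```

Grouping the variants of one keyword as synonyms lets the retrieval system treat them as a single term when scoring, so the expanded query can be run against an index of unprocessed inflected word forms without stemming or lemmatizing the documents.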

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kettunen, K., Arvola, P. (2012). Generating Variant Keyword Forms for a Morphologically Complex Language Leads to Successful Information Retrieval with Finnish. In: Salampasis, M., Larsen, B. (eds) Multidisciplinary Information Retrieval. IRFC 2012. Lecture Notes in Computer Science, vol 7356. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31274-8_10

  • DOI: https://doi.org/10.1007/978-3-642-31274-8_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31273-1

  • Online ISBN: 978-3-642-31274-8

  • eBook Packages: Computer Science (R0)
