ISM@FIRE-2011 Bengali Monolingual Task: A Frequency-Based Stemmer

  • Conference paper
Multilingual Information Access in South Asian Languages

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7536)

Abstract

This paper describes our work at the Indian School of Mines, Dhanbad on the ad hoc Bengali monolingual retrieval task for FIRE 2011. For the official submissions, we prepared three TD runs using the TERRIER retrieval system without query expansion. Using the YASS stemmer substantially improved retrieval performance. After submission, we also developed a statistical stemmer based on frequent pattern mining, using an apriori-like algorithm taken from market basket data analysis. Initial results for our stemmer showed a noticeable gain in retrieval performance over the no-stem runs. Although this gain is lower than that of YASS, we believe it is promising enough to justify fine-tuning the stemmer towards better results.
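
The abstract does not spell out the stemmer's algorithm, so the following is only a minimal sketch of how an apriori-like, frequency-based suffix miner could look, not the authors' implementation. It assumes that a "pattern" is a word-final character string and that its "support" is the number of vocabulary words ending in it. The function names, the thresholds (`min_support`, `max_len`, `min_stem_len`), and the English toy vocabulary are all hypothetical choices made for illustration.

```python
from collections import Counter


def mine_frequent_suffixes(vocabulary, min_support=3, max_len=4, min_stem_len=3):
    """Level-wise (apriori-like) search for suffixes shared by many words.

    Assumption: support(suffix) = number of distinct words ending in that
    suffix while leaving at least `min_stem_len` characters as the stem.
    """
    words = set(vocabulary)
    frequent = {}
    for length in range(1, max_len + 1):
        counts = Counter()
        for word in words:
            if len(word) - length < min_stem_len:
                continue  # stripping this many characters would leave too short a stem
            suffix = word[-length:]
            # Apriori pruning: a suffix can be frequent only if the shorter
            # suffix contained in it (its last length-1 characters) is frequent.
            if length > 1 and suffix[1:] not in frequent:
                continue
            counts[suffix] += 1
        level = {s: c for s, c in counts.items() if c >= min_support}
        if not level:
            break  # no frequent suffix at this length, so none at longer lengths
        frequent.update(level)
    return frequent


def stem(word, frequent_suffixes, min_stem_len=3):
    """Strip the longest frequent suffix that leaves a long-enough stem."""
    for length in range(len(word) - min_stem_len, 0, -1):
        if word[-length:] in frequent_suffixes:
            return word[:-length]
    return word


# Toy English vocabulary, used here only to keep the example readable.
vocabulary = [
    "walk", "walks", "walked", "walking",
    "talk", "talks", "talked", "talking",
    "jump", "jumps", "jumped", "jumping",
]
suffixes = mine_frequent_suffixes(vocabulary)
stems = {w: stem(w, suffixes) for w in vocabulary}
```

In this toy run the suffixes `ed`, `ing` and `s` reach the support threshold, so `walked`, `jumping` and `talks` reduce to `walk`, `jump` and `talk`. The pruning step is what makes the search apriori-like: every word that ends in a longer suffix also ends in the shorter suffix contained in it, so support can only shrink as the suffix grows. For Bengali text the same code would operate on Unicode code points, which is a further simplifying assumption.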

References

  1. Majumder, P., Mitra, M., Parui, S.K., Kole, G., Mitra, P., Datta, K.: YASS: Yet another suffix stripper. ACM Trans. Inf. Syst. 25(4) (October 2007)

  2. Xu, J., Croft, W.B.: Corpus-based stemming using cooccurrence of word variants. ACM Trans. Inf. Syst. 16(1), 61–81 (1998)

  3. Porter, M.F.: An algorithm for suffix stripping. In: Readings in Information Retrieval, pp. 313–316. Morgan Kaufmann Publishers Inc., San Francisco (1997)

  4. Oard, D.W., Levow, G.-A., Cabezas, C.I.: CLEF experiments at Maryland: Statistical stemming and backoff translation. In: Peters, C. (ed.) CLEF 2000. LNCS, vol. 2069, pp. 176–187. Springer, Heidelberg (2001)

  5. Bacchin, M., Ferro, N., Melucci, M.: A probabilistic model for stemmer generation. Inf. Process. Manage. 41(1), 121–137 (2005)

  6. Paik, J.H., Mitra, M., Parui, S.K., Järvelin, K.: GRAS: An effective and efficient stemming algorithm for information retrieval. ACM Trans. Inf. Syst. 29(4), 19:1–19:24 (2011)

  7. Pal, S., Bagchi, A.: Association against dissociation: some pragmatic considerations for frequent itemset generation under fixed and variable thresholds. SIGKDD Explor. Newsl. 7(2), 151–159 (2005)

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Banerjee, R., Pal, S. (2013). ISM@FIRE-2011 Bengali Monolingual Task: A Frequency-Based Stemmer. In: Majumder, P., Mitra, M., Bhattacharyya, P., Subramaniam, L.V., Contractor, D., Rosso, P. (eds) Multilingual Information Access in South Asian Languages. Lecture Notes in Computer Science, vol 7536. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40087-2_5

  • DOI: https://doi.org/10.1007/978-3-642-40087-2_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-40086-5

  • Online ISBN: 978-3-642-40087-2

  • eBook Packages: Computer Science, Computer Science (R0)
