Abstract
This paper describes our work at the Indian School of Mines, Dhanbad, on the ad hoc Bengali monolingual retrieval task of FIRE 2011. For the official submission, we prepared three TD runs using the TERRIER retrieval system without query expansion. Using the YASS stemmer substantially improved retrieval performance. After the submission, we also developed a statistical stemmer based on frequent pattern mining, using an apriori-like algorithm borrowed from market basket analysis. Initial results for our stemmer show a noticeable gain in retrieval performance over the unstemmed runs. Although this gain is smaller than that obtained with YASS, we believe it is promising enough to justify fine-tuning the stemmer for better results.
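The abstract does not spell out the stemmer itself, so the following is only a minimal sketch of how an apriori-like, frequency-based suffix stemmer could look, not the authors' implementation. It assumes that "support" means the number of distinct vocabulary words sharing an ending, and that a longer ending is considered only if its shorter ending was already frequent (the usual apriori pruning). The function names `mine_frequent_suffixes` and `stem`, the thresholds, and the English toy vocabulary are all hypothetical choices made for illustration.

```python
from collections import Counter


def mine_frequent_suffixes(vocabulary, min_support=3, max_len=4, min_stem_len=2):
    """Level-wise (apriori-like) mining of frequent word endings.

    Hypothetical sketch: support = number of distinct words with that ending.
    """
    words = set(vocabulary)
    frequent = {}
    previous_level = None  # frequent endings of length (length - 1)
    for length in range(1, max_len + 1):
        counts = Counter()
        for word in words:
            # Keep at least min_stem_len characters as the stem.
            if len(word) - length < min_stem_len:
                continue
            suffix = word[-length:]
            # Apriori pruning: an ending can only be frequent if the
            # shorter ending it contains was frequent at the previous level.
            if previous_level is not None and suffix[1:] not in previous_level:
                continue
            counts[suffix] += 1
        current_level = {s: c for s, c in counts.items() if c >= min_support}
        if not current_level:
            break
        frequent.update(current_level)
        previous_level = current_level
    return frequent


def stem(word, frequent_suffixes, min_stem_len=2):
    """Strip the longest frequent ending, if any, from the word."""
    longest = max(map(len, frequent_suffixes), default=0)
    for length in range(min(len(word) - min_stem_len, longest), 0, -1):
        if word[-length:] in frequent_suffixes:
            return word[:-length]
    return word


# Toy English vocabulary, used only to make the behaviour easy to follow.
toy_vocabulary = [
    "walking", "talking", "jumping",
    "walked", "talked", "jumped",
    "walks", "talks", "jumps",
]
toy_suffixes = mine_frequent_suffixes(toy_vocabulary, min_support=3, max_len=3)
```

With this toy vocabulary, `stem("walking", toy_suffixes)`, `stem("walked", toy_suffixes)` and `stem("walks", toy_suffixes)` all reduce to `walk`, while a word with no frequent ending is returned unchanged. In a real setting the vocabulary would come from the indexed collection, and the support threshold would need tuning.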
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Banerjee, R., Pal, S. (2013). ISM@FIRE-2011 Bengali Monolingual Task: A Frequency-Based Stemmer. In: Majumder, P., Mitra, M., Bhattacharyya, P., Subramaniam, L.V., Contractor, D., Rosso, P. (eds) Multilingual Information Access in South Asian Languages. Lecture Notes in Computer Science, vol 7536. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-40087-2_5
DOI: https://doi.org/10.1007/978-3-642-40087-2_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-40086-5
Online ISBN: 978-3-642-40087-2
eBook Packages: Computer Science, Computer Science (R0)