Text Retrieval through Corrupted Queries

Otero, Juan; Vilares, Jesús; Vilares, Manuel

doi:10.1007/978-3-540-88309-8_37


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5290)

Included in the following conference series:

Ibero-American Conference on Artificial Intelligence


Abstract

Our work focuses on the design and evaluation of experimental information retrieval systems able to cope with textual misspellings in queries. Previous proposals have commonly relied on spelling correction strategies combined with a word language model; in addition to these, we also report on the use of character n-grams as indexing units.
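To make the n-gram idea concrete, here is a minimal Python sketch (not the authors' system) of why character n-gram indexing tolerates misspelled query terms: a corrupted word still shares many of its n-grams with the correct one. The choice of n = 4, the helper names `char_ngrams` and `ngram_overlap`, and the use of a Dice coefficient as the similarity score are illustrative assumptions.

```python
from collections import Counter


def char_ngrams(word: str, n: int = 4) -> list[str]:
    """Split a word into overlapping character n-grams (n = 4 is an assumption).

    Words of length <= n are kept whole so they still yield one index term.
    """
    word = word.lower()
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_overlap(a: str, b: str, n: int = 4) -> float:
    """Dice coefficient over the multisets of character n-grams of a and b."""
    ga, gb = Counter(char_ngrams(a, n)), Counter(char_ngrams(b, n))
    shared = sum((ga & gb).values())  # multiset intersection size
    return 2 * shared / (sum(ga.values()) + sum(gb.values()))


if __name__ == "__main__":
    print(char_ngrams("information"))
    # One substituted letter ("m" -> "n") still leaves half the n-grams intact.
    print(ngram_overlap("information", "infornation"))  # 0.5
```

With n = 4, the misspelled `infornation` shares 4 of its 8 n-grams with `information` (a Dice score of 0.5), whereas a word-level index would treat the two strings as unrelated terms and the query term would retrieve nothing.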

Research partially supported by the Spanish Government under projects HUM2007-66607-C04-02 and HUM2007-66607-C04-03; and by the Autonomous Government of Galicia under projects PGIDIT07SIN005206PR and PGIDIT05PXIC30501PN, the Network for Language Processing and Information Retrieval, and "Axuda para a consolidación e estruturación de unidades de investigación" (Aid for the consolidation and structuring of research units).



Author information

Authors and Affiliations

Department of Computer Science, University of Vigo, Campus As Lagoas s/n, 32004, Ourense, Spain
Juan Otero & Manuel Vilares
Department of Computer Science, University of A Coruña, Campus Elviña s/n, 15071, A Coruña, Spain
Jesús Vilares


Editor information

Editors and Affiliations

ICREA & Universitat Pompeu Fabra, Paseo de Circumvalacion 8, 08003, Barcelona, Spain
Hector Geffner
IST-UTL and INESC-ID, Av. Prof. Cavaco Silva - Taguspark, 2744-016, Porto Salvo, Portugal
Rui Prada
ADETTI/ISCTE and ISCTE, Lisbon University Institute, Av. das Forças Armadas, 1649-026, Lisbon, Portugal
Isabel Machado Alexandre
ADETTI/ISCTE and ISCTE, Lisbon University Institute, Av. das Forças Armadas, 1649-026, Lisbon, Portugal
Nuno David


About this paper

Cite this paper

Otero, J., Vilares, J., Vilares, M. (2008). Text Retrieval through Corrupted Queries. In: Geffner, H., Prada, R., Machado Alexandre, I., David, N. (eds) Advances in Artificial Intelligence – IBERAMIA 2008. IBERAMIA 2008. Lecture Notes in Computer Science, vol 5290. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-88309-8_37


DOI: https://doi.org/10.1007/978-3-540-88309-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-88308-1
Online ISBN: 978-3-540-88309-8
eBook Packages: Computer Science, Computer Science (R0)
