Skip to main content

Fault-Tolerant Fulltext Information Retrieval in Digital Multilingual Encyclopedias with Weighted Pattern Morphing

  • Conference paper
Book cover Advances in Information Retrieval (ECIR 2004)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2997))

Included in the following conference series:

Abstract

This paper introduces a new approach to add fault-tolerance to a fulltext retrieval system. The weighted pattern morphing technique circumvents some of the disadvantages of the widely used edit distance measure and can serve as a front end to almost any fast non fault-tolerant search engine. The technique enables approximate searches by carefully generating a set of modified patterns (morphs) from the original user pattern and by searching for promising members of this set by a non fault-tolerant search backend. Morphing is done by recursively applying so called submorphs, driven by a penalty weight matrix. The algorithm can handle phonetic similarities that often occur in multilingual scientific encyclopedias as well as normal typing errors such as omission or swapping of letters. We demonstrate the process of filtering out less promising morphs. We also show how results from approximate search experiments carried out on a huge encyclopedic text corpus were used to determine reasonable parameter settings.

A commercial pharmaceutic CD-ROM encyclopedia, a dermatological online encyclopedia and an online e-Learning system use an implementation of the presented approach and thus prove its “road capability”.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bruchhausen, F.v., et al.: Hagers Handbuch der Pharmazeutischen Praxis. 10(+2) Bände. u. Folgebände. Springer, Heidelberg (1992-2000)

    Google Scholar 

  2. Blaschek, W., Ebel, S., Hackenthal, E., Holzgrabe, U., Keller, K., Reichling, J. (eds.): HagerROM 2003 - Hagers Handbuch der Drogen und Arzneistoffe. CD-ROM. Springer, Heidelberg (2003), http://www.hagerrom.de

    Google Scholar 

  3. Kukich, K.: Technique for automatically correcting words in text. ACM Computing Surveys 24(4), 377–439 (1992)

    Article  Google Scholar 

  4. Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing Methods for Approximate String Matching. IEEE Bulletin of the Technical Committee on Data Engineering 24(4), 19–27 (2001)

    Google Scholar 

  5. Navarro, G., Baeza-Yates, R.: A Practical q-Gram Index for Text Retrieval Allowing Errors. CLEI Electronic Journal 1(2), 1 (1998)

    Google Scholar 

  6. Sutinen, E.: Filtration with q-Samples in Approximate String Matching. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 50–63. Springer, Heidelberg (1996)

    Google Scholar 

  7. Ukkonen, E.: Approximate string-matching with q-grams and maximal matches. Theoretical Computer Science 92, 191–211 (1992)

    Article  MATH  MathSciNet  Google Scholar 

  8. Russell, R.: INDEX (Soundex Patent). U.S. Patent No. 1,261 167, 1–4 (1918)

    Google Scholar 

  9. Zobel, J., Dart, P.: Phonetic String Matching: Lessons from Information Retrieval. In: SIGIR 1996, pp. 166–172. ACM Press, New York (1996)

    Chapter  Google Scholar 

  10. Hodge, V., Austin, J.: An Evaluation of Phonetic Spell Checkers. Dept. of CS, University of York, U.K. (2001)

    Google Scholar 

  11. Jokinen, P., Ukkonen, E.: Two algorithms for approximate string matching in static texts. In: Tarlecki, A. (ed.) MFCS 1991. LNCS, vol. 520, pp. 240–248. Springer, Heidelberg (1991)

    Google Scholar 

  12. Myers, E.: A sublinear algorithm for approximate keyword searching. Algorithmica 12(4/5), 345–374 (1994)

    Article  MATH  MathSciNet  Google Scholar 

  13. Levenshtein, V.: Binary codes capable of correcting deletions, insertions, and reversals. Problems in Information Transmission 1, 8–17 (1965)

    Google Scholar 

  14. Stephen, G.: String Searching Algorithms. Lecture Notes Series on Computing, vol. 3. World Scientific Publishing, Singapore (1994)

    MATH  Google Scholar 

  15. Rosenfelder, M.: Hou tu pranownse Inglish (2003), http://www.zompist.com/spell.html

  16. Grimm, M.: Random Access und Caching für q-Gramm-Suchverfahren. Lehrstuhl für Informatik II, Universität Würzburg (2001)

    Google Scholar 

  17. Projekt DEJAVU: Homepage (2003), http://www.projekt-dejavu.de

  18. Altmeyer, P., Bacharach-Buhles, M.: Springer Enzyklopädie Dermatologie, Allergologie, Umweltmedizin. Springer, Heidelberg (2002), http://www.galderma.de/anmeldung_enz.html

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Esser, W.M. (2004). Fault-Tolerant Fulltext Information Retrieval in Digital Multilingual Encyclopedias with Weighted Pattern Morphing. In: McDonald, S., Tait, J. (eds) Advances in Information Retrieval. ECIR 2004. Lecture Notes in Computer Science, vol 2997. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24752-4_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-24752-4_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-21382-6

  • Online ISBN: 978-3-540-24752-4

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics