Abstract
This paper compares selected distance measures with respect to their suitability for retrieving historical spelling variants (hsv). The interdisciplinary project "Rule-based search in text databases with nonstandard orthography" is developing a fuzzy full-text search engine for historical text documents. The engine is intended to give experts as well as interested amateurs easier access to these texts. The FlexMetric framework enhances the distance measure that the evaluation found to be most efficient. The resulting measure can be used for multiple applications, including searching, post-ranking, transformation, and even reflection on one's own language.
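The abstract does not say which distance measures were compared or how FlexMetric is implemented. Purely as an illustration of the general idea, the following is a minimal sketch of a Levenshtein-style edit distance with a pluggable substitution cost, so that letter pairs that often alternate in historical spellings can be made cheaper than arbitrary substitutions. The function names, the letter pairs, and the cost value 0.2 are hypothetical choices made for this example and are not taken from the paper.

```python
from typing import Callable


def unit_sub_cost(x: str, y: str) -> float:
    """Classic Levenshtein substitution cost: 0 for a match, 1 otherwise."""
    return 0.0 if x == y else 1.0


# Hypothetical letter pairs treated as cheap substitutions (assumption).
CHEAP_PAIRS = {frozenset("vu"), frozenset("yi")}


def variant_sub_cost(x: str, y: str) -> float:
    """Illustrative cost function: listed pairs cost 0.2 instead of 1."""
    if x == y:
        return 0.0
    return 0.2 if frozenset((x, y)) in CHEAP_PAIRS else 1.0


def weighted_edit_distance(
    a: str,
    b: str,
    sub_cost: Callable[[str, str], float] = unit_sub_cost,
    indel_cost: float = 1.0,
) -> float:
    """Dynamic-programming edit distance, keeping one table row at a time."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(
                min(
                    prev[j] + indel_cost,           # delete ca
                    curr[j - 1] + indel_cost,       # insert cb
                    prev[j - 1] + sub_cost(ca, cb)  # substitute or match
                )
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(weighted_edit_distance("vnd", "und"))
    print(weighted_edit_distance("vnd", "und", sub_cost=variant_sub_cost))
```

With the default unit costs the function reduces to plain Levenshtein distance. Passing a different `sub_cost` shows how a measure could rank a likely spelling variant closer to the query term than an unrelated word of the same length.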
Copyright information
© 2006 International Federation for Information Processing
About this paper
Cite this paper
Kempken, S., Luther, W., Pilz, T. (2006). Comparison of distance measures for historical spelling variants. In: Bramer, M. (ed.) Artificial Intelligence in Theory and Practice. IFIP AI 2006. IFIP International Federation for Information Processing, vol 217. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-34747-9_31
DOI: https://doi.org/10.1007/978-0-387-34747-9_31
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-34654-0
Online ISBN: 978-0-387-34747-9
eBook Packages: Computer Science, Computer Science (R0)