Abstract
Keyword spotting techniques are becoming cost-effective solutions for information retrieval in handwritten documents. We explore an extension of the single-word, line-level probabilistic indexing approach described in [1, 2] that allows page-level Boolean combinations of several single-keyword queries. We propose heuristic rules that combine the single-word relevance probabilities into probabilistically consistent confidence scores for multi-word Boolean combinations. As a preliminary study, this paper evaluates the search performance of word-pair queries involving a single OR or AND Boolean operation. The empirical results support the proposed approach and show its effectiveness.
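The abstract does not state the combination rules themselves, so the following is only a minimal sketch of how two single-word relevance probabilities for the same page might be combined into a score for an AND or OR query. It shows two textbook options, not necessarily the rules proposed in the paper: the Fréchet bounds (min for AND, max for OR), which hold without any independence assumption, and the product rule under an assumed independence of the two words. The function names, the `assume_independent` flag, and the example probabilities are all hypothetical.

```python
def and_score(p1: float, p2: float, assume_independent: bool = False) -> float:
    """Score for 'w1 AND w2' on a page, given single-word relevance probabilities."""
    if assume_independent:
        # Joint probability if both words occur independently (an assumption).
        return p1 * p2
    # Frechet upper bound: P(A and B) can never exceed min(P(A), P(B)).
    return min(p1, p2)


def or_score(p1: float, p2: float, assume_independent: bool = False) -> float:
    """Score for 'w1 OR w2' on a page, given single-word relevance probabilities."""
    if assume_independent:
        # Inclusion-exclusion with P(A and B) = P(A) * P(B) (an assumption).
        return p1 + p2 - p1 * p2
    # Frechet lower bound: P(A or B) is at least max(P(A), P(B)).
    return max(p1, p2)


# Hypothetical page-level relevance probabilities for two query words.
pages = {
    "page_1": (0.9, 0.4),
    "page_2": (0.2, 0.1),
    "page_3": (0.7, 0.8),
}

# Rank pages for the AND query by decreasing combined score.
ranking = sorted(pages, key=lambda name: and_score(*pages[name]), reverse=True)
print(ranking)
```

Whichever rule is used, the scores stay within [0, 1] and the AND score never exceeds the OR score for the same pair, which is the kind of consistency a ranked retrieval list relies on.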
This work was partially supported by the Generalitat Valenciana under the Prometeo/2009/014 project grant ALMAMATER, and through the EU projects: HIMANIS (JPICH programme, Spanish grant Ref. PCIN-2015-068) and READ (Horizon-2020 programme, grant Ref. 674943).
Notes
- 1.
- 2. Note that these statistics were obtained without any kind of tokenization; that is, each non-blank sequence of characters is assumed to be a “word”.
- 3.
References
Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: HMM word-graph based keyword spotting in handwritten document images. Int. J. Inf. Sci. 370, 497–518 (2015)
Toselli, A.H., Vidal, E., Romero, V., Frinken, V.: Word-graph based keyword spotting and indexing of handwritten document images. Technical report, Universitat Politècnica de València (2013)
Sánchez, J., Mühlberger, G., Gatos, B., Schofield, P., Depuydt, K., Davis, R., Vidal, E., de Does, J.: tranScriptorium: an European project on handwritten text recognition. In: DocEng, pp. 227–228 (2013)
Jelinek, F.: Statistical Methods for Speech Recognition. MIT Press, Cambridge (1998)
Causer, T., Wallace, V.: Building a volunteer community: results and findings from Transcribe Bentham. Digit. Humanit. Q. 6(2) (2012)
Sánchez, J.A., Romero, V., Toselli, A., Vidal, E.: ICFHR2014 Competition on Handwritten Text Recognition on Transcriptorium Datasets (HTRtS). In: 2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR), pp. 785–790, September 2014
Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
Zhu, M.: Recall, Precision and Average Precision. Working Paper 2004–09 Department of Statistics & Actuarial Science, University of Waterloo, 26 August 2004
Robertson, S.: A new interpretation of average precision. In: Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2008), pp. 689–690. ACM, New York (2008)
Kozielski, M., Forster, J., Ney, H.: Moment-based image normalization for handwritten text recognition. In: Proceedings of the 2012 International Conference on Frontiers in Handwriting Recognition, ICFHR 2012, pp. 256–261. IEEE Computer Society, Washington, DC (2012)
Young, S., Odell, J., Ollason, D., Valtchev, V., Woodland, P.: The HTK Book: Hidden Markov Models Toolkit V2.1. Cambridge Research Laboratory Ltd. (1997)
Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D.: The HTK Book: Hidden Markov Models Toolkit V3.4. Microsoft Corporation & Cambridge Research Laboratory Ltd., March 2009
Toselli, A., Vidal, E.: Handwritten text recognition results on the Bentham collection with improved classical N-gram-HMM methods. In: 3rd International Workshop on Historical Document Imaging and Processing (HIP 2015), pp. 15–22, August 2015
Kneser, R., Ney, H.: Improved backing-off for N-gram language modeling. In: International Conference on Acoustics, Speech and Signal Processing (ICASSP 1995), vol. 1, Los Alamitos, CA, USA, pp. 181–184. IEEE Computer Society (1995)
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Noya-García, E., Toselli, A.H., Vidal, E. (2017). Simple and Effective Multi-word Query Spotting in Handwritten Text Images. In: Alexandre, L., Salvador Sánchez, J., Rodrigues, J. (eds) Pattern Recognition and Image Analysis. IbPRIA 2017. Lecture Notes in Computer Science, vol 10255. Springer, Cham. https://doi.org/10.1007/978-3-319-58838-4_9
DOI: https://doi.org/10.1007/978-3-319-58838-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-58837-7
Online ISBN: 978-3-319-58838-4
eBook Packages: Computer Science (R0)