Abstract
It is well known that integrating the outputs of multiple OCR systems can give higher performance than a single OCR. This idea has been applied to printed Japanese text recognition, where it improved performance. In conventional experiments, however, zoning (the extraction of text regions) was done manually, which is a serious problem from a practical point of view. To solve this problem, an approach that automatically matches the classified regions output by multiple OCRs is proposed. With the proposed method, a recognition rate of 98.8% was obtained from OCR systems whose individual performance is no better than 97.6%.
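The general idea of matching text lines across OCR outputs and then integrating them can be illustrated with a minimal sketch. This is not the authors' algorithm: the `Line` record, the intersection-over-union matching with a 0.5 threshold, and the character-wise majority vote are all assumptions chosen for illustration, and the sample strings are invented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Line:
    """Hypothetical text-line record: bounding box plus recognized text."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def iou(a: Line, b: Line) -> float:
    """Intersection-over-union of two axis-aligned bounding boxes."""
    iw = min(a.x1, b.x1) - max(a.x0, b.x0)
    ih = min(a.y1, b.y1) - max(a.y0, b.y0)
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a.x1 - a.x0) * (a.y1 - a.y0)
    area_b = (b.x1 - b.x0) * (b.y1 - b.y0)
    return inter / (area_a + area_b - inter)


def match_lines(reference, other, threshold=0.5):
    """For each reference line, return the best-overlapping line in `other`, or None."""
    matches = []
    for ref in reference:
        best = max(other, key=lambda o: iou(ref, o), default=None)
        if best is not None and iou(ref, best) >= threshold:
            matches.append(best)
        else:
            matches.append(None)
    return matches


def vote(texts):
    """Character-wise majority vote when all candidates have equal length;
    otherwise fall back to the most common whole string."""
    if len({len(t) for t in texts}) == 1:
        return "".join(Counter(chars).most_common(1)[0][0] for chars in zip(*texts))
    return Counter(texts).most_common(1)[0][0]


def integrate(outputs):
    """Use the first OCR's lines as the reference and vote over matched lines."""
    reference, *others = outputs
    matched = [match_lines(reference, other) for other in others]
    result = []
    for i, ref in enumerate(reference):
        candidates = [ref.text] + [m[i].text for m in matched if m[i] is not None]
        result.append(vote(candidates))
    return result


# Invented example: three OCR outputs, each with a different single-character error.
ocr_a = [Line(10, 10, 200, 30, "文字認識の結果"), Line(10, 40, 200, 60, "統合する方法")]
ocr_b = [Line(12, 11, 198, 31, "文宇認識の結果"), Line(11, 41, 201, 61, "統合する方法")]
ocr_c = [Line(9, 9, 201, 29, "文字認識の結杲"), Line(10, 39, 199, 59, "統合すろ方法")]
result = integrate([ocr_a, ocr_b, ocr_c])
```

In this sketch each OCR misreads a different character, so the vote recovers the correct line even though no single output is error-free. A real system would also need character alignment for candidates of different lengths and a less arbitrary choice of reference output.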
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nakano, Y., Hananoi, T., Miyao, H., Maruyama, M., Maruyama, Ki. (2004). A Document Analysis System Based on Text Line Matching of Multiple OCR Outputs. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_44
DOI: https://doi.org/10.1007/978-3-540-28640-0_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive