Abstract
This paper proposes a reliable and efficient structural analysis method for mathematical formulae, intended for practical mathematical OCR. The method consists of three steps. In the first step, a fast structural analysis algorithm is applied to each mathematical formula to obtain a tree representation of the formula. This step usually yields a correct tree representation but occasionally yields an erroneous one, so the tree representation is verified by the following two steps. In the second step, the result of the analysis step (i.e., the tree representation) is converted into a one-dimensional representation. The third step is a verification step in which the one-dimensional representation is parsed with a formula description grammar, a context-free grammar specialized for mathematical formulae. If the grammar does not accept the one-dimensional representation, the result of the analysis step is flagged as erroneous and reported to the OCR user. This three-step organization achieves reliable and efficient structural analysis without requiring any two-dimensional grammar.
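The second and third steps can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the `Node` tree layout (fields `sup`, `sub`, `right`), the LaTeX-like token format produced by `linearize`, and the toy grammar checked by `accepts` are hypothetical stand-ins for the actual tree representation and formula description grammar, which the abstract does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """One symbol of a formula tree (hypothetical layout, not the paper's)."""
    symbol: str
    sup: Optional["Node"] = None    # superscript subtree
    sub: Optional["Node"] = None    # subscript subtree
    right: Optional["Node"] = None  # next symbol on the same baseline


def linearize(node: Optional[Node]) -> List[str]:
    """Step 2: flatten the tree into a one-dimensional token list."""
    tokens: List[str] = []
    while node is not None:
        tokens.append(node.symbol)
        if node.sub is not None:
            tokens += ["_", "{"] + linearize(node.sub) + ["}"]
        if node.sup is not None:
            tokens += ["^", "{"] + linearize(node.sup) + ["}"]
        node = node.right
    return tokens


OPERATORS = {"+", "-", "*", "="}


class ParseError(Exception):
    pass


def accepts(tokens: List[str]) -> bool:
    """Step 3: recursive-descent check against a toy context-free grammar.

    expr  -> term (OPERATOR term)*
    term  -> atom ('_' group)? ('^' group)?
    group -> '{' expr '}'
    atom  -> ALNUM | '(' expr ')'
    """
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def eat(expected: str) -> None:
        nonlocal pos
        if peek() != expected:
            raise ParseError(f"expected {expected!r} at position {pos}")
        pos += 1

    def expr() -> None:
        term()
        while peek() in OPERATORS:
            eat(peek())
            term()

    def term() -> None:
        atom()
        if peek() == "_":
            eat("_")
            group()
        if peek() == "^":
            eat("^")
            group()

    def group() -> None:
        eat("{")
        expr()
        eat("}")

    def atom() -> None:
        nonlocal pos
        tok = peek()
        if tok == "(":
            eat("(")
            expr()
            eat(")")
        elif tok is not None and tok.isalnum():
            pos += 1
        else:
            raise ParseError(f"unexpected token {tok!r} at position {pos}")

    try:
        expr()
        return pos == len(tokens)  # reject trailing, unparsed tokens
    except ParseError:
        return False


# A plausible analysis result for "x^2 + y_i".
good = Node("x", sup=Node("2"), right=Node("+", right=Node("y", sub=Node("i"))))
# A plausible analysis error: the "+" was attached as a superscript of "x".
bad = Node("x", sup=Node("+"))

good_ok = accepts(linearize(good))
bad_ok = accepts(linearize(bad))
```

In this sketch the first tree linearizes to a token sequence the toy grammar accepts, while the second is rejected because an operator cannot stand alone inside a superscript group. A rejection of this kind is what would be reported to the OCR user as a suspected analysis error.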
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Toyota, S., Uchida, S., Suzuki, M. (2006). Structural Analysis of Mathematical Formulae with Verification Based on Formula Description Grammar. In: Bunke, H., Spitz, A.L. (eds) Document Analysis Systems VII. DAS 2006. Lecture Notes in Computer Science, vol 3872. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11669487_14
DOI: https://doi.org/10.1007/11669487_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-32140-8
Online ISBN: 978-3-540-32157-6
eBook Packages: Computer Science, Computer Science (R0)