Abstract
In document analysis, line segmentation is a necessary prerequisite for further analysis of textual components. While much work has been devoted to line segmentation of regular text documents, this work cannot easily be adapted to documents that contain specialist components such as tables or mathematical expressions. In this paper we concentrate on a line segmentation technique for documents containing mathematical expressions, which, due to their two-dimensional structure, often comprise multiple distinct lines. We present an approach to line segmentation in the presence of mathematics that is based on a set of histogram measures and heuristics that consider only the vertical and horizontal distances between characters. The method also provides a technique to distinguish consecutive lines that overlap vertically but belong to different mathematical expressions. Experiments on two data sets, of 200 and 1000 maths pages respectively, show a high rate of accuracy.
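To make the starting point concrete, the following is a minimal sketch of naive gap-based line splitting over character bounding boxes, followed by a histogram of the vertical distances between the resulting lines. It is not the authors' method: the abstract does not specify the histogram measures or heuristics, so the names `Box`, `split_into_lines`, `gap_histogram` and the `min_gap` parameter are assumptions made purely for illustration.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max), with y growing downward.
Box = Tuple[int, int, int, int]


def split_into_lines(boxes: List[Box], min_gap: int = 1) -> List[List[Box]]:
    """Group character boxes into lines separated by empty vertical bands."""
    if not boxes:
        return []
    ordered = sorted(boxes, key=lambda b: b[1])  # top to bottom
    lines = [[ordered[0]]]
    bottom = ordered[0][3]  # lowest y reached by the current line
    for box in ordered[1:]:
        if box[1] - bottom >= min_gap:
            # The box starts below everything in the current line: new line.
            lines.append([box])
            bottom = box[3]
        else:
            # The box overlaps the current line vertically: same line.
            lines[-1].append(box)
            bottom = max(bottom, box[3])
    return lines


def gap_histogram(lines: List[List[Box]]) -> Counter:
    """Count the vertical distances between consecutive lines."""
    gaps = Counter()
    for upper, lower in zip(lines, lines[1:]):
        upper_bottom = max(b[3] for b in upper)
        lower_top = min(b[1] for b in lower)
        gaps[lower_top - upper_bottom] += 1
    return gaps


if __name__ == "__main__":
    boxes = [(0, 0, 5, 10), (6, 2, 10, 9), (0, 15, 5, 25), (0, 40, 5, 50)]
    lines = split_into_lines(boxes)
    print(len(lines), dict(gap_histogram(lines)))
```

A splitter this simple would separate the numerator and denominator of a fraction into different lines, and would merge two neighbouring expressions whose characters overlap vertically. These are the cases the abstract says the proposed heuristics are designed to handle.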
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alkalai, M., Sorge, V. (2013). A Histogram-Based Approach to Mathematical Line Segmentation. In: Ruiz-Shulcloper, J., Sanniti di Baja, G. (eds) Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications. CIARP 2013. Lecture Notes in Computer Science, vol 8258. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-41822-8_56
DOI: https://doi.org/10.1007/978-3-642-41822-8_56
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-41821-1
Online ISBN: 978-3-642-41822-8
eBook Packages: Computer Science, Computer Science (R0)