Abstract
Documents that are scanned and then OCRed usually lack detailed layout and structural information. We present a book-specific layout analysis system that extracts table of contents (TOC) structure information from scanned and OCRed books. This system was used for navigation purposes by the live books search project. We provide a labeling scheme for the TOC sections of books, a high-level overview of the book layout analysis system, and a description of the TOC Structure Extraction Engine. Finally, we present accuracy measurements of this system on a representative test set.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dresevic, B., Uzelac, A., Radakovic, B., Todic, N. (2009). Book Layout Analysis: TOC Structure Extraction Engine. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_17
DOI: https://doi.org/10.1007/978-3-642-03761-0_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science, Computer Science (R0)