TOC Structure Extraction from OCR-ed Books

  • Conference paper
Focused Retrieval of Content and Structure (INEX 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7424)

Abstract

This paper addresses the task of extracting the table of contents (TOC) from OCR-ed books. Because the OCR process loses much of the layout and structural information, OCR output alone does not support navigation. A TOC is needed to provide a convenient and quick way to locate content of interest. We propose a hybrid method for TOC extraction that combines a rule-based method with an SVM-based method. The rule-based method focuses on recovering the TOC from books that have TOC pages, while the SVM-based method handles books without TOC pages. Experimental results indicate that the proposed methods achieve performance comparable to that of the other participants in the ICDAR 2011 Book Structure Extraction competition.
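The hybrid dispatch described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression `TOC_LINE`, the 0.5 threshold, and every function name below are assumptions made for illustration, and the SVM-based classifier is represented only by a pluggable `is_heading` callable rather than a trained model.

```python
import re
from typing import Callable, List, Tuple

Entry = Tuple[str, int]  # (title, page number)

# Assumed pattern for a TOC line: a title, then dot leaders or spaces,
# then a trailing page number. Real OCR output is far noisier.
TOC_LINE = re.compile(r"^\s*(?P<title>.+?)[\s.]{2,}(?P<page>\d+)\s*$")


def parse_toc_page(lines: List[str]) -> List[Entry]:
    """Rule-based step: read (title, page) pairs from a TOC page."""
    entries = []
    for line in lines:
        match = TOC_LINE.match(line)
        if match:
            entries.append((match.group("title").strip(), int(match.group("page"))))
    return entries


def looks_like_toc_page(lines: List[str], min_ratio: float = 0.5) -> bool:
    """Treat a page as a TOC page if enough non-empty lines match TOC_LINE.

    The 0.5 threshold is an arbitrary illustrative value.
    """
    non_empty = [line for line in lines if line.strip()]
    if not non_empty:
        return False
    hits = sum(1 for line in non_empty if TOC_LINE.match(line))
    return hits / len(non_empty) >= min_ratio


def extract_toc(pages: List[List[str]], is_heading: Callable[[str], bool]) -> List[Entry]:
    """Use the rule-based path when TOC pages exist, else the classifier path.

    `is_heading` is a hypothetical stand-in for a trained classifier
    (for example an SVM) that labels a text line as a section heading.
    """
    toc_pages = [page for page in pages if looks_like_toc_page(page)]
    if toc_pages:
        return [entry for page in toc_pages for entry in parse_toc_page(page)]
    # No TOC pages found: build the TOC from lines classified as headings.
    return [
        (line.strip(), page_no)
        for page_no, page in enumerate(pages, start=1)
        for line in page
        if line.strip() and is_heading(line)
    ]
```

A caller would pass each book as a list of pages, each page a list of OCR text lines, together with whatever heading classifier is available; for a quick experiment, a simple heuristic such as `str.isupper` can stand in for the classifier.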

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, C., Chen, J., Zhang, X., Liu, J., Huang, Y. (2012). TOC Structure Extraction from OCR-ed Books. In: Geva, S., Kamps, J., Schenkel, R. (eds) Focused Retrieval of Content and Structure. INEX 2011. Lecture Notes in Computer Science, vol 7424. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35734-3_8

  • DOI: https://doi.org/10.1007/978-3-642-35734-3_8

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35733-6

  • Online ISBN: 978-3-642-35734-3

  • eBook Packages: Computer Science, Computer Science (R0)