Using Page Breaks for Book Structuring

  • Conference paper
Focused Retrieval of Content and Structure (INEX 2011)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7424)

Abstract

We report on the XRCE participation in the Structure Extraction task of INEX/ICDAR Book Structure Extraction 2011. We wanted to assess a simple method for structuring a book: using leading and trailing page whitespace. Such large whitespace, occurring at the top of leading pages and at the bottom of trailing pages, is detected by first locating the type area (the region of the page that holds the main text block). As expected, evaluation shows very good precision. Because this approach targets high-level book structures (parts, chapters), structures not marked by a page break are not detected, which lowers recall.
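The whitespace test described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Pages are assumed to be given as the vertical extents of their text lines (y grows downward). The type area is approximated by the median text top and bottom across non-blank pages. `min_gap_ratio`, the fraction of type-area height counted as "large" whitespace, is a hypothetical parameter. A unit start is assumed to be a leading page preceded by a trailing page, with blank pages skipped. All names (`Page`, `estimate_type_area`, `find_structure_starts`) are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import median
from typing import List, Tuple


@dataclass
class Page:
    # Hypothetical input: (top, bottom) of each text line, y grows downward.
    # An empty list represents a blank page.
    lines: List[Tuple[float, float]] = field(default_factory=list)


def text_extent(page: Page) -> Tuple[float, float]:
    """Top of the highest text line and bottom of the lowest one."""
    return min(t for t, _ in page.lines), max(b for _, b in page.lines)


def estimate_type_area(pages: List[Page]) -> Tuple[float, float]:
    """Assumed approximation: median text top/bottom over non-blank pages."""
    extents = [text_extent(p) for p in pages if p.lines]
    return median([t for t, _ in extents]), median([b for _, b in extents])


def find_structure_starts(pages: List[Page], min_gap_ratio: float = 0.2) -> List[int]:
    """Indices of pages that likely open a high-level unit (part, chapter)."""
    area_top, area_bottom = estimate_type_area(pages)
    threshold = min_gap_ratio * (area_bottom - area_top)
    starts: List[int] = []
    prev_trailing = True  # the first non-blank page may open a unit
    for i, page in enumerate(pages):
        if not page.lines:
            continue  # blank pages neither open nor close a unit
        top, bottom = text_extent(page)
        leading = top - area_top > threshold        # large gap above the text
        trailing = area_bottom - bottom > threshold  # large gap below the text
        if leading and prev_trailing:
            starts.append(i)
        prev_trailing = trailing
    return starts


if __name__ == "__main__":
    book = [
        Page([(300, 700)]),  # chapter opening: whitespace on top
        Page([(100, 700)]),
        Page([(100, 700)]),
        Page([(100, 400)]),  # chapter end: whitespace at the bottom
        Page([]),            # blank verso
        Page([(300, 700)]),  # next chapter opening
        Page([(100, 700)]),
        Page([(100, 700)]),
        Page([(300, 700)]),  # extra space, but no trailing page before it
    ]
    print(find_structure_starts(book))  # → [0, 5]
```

The last rule illustrates the precision/recall trade-off reported above. Page 8 in the demo has space above its text but is rejected because the preceding page ends normally. Any unit that does not begin on a fresh page cannot be found at all.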

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Déjean, H. (2012). Using Page Breaks for Book Structuring. In: Geva, S., Kamps, J., Schenkel, R. (eds) Focused Retrieval of Content and Structure. INEX 2011. Lecture Notes in Computer Science, vol 7424. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35734-3_4

  • DOI: https://doi.org/10.1007/978-3-642-35734-3_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-35733-6

  • Online ISBN: 978-3-642-35734-3

  • eBook Packages: Computer Science, Computer Science (R0)