Composite document analysis by means of typographic characteristics

  • Oral Presentations
  • Conference paper
Advances in Document Image Analysis (BSDIA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1339)

Abstract

We present a new method for grouping letters and words into homogeneous font families without explicitly recognising the font. This analysis, based on a pattern redundancy technique, lets us extract part of the logical information carried by the typographic features of words. Once the typographic families have been differentiated, grouped and compared, we know the cardinality of each family and its weight (boldness), slant and size relative to the other families. Studying how the families are organised, together with their relative characteristics, allows us to classify them according to their logical significance and, where possible, to formulate hypotheses about the logical meaning of each family. Comparing the constructed families with a learned grammar then validates or corrects these hypotheses and labels the families for which no hypothesis was formulated. A key advantage of the method is that every step depends only on the image, not on the document type or on a font database, so it can be applied to any type of document, particularly complex and typographically rich ones. A further advantage is that the resulting text markers can be used to describe the document in HTML.
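The abstract outlines a pipeline: once words are grouped into families, each family is characterised by its cardinality and by its weight, slant and size relative to the others, hypotheses about its logical role are formulated, and the result is expressed as HTML. The following is only a minimal sketch of those later steps, under loudly stated assumptions: family ids are taken as already assigned (the pattern redundancy grouping itself is not reproduced), stroke thickness, slant angle and x-height are hypothetical stand-ins for weight, slope and size, the most populous family serves as the reference, and the thresholds and labels in the hypothetical `hypothesise` function are illustrative heuristics rather than the authors' rules. The learned grammar used to validate hypotheses is not modelled; families without a hypothesis are simply left unlabelled.

```python
from dataclasses import dataclass
from html import escape
from itertools import groupby
from statistics import mean


@dataclass
class Word:
    text: str
    family: int    # typographic family id, ASSUMED to be assigned already
    stroke: float  # mean stroke thickness in pixels (hypothetical proxy for weight)
    slant: float   # estimated slant angle in degrees, 0 = upright (proxy for slope)
    height: float  # x-height in pixels (hypothetical proxy for size)


@dataclass
class FamilyProfile:
    family: int
    cardinality: int   # number of words in the family
    rel_weight: float  # mean stroke / mean stroke of the reference family
    rel_slant: float   # mean slant minus mean slant of the reference family
    rel_size: float    # mean x-height / mean x-height of the reference family


def profile_families(words):
    """Return (reference family id, {family id: FamilyProfile})."""
    groups = {}
    for w in words:
        groups.setdefault(w.family, []).append(w)
    # Assumption: the most populous family (usually body text) is the reference.
    ref = max(groups, key=lambda f: len(groups[f]))
    ref_stroke = mean(w.stroke for w in groups[ref])
    ref_slant = mean(w.slant for w in groups[ref])
    ref_height = mean(w.height for w in groups[ref])
    profiles = {
        f: FamilyProfile(
            family=f,
            cardinality=len(ws),
            rel_weight=mean(w.stroke for w in ws) / ref_stroke,
            rel_slant=mean(w.slant for w in ws) - ref_slant,
            rel_size=mean(w.height for w in ws) / ref_height,
        )
        for f, ws in groups.items()
    }
    return ref, profiles


def hypothesise(ref, profiles, size_ratio=1.2, weight_ratio=1.2, slant_deg=8.0):
    """Hypothetical heuristic labelling; thresholds are illustrative only."""
    labels = {}
    for f, p in profiles.items():
        if f == ref:
            labels[f] = "body"
        elif p.rel_size >= size_ratio:
            labels[f] = "title"
        elif p.rel_weight >= weight_ratio:
            labels[f] = "bold"
        elif abs(p.rel_slant) >= slant_deg:
            labels[f] = "italic"
        # Otherwise no hypothesis; the abstract resolves such families with a learned grammar.
    return labels


# Hypothetical mapping from logical labels to HTML tags; "body" stays untagged.
TAGS = {"title": "h1", "bold": "strong", "italic": "em"}


def to_html(words, labels):
    """Wrap each run of consecutive same-family words in the tag for its label."""
    parts = []
    for fam, run in groupby(words, key=lambda w: w.family):
        text = escape(" ".join(w.text for w in run))
        tag = TAGS.get(labels.get(fam))
        parts.append(f"<{tag}>{text}</{tag}>" if tag else text)
    return " ".join(parts)


if __name__ == "__main__":
    words = [
        Word("Introduction", 2, 3.0, 0.0, 18.0),
        Word("Document", 1, 2.0, 0.0, 10.0),
        Word("analysis", 1, 2.0, 0.0, 10.0),
        Word("uses", 1, 2.0, 0.0, 10.0),
        Word("typography", 3, 2.0, 12.0, 10.0),
        Word("here", 1, 2.0, 0.0, 10.0),
    ]
    ref, profiles = profile_families(words)
    labels = hypothesise(ref, profiles)
    print(to_html(words, labels))
```

Treating the largest family as the reference reflects the common case where body text dominates a page; on a document such as a directory or an index that assumption may not hold, and a validation step such as the grammar comparison described in the abstract would be needed to correct the labels.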

Editor information

Nabeel A. Murshed, Flávio Bortolozzi

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Duffy, L., Lebourgeois, F., Emptoz, H. (1997). Composite document analysis by means of typographic characteristics. In: Murshed, N.A., Bortolozzi, F. (eds) Advances in Document Image Analysis. BSDIA 1997. Lecture Notes in Computer Science, vol 1339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63791-5_14

  • DOI: https://doi.org/10.1007/3-540-63791-5_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63791-2

  • Online ISBN: 978-3-540-69646-9

  • eBook Packages: Springer Book Archive
