
Document modeling for form class identification

  • Oral Presentations
  • Conference paper
Advances in Document Image Analysis (BSDIA 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1339)

Abstract

This article describes a document analysis system based on document modeling. The system is applied to forms used by the CAF (Caisse d'Allocations Familiales), the French national family allowance department. It is composed of three modules, each handling a different stage of form processing. The first module, low-level processing, is divided into three stages: acquisition, binarisation and skew correction. These stages transform a paper form into an image of adequate quality. The second module, document structuration, processes this image to extract the information contained in the form. The information is arranged into a tree that represents the hierarchical organisation of the form content. In addition to extracting the tree, the document structuration module allows the creation of a form model base. The last module, form class identification, uses the tree and the form model base. It is composed of two pre-classifiers, which extract lists of possible forms, and a structural classifier. The two pre-classifiers filter the 250 form classes in order to reduce the workload of the structural classifier. This classifier uses graph matching to compare the tree of the form being processed with the list of candidate forms produced by the two pre-classifiers.
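The identification stage outlined in the abstract (two pre-classifiers that narrow the candidate classes, then a structural comparison of trees) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration: the abstract does not say which criteria the pre-classifiers use, so node count and tree depth are stand-ins, and the greedy top-down `similarity` function is a simplified substitute for the paper's graph matching. The names `Node`, `size`, `depth`, `similarity`, `identify` and the labels `form`, `zone`, `field`, `table` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One block of a form tree (labels are hypothetical)."""
    label: str
    children: list[Node] = field(default_factory=list)


def size(node: Node) -> int:
    """Number of nodes in the tree."""
    return 1 + sum(size(c) for c in node.children)


def depth(node: Node) -> int:
    """Number of levels in the tree."""
    return 1 + max((depth(c) for c in node.children), default=0)


def similarity(a: Node, b: Node) -> float:
    """Greedy top-down similarity in [0, 1]; a stand-in for graph matching."""
    if a.label != b.label:
        return 0.0
    unused = list(b.children)
    total = 0.0
    for child in a.children:
        if not unused:
            break
        # Pair each child of a with its best still-unpaired child of b.
        best = max(unused, key=lambda other: similarity(child, other))
        total += similarity(child, best)
        unused.remove(best)
    return (1.0 + total) / (1.0 + max(len(a.children), len(b.children)))


def identify(form: Node, models: dict[str, Node], size_tol: int = 2) -> str | None:
    """Return the best-matching class name, or None if no candidate survives."""
    # Pre-classifier 1 (assumed criterion): similar node count.
    candidates = {n: m for n, m in models.items()
                  if abs(size(m) - size(form)) <= size_tol}
    # Pre-classifier 2 (assumed criterion): identical tree depth.
    candidates = {n: m for n, m in candidates.items()
                  if depth(m) == depth(form)}
    if not candidates:
        return None
    # Structural classifier: compare the form tree with each remaining model.
    return max(candidates, key=lambda n: similarity(form, candidates[n]))


# A toy model base with three classes (the real system has 250).
models = {
    "A": Node("form", [Node("zone", [Node("field"), Node("field")]),
                       Node("zone", [Node("field")])]),
    "B": Node("form", [Node("zone", [Node("field")]),
                       Node("table", [Node("field"), Node("field")])]),
    "C": Node("form", [Node("zone")]),
}
form = Node("form", [Node("zone", [Node("field"), Node("field")]),
                     Node("zone", [Node("field")])])
result = identify(form, models)
```

In this toy model base, class `C` is discarded by the pre-classifiers, so the more expensive tree comparison runs only on `A` and `B`. This mirrors the role the abstract gives the pre-classifiers, which is to reduce the work left to the structural classifier.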


Editor information

Nabeel A. Murshed, Flávio Bortolozzi

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Diana, S., Trupin, E., Lecourtier, Y., Labiche, J. (1997). Document modeling for form class identification. In: Murshed, N.A., Bortolozzi, F. (eds) Advances in Document Image Analysis. BSDIA 1997. Lecture Notes in Computer Science, vol 1339. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63791-5_13

  • DOI: https://doi.org/10.1007/3-540-63791-5_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63791-2

  • Online ISBN: 978-3-540-69646-9

  • eBook Packages: Springer Book Archive
