Grammatical inference in document recognition

  • Alexander S. Saidi
  • Souad Tayeb-bey
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1433)


In this paper, we consider pattern recognition applied to paper documents, based on grammatical inference (GI) for classes of structured documents such as summaries, dictionaries, bibliographic databases, encyclopaedias and so on. In this task, the inference engine takes as input a set of individual examples of these documents and outputs a set of rules that recognise similar documents. We place GI in an algebraic framework in which rewrite rules define the process of generalisation. The implementation algorithm discussed here is used in a current document handling project in which paper documents are typographically tagged and then recognised. One current application in this project is to extract the physical and logical structures of a given set of paper documents and then reorganise them in a machine-readable form such as HTML code.
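As a rough illustration of the input/output behaviour described above (examples in, recognising rules out), the following toy sketch may help. It is not the algorithm of the paper: it assumes each document has already been reduced to a sequence of typographic tags, and it uses a single hypothetical rewrite rule that generalises any run of two or more identical tags into "one or more" of that tag. The tag names and the function names `generalise`, `infer_rules` and `recognises` are invented for this example.

```python
import re
from itertools import groupby

def generalise(tags):
    """Hypothetical rewrite rule: a run 'x x ... x' of length >= 2 becomes 'x+'.

    Returns a tuple of (tag, repeated) pairs, e.g. (("title", False), ("para", True)).
    """
    return tuple((tag, len(list(run)) >= 2) for tag, run in groupby(tags))

def infer_rules(examples):
    """Infer one generalised pattern per example and drop duplicates."""
    return {generalise(example) for example in examples}

def to_regex(pattern):
    """Compile a pattern to a regular expression over space-terminated tags.

    Assumes tag names contain no spaces.
    """
    parts = []
    for tag, repeated in pattern:
        unit = "(?:" + re.escape(tag) + " )"
        parts.append(unit + ("+" if repeated else ""))
    return re.compile("".join(parts))

def recognises(rules, tags):
    """True if the tag sequence matches at least one inferred rule."""
    text = "".join(tag + " " for tag in tags)
    return any(to_regex(pattern).fullmatch(text) for pattern in rules)

# Two tagged example documents (invented data, for illustration only).
examples = [
    ["title", "author", "author", "para", "para", "para"],
    ["title", "author", "para", "para"],
]
rules = infer_rules(examples)

# A document unseen during inference, with three authors and two paragraphs.
unseen = ["title", "author", "author", "author", "para", "para"]
accepted = recognises(rules, unseen)
rejected = recognises(rules, ["author", "title"])
```

Here `accepted` is true because the first example generalises to "title, one or more authors, one or more paragraphs", while `rejected` is false because no inferred rule allows an author before the title. A real inference engine would also merge rules across examples and handle nested structure, which this sketch deliberately leaves out.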


Logic Program, Regular Expression, Regular Language, Finite Automaton, Derivation Tree
These keywords were added by machine and not by the authors.





Copyright information

© Springer-Verlag Berlin Heidelberg 1998

Authors and Affiliations

  • Alexander S. Saidi (1, 2)
  • Souad Tayeb-bey (1)
  1. Laboratoire de Reconnaissance de Formes et Vision, INSA de Lyon, Bât., Villeurbanne
  2. Dépt. Mathématiques Informatique et Systèmes, Ecole Centrale de Lyon, Ecully
