Abstract
To improve the performance of document analysis systems, a detailed quality evaluation of their results is required. Focusing on segmentation algorithms, we argue that the results produced by the module under consideration should be evaluated directly, and we show that the text-based evaluation method often used in the document analysis domain does not provide such a detailed quality evaluation. We therefore propose a general evaluation approach for comparing segmentation results that is based directly on the segments. The approach handles both algorithms that produce a complete segmentation (partition) and algorithms that extract only objects of interest (extraction). Classes of errors are defined systematically, and the frequency of each class can be computed. The approach is applicable to a wide range of segmentation and extraction algorithms. We use character segmentation as an example to demonstrate its applicability, and we suggest applying it to other segmentation tasks.
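The idea of comparing segments directly and counting errors per class can be illustrated with a minimal sketch. The abstract does not list the error classes or the matching criterion, so everything below is an assumption made for illustration: segments are represented as sets of pixel coordinates, two segments correspond if they share at least one pixel, and the class names (correct, split, merge, miss, false_detection) as well as the function name classify_segments are hypothetical rather than taken from the paper.

```python
from collections import Counter


def classify_segments(ground_truth, result):
    """Count error classes by comparing result segments with ground truth.

    Both arguments map a segment id to a set of (row, col) pixels.
    Assumption: two segments correspond if they share at least one pixel.
    The class names are illustrative, not the paper's definitions.
    """
    # For each ground-truth segment, the result segments that overlap it.
    gt_to_res = {
        g: {r for r, r_pixels in result.items() if g_pixels & r_pixels}
        for g, g_pixels in ground_truth.items()
    }
    # For each result segment, the ground-truth segments that overlap it.
    res_to_gt = {
        r: {g for g, g_pixels in ground_truth.items() if g_pixels & r_pixels}
        for r, r_pixels in result.items()
    }

    counts = Counter()
    for g, matches in gt_to_res.items():
        if not matches:
            counts["miss"] += 1       # no result segment covers it
        elif len(matches) > 1:
            counts["split"] += 1      # covered by several result segments
        else:
            (r,) = matches
            if len(res_to_gt[r]) == 1:
                counts["correct"] += 1  # one-to-one correspondence
            else:
                counts["merge"] += 1    # shares its result segment with others

    # Result segments with no ground-truth counterpart.
    for r, matches in res_to_gt.items():
        if not matches:
            counts["false_detection"] += 1
    return counts


def error_frequencies(counts):
    """Turn class counts into relative frequencies over all counted cases."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()} if total else {}
```

Because a missing result segment is simply counted as a miss, the same comparison works whether the algorithm partitions the whole image or extracts only selected objects. A real evaluation would presumably use a stricter correspondence test than a single shared pixel, for example an overlap threshold.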
Copyright information
© 1999 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Thulke, M., Märgner, V., Denge, A. (1999). A General Approach to Quality Evaluation of Document Segmentation Results. In: Lee, SW., Nakano, Y. (eds) Document Analysis Systems: Theory and Practice. DAS 1998. Lecture Notes in Computer Science, vol 1655. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-48172-9_5
DOI: https://doi.org/10.1007/3-540-48172-9_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-66507-6
Online ISBN: 978-3-540-48172-0
eBook Packages: Springer Book Archive