Quality Assurance Tool Suite for Error Detection in Digital Repositories

  • Roman Graf
  • Ross King
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8839)


Digitization workflows for the automatic acquisition of image collections are susceptible to errors and require quality assurance. This paper presents automated quality assurance tools that detect possible quality issues and support decision making for document image collections. The main contribution of this research is the implementation of image processing tools for different error detection scenarios and their combination into a single tool suite. The tool suite includes: (1) The matchbox tool for accurate near-duplicate detection in document image collections, based on SIFT feature extraction. (2) The finger detection tool, which automatically detects fingers that mistakenly appear in scans from digitized image collections; it uses edge detection and local image information extraction, and analyzes the extracted information to reason about scan quality. (3) The cropping error detection tool, which supports the detection of common cropping problems such as text shifted to the edge of the image, unwanted page borders, or unwanted text from a previous page on the image. Another important contribution of this work is the definition of a quality assurance workflow and its automatic execution for error detection in digital document collections. The presented tool suite detects the described errors and presents them for additional manual analysis and collection cleaning. A statistical overview of the evaluated data and of characteristics such as performance and accuracy is provided. The results of the analysis confirm our hypothesis that an automated approach is able to detect errors with reliable quality, thus making quality control for large digitization projects a feasible and affordable process.
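
To make one of the cropping problems named above concrete (text shifted to the edge of the image), the following is a minimal sketch and not the method used in the paper. It assumes a page is given as rows of grayscale values (0 = black, 255 = white) and flags the page when the border band contains too many dark pixels. The function name `ink_touches_edge` and the parameters `margin`, `dark_threshold` and `max_dark_fraction` are hypothetical and chosen only for illustration.

```python
from typing import List

# Assumption: a page image is a list of rows of grayscale values (0..255).
Image = List[List[int]]


def ink_touches_edge(image: Image, margin: int = 2, dark_threshold: int = 128,
                     max_dark_fraction: float = 0.05) -> bool:
    """Return True if the border band of width `margin` holds too many dark pixels.

    Illustrative heuristic only; all thresholds are hypothetical.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    if height == 0 or width == 0:
        return False

    dark = 0   # dark pixels found inside the border band
    total = 0  # all pixels inside the border band
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            in_band = (y < margin or y >= height - margin
                       or x < margin or x >= width - margin)
            if in_band:
                total += 1
                if value < dark_threshold:
                    dark += 1

    # Flag the page when the dark share of the band exceeds the limit.
    return total > 0 and dark / total > max_dark_fraction
```

A page flagged by such a check would still be passed on for manual analysis, as in the workflow described above; the heuristic only narrows down which scans a person needs to look at.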


Keywords: digital library, digital preservation, quality assurance, image processing





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Roman Graf (1)
  • Ross King (1)
  1. Research Area Future Networks and Services, Department Safety & Security, Austrian Institute of Technology, Vienna, Austria
