Abstract
This chapter focuses on document image classification and retrieval, analysing and comparing different parameter settings for image representations based on run-length histograms and Fisher vectors. We conduct an exhaustive experimental study on several document image data sets: the MARG benchmarks, two data sets built on customer data, and the images from the patent image classification task of CLEF-IP 2011. The aim of the study is to provide guidelines for choosing the parameters so that the same features perform well across different tasks. As an example of such a need, we describe the image-based patent retrieval tasks of CLEF-IP 2011, where we used the same image representation both to predict the image type and to retrieve relevant patents.
Copyright information
© 2017 Springer-Verlag GmbH Germany
About this chapter
Cite this chapter
Csurka, G. (2017). Document Image Classification, with a Specific View on Applications of Patent Images. In: Lupu, M., Mayer, K., Kando, N., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 37. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53817-3_12
DOI: https://doi.org/10.1007/978-3-662-53817-3_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53816-6
Online ISBN: 978-3-662-53817-3
eBook Packages: Computer Science, Computer Science (R0)