Document Image Classification, with a Specific View on Applications of Patent Images

Part of the book series: The Information Retrieval Series (INRE, volume 37)

Abstract

This chapter focuses on document image classification and retrieval. We analyse and compare different parameter settings for image representations based on run-length histograms and Fisher vectors. We conduct an exhaustive experimental study on several document image data sets, including the MARG benchmarks, two data sets built on customer data, and the images from the patent image classification task of CLEF-IP 2011. The aim of the study is to give guidelines on how best to choose the parameters so that the same features perform well on different tasks. As an example of such a need, we describe the image-based patent retrieval tasks of CLEF-IP 2011, where we used the same image representation both to predict the image type and to retrieve relevant patents.

Author information

Corresponding author

Correspondence to Gabriela Csurka.

Copyright information

© 2017 Springer-Verlag GmbH Germany

About this chapter

Cite this chapter

Csurka, G. (2017). Document Image Classification, with a Specific View on Applications of Patent Images. In: Lupu, M., Mayer, K., Kando, N., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 37. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53817-3_12

  • DOI: https://doi.org/10.1007/978-3-662-53817-3_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-53816-6

  • Online ISBN: 978-3-662-53817-3

  • eBook Packages: Computer Science, Computer Science (R0)
