Feature Extraction and Learning Using Context Cue and Rényi Entropy Based Mutual Information

  • Hong Pan
  • Søren Ingvor Olsen
  • Yaping Zhu
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9493)


Feature extraction and learning play a critical role in visual perception tasks. We focus on improving the robustness of kernel descriptors (KDES) by embedding context cues, and on learning a compact and discriminative feature codebook for feature reduction using Rényi entropy based mutual information. For feature extraction, we develop a new set of kernel descriptors, Context Kernel Descriptors (CKD), which enhance the original KDES by embedding spatial context into the descriptors. The context cues contained in the context kernel enforce some degree of spatial consistency, thus improving the robustness of CKD. For feature learning and reduction, we propose a novel codebook learning method based on the Cauchy-Schwarz Quadratic Mutual Information (CSQMI), a mutual information measure derived from Rényi's quadratic entropy, to learn a compact and discriminative CKD codebook. By projecting the original full-dimensional CKD onto the codebook, we reduce the dimensionality of CKD while preserving its discriminability. Moreover, the latent connection between Rényi's quadratic entropy and the mapped data in kernel feature space makes it possible to capture both the geometric structure of the CKD and information about the underlying labels using CSQMI, so that the resulting codebook and the reduced CKD are discriminative. We verify the effectiveness of our method on several public image benchmark datasets, such as YaleB, Caltech-101 and CIFAR-10, as well as a challenging chicken feet dataset of our own. Experimental results show that our method has promising potential for visual object recognition and detection applications.
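The CSQMI measure underlying the codebook learning can be estimated non-parametrically from samples with Parzen windows, using the standard information-potential formulas from information-theoretic learning: a joint potential, a product-of-marginals potential, and a cross potential, combined as a log ratio. The sketch below is an illustrative NumPy implementation of that generic estimator, not the authors' code; the Gaussian kernel bandwidth `sigma` and all function names are assumptions.

```python
import numpy as np

def gaussian_gram(x, sigma):
    """Pairwise Gaussian (Parzen-window) kernel matrix for samples x of shape (n, d)."""
    d2 = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def renyi_quadratic_entropy(x, sigma=1.0):
    """Parzen estimate of Renyi's quadratic entropy H2(X) = -log E[p(X)]."""
    # Convolving two Gaussian windows of width sigma gives width sqrt(2)*sigma.
    K = gaussian_gram(x, np.sqrt(2.0) * sigma)
    return -np.log(K.mean())

def csqmi(x, y, sigma=1.0):
    """Cauchy-Schwarz quadratic mutual information estimate between paired samples."""
    Kx = gaussian_gram(x, sigma)
    Ky = gaussian_gram(y, sigma)
    v_joint = (Kx * Ky).mean()                            # joint information potential
    v_marg = Kx.mean() * Ky.mean()                        # product-of-marginals potential
    v_cross = (Kx.mean(axis=0) * Ky.mean(axis=0)).mean()  # cross information potential
    # Cauchy-Schwarz guarantees v_cross**2 <= v_joint * v_marg, so the result is >= 0.
    return np.log(v_joint * v_marg / v_cross ** 2)
```

By construction the estimate is zero when the joint density factorizes and grows with statistical dependence, which is what makes it usable as a discriminability criterion when scoring candidate codebook entries against class labels.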


Keywords: Context Kernel Descriptors · Cauchy-Schwarz Quadratic Mutual Information · Feature extraction and learning · Object classification and detection



This work is supported by the Danish Agency for Science, Technology and Innovation, project “Real-time controlled robots for the meat industry”, and partly supported by the Jiangsu Natural Science Foundation (JSNSF) under Grant BK20131296 and the National Natural Science Foundation of China (NSFC) under Grant 61101165. The authors thank Lantmännen Danpo A/S for providing the chicken images.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science, University of Copenhagen, København Ø, Denmark
  2. School of Automation, Southeast University, Nanjing, China