Multidimensional Systems and Signal Processing, Volume 30, Issue 1, pp 175–193

Extended histogram: probabilistic modelling of video content temporal evolutions

  • Elham Shabaninia
  • Ahmad Reza Naghsh-Nilchi (corresponding author)
  • Shohreh Kasaei


A probabilistic video content analysis method, the extended histogram (EH), is proposed for modelling the temporal evolution of a set of histograms extracted from video frames. In an EH, the count in each histogram bin is treated as a random variable rather than a single value, so that the variation of each bin across frames is captured. This representation is well suited to modelling, in a general manner, the dynamic behaviour of tracked video content of interest. Its drawback is that it ignores the temporal order of the observations in the collection. To overcome this, a hierarchical extended histogram (HEH) is proposed, which extracts EHs at different levels of a temporal pyramid. Once these generative models have been identified for each video, an information-based metric is proposed for measuring the similarity of two EHs. With this metric, EHs can be used in many tasks, including video retrieval, classification, and summarization. For discriminative learning in particular, probabilistic kernels based on this metric are defined so that EHs and HEHs can be used with machine learning models such as the SVM. Person re-identification and human action recognition serve as pilot applications to demonstrate the capabilities of the proposed representations. Experimental results show the effectiveness of the proposed models.
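The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which distribution models each bin or which information-based metric is used, so the sketch assumes one Gaussian per bin, a symmetrised Kullback–Leibler divergence summed over bins, and an exponential kernel. All function names and parameters (`extended_histogram`, `levels`, `scale`, `eps`) are hypothetical.

```python
import math

def extended_histogram(hists, eps=1e-6):
    """Fit a mean and variance per bin over a list of per-frame histograms.

    Assumption: each bin is modelled by a single Gaussian.
    """
    n = len(hists)
    bins = list(zip(*hists))  # transpose: one tuple of counts per bin
    means = [sum(b) / n for b in bins]
    # eps keeps every variance strictly positive
    variances = [sum((x - m) ** 2 for x in b) / n + eps
                 for b, m in zip(bins, means)]
    return means, variances

def hierarchical_extended_histogram(hists, levels=2):
    """Fit EHs on a temporal pyramid: level l splits the frames into 2**l segments.

    Assumes at least 2**(levels - 1) frames so that no segment is empty.
    """
    n = len(hists)
    pyramid = []
    for level in range(levels):
        parts = 2 ** level
        for k in range(parts):
            segment = hists[k * n // parts:(k + 1) * n // parts]
            pyramid.append(extended_histogram(segment))
    return pyramid

def symmetric_kl(eh_a, eh_b):
    """Symmetrised KL divergence between two EHs, treating bins as independent."""
    total = 0.0
    for m1, v1, m2, v2 in zip(*eh_a, *eh_b):
        d2 = (m1 - m2) ** 2
        kl_ab = 0.5 * (math.log(v2 / v1) + (v1 + d2) / v2 - 1.0)
        kl_ba = 0.5 * (math.log(v1 / v2) + (v2 + d2) / v1 - 1.0)
        total += kl_ab + kl_ba
    return total

def eh_kernel(eh_a, eh_b, scale=1.0):
    """Exponential kernel on the divergence (an assumed form, not the paper's)."""
    return math.exp(-scale * symmetric_kl(eh_a, eh_b))
```

Under these assumptions, two videos would be compared by fitting an EH (or one EH per pyramid segment) to each and evaluating `eh_kernel` on the matching pairs; the resulting kernel values could then be passed to an SVM that accepts a precomputed kernel matrix.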


Keywords: Extended histogram (EH); Hierarchical extended histogram (HEH); Probabilistic modelling; Temporal evolutions; Human action recognition; Person re-identification



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  • Elham Shabaninia (1)
  • Ahmad Reza Naghsh-Nilchi (1, corresponding author)
  • Shohreh Kasaei (2)

  1. Department of Artificial Intelligence, Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran
  2. Department of Computer Engineering, Sharif University of Technology, Tehran, Iran
