Viewpoint Selection for Human Actions

Abstract

In many scenarios a dynamic scene is filmed by multiple video cameras located at different viewing positions. Visualizing such multi-view data on a single display raises an immediate question: which cameras capture better views of the scene? Typically (e.g., in TV broadcasts) a human producer manually selects the best view. In this paper we wish to automate this process by evaluating the quality of the view captured by each camera. We regard human actions as three-dimensional shapes induced by their silhouettes in the space-time volume. The quality of a view is then evaluated based on features of the space-time shape that correspond to limb visibility. Building on these features, we propose two view-quality approaches: one is generic, while the other can be trained to fit any preferred action recognition method. Our experiments show that the proposed view selection provides intuitive results that match common conventions. We further show that it improves action recognition results.
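To make the space-time-shape idea concrete, here is a minimal sketch, not the authors' implementation: binary per-frame silhouettes are stacked along the time axis into a 3-D volume, and a view is scored by the surface detail of the resulting shape, a rough stand-in for the limb-visibility features described above. All names here (`space_time_shape`, `view_quality`) are hypothetical illustrations.

```python
import numpy as np

def space_time_shape(silhouettes):
    """Stack per-frame binary masks (H x W) into a T x H x W boolean volume."""
    return np.stack([np.asarray(s, dtype=bool) for s in silhouettes], axis=0)

def view_quality(volume):
    """Score one camera's space-time shape by its relative surface detail.

    Intuition (our assumption, not the paper's exact feature): views in
    which limbs are visible produce more protrusions, hence more shape
    surface per unit of shape volume.
    """
    filled = int(volume.sum())
    if filled == 0:
        return 0.0
    # Pad with empty voxels so boundary checks work at the volume edges.
    padded = np.pad(volume, 1, constant_values=False)
    # A voxel is interior if all six axis-neighbors are also filled.
    interior = np.ones(volume.shape, dtype=bool)
    for axis in range(3):
        for shift in (1, -1):
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    surface = int((volume & ~interior).sum())
    # Normalize surface area by volume**(2/3) so the score is scale-invariant.
    return surface / filled ** (2.0 / 3.0)

# Usage: given {camera_id: list of binary masks}, pick the best view.
# best = max(masks, key=lambda c: view_quality(space_time_shape(masks[c])))
```

The surface-to-volume normalization keeps scores comparable across cameras at different distances from the actor; the paper's trainable variant would instead fit such shape features to whichever action recognition method is preferred.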



Author information

Corresponding author

Correspondence to Dmitry Rudoy.

Electronic Supplementary Material

Supplementary video (AVI 4.97 MB).

Cite this article

Rudoy, D., Zelnik-Manor, L. Viewpoint Selection for Human Actions. Int J Comput Vis 97, 243–254 (2012). https://doi.org/10.1007/s11263-011-0484-5

Keywords

  • Video analysis
  • Viewpoint selection
  • Human actions
  • Multiple viewpoints