In many scenarios, a dynamic scene is filmed by multiple video cameras located at different viewing positions. Visualizing such multi-view data on a single display raises an immediate question: which cameras capture better views of the scene? Typically (e.g., in TV broadcasts), a human producer manually selects the best view. In this paper we automate this process by evaluating the quality of the view captured by each camera. We regard human actions as three-dimensional shapes induced by their silhouettes in the space-time volume. The quality of a view is then evaluated based on features of the space-time shape that correspond to limb visibility. Building on these features, we propose two view-quality approaches: one is generic, while the other can be trained to fit any preferred action recognition method. Our experiments show that the proposed view selection provides intuitive results that match common conventions. We further show that it improves action recognition results.
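To make the space-time shape idea concrete, the following minimal sketch stacks per-frame binary silhouettes into a three-dimensional volume and ranks cameras by a simple score. It is an illustration under stated assumptions, not the method of the paper: the score used here (the fraction of foreground voxels lying on the surface of the space-time shape) is a hypothetical stand-in for the limb-visibility features mentioned above, and the function names `space_time_volume`, `boundary_fraction`, and `select_best_view` are invented for this example.

```python
import numpy as np


def space_time_volume(silhouettes):
    """Stack per-frame binary silhouettes (each H x W) into a T x H x W boolean volume."""
    return np.stack([np.asarray(s, dtype=bool) for s in silhouettes], axis=0)


def boundary_fraction(volume):
    """Fraction of foreground voxels with at least one background 6-neighbour.

    ASSUMPTION: this surface-based score is only a crude proxy for limb
    visibility; it is not one of the features defined in the paper.
    """
    # Pad with background so voxels on the volume border count as boundary.
    padded = np.pad(volume, 1, mode="constant", constant_values=False)
    core = padded[1:-1, 1:-1, 1:-1]
    interior = core.copy()
    # A voxel is interior only if all six axis-aligned neighbours are foreground.
    for axis in range(3):
        for shift in (-1, 1):
            interior &= np.roll(padded, shift, axis=axis)[1:-1, 1:-1, 1:-1]
    n_foreground = core.sum()
    if n_foreground == 0:
        return 0.0
    return float((core & ~interior).sum() / n_foreground)


def select_best_view(views):
    """Score every camera and return (best_camera, scores).

    `views` maps a camera name to its list of silhouette frames.
    """
    scores = {
        cam: boundary_fraction(space_time_volume(frames))
        for cam, frames in views.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

With this proxy, a view in which thin, extended parts of the silhouette are visible tends to score higher than a view in which the body collapses into a compact blob. A trained variant, as described in the abstract, would replace `boundary_fraction` with a learned scoring function.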
About this article
Cite this article
Rudoy, D., Zelnik-Manor, L. Viewpoint Selection for Human Actions. Int J Comput Vis 97, 243–254 (2012). https://doi.org/10.1007/s11263-011-0484-5
- Video analysis
- Viewpoint selection
- Human actions
- Multiple viewpoints