Abstract
Recently our research has focused on tools for video annotation, the generation of symbolic descriptions of dynamic scenes. In this paper we describe two capabilities being developed that are necessary for understanding action in video sequences. The first is a novel tracking technique that builds context-specific templates used only locally in time and space. The second is a gesture recognition method in which time is represented implicitly and probabilistically. We believe the future of computer vision lies in the processing of video: while much work has been devoted to representing and manipulating static images, tools for reasoning about action remain far less developed.
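To make the first idea concrete, here is a minimal sketch of tracking with templates that are re-extracted every frame, so each template is matched only near its source in both time and space. This is an illustrative reconstruction, not the paper's actual algorithm: the function names (`ncc`, `track_local_template`), the normalized-cross-correlation score, and the window sizes are all assumptions made for the example.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation between two equal-size patches."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    return 0.0 if denom == 0 else float((p * t).sum() / denom)

def track_local_template(frames, start, half=3, search=5):
    """Track a point through a frame sequence.

    The template is cut from the *previous* frame around the current
    estimate, so it is context-specific and used only locally in time
    (one frame pair) and space (a small search window).
    """
    r, c = start
    path = [(r, c)]
    for prev, cur in zip(frames, frames[1:]):
        template = prev[r - half:r + half + 1, c - half:c + half + 1]
        best, best_rc = -2.0, (r, c)
        for dr in range(-search, search + 1):
            for dc in range(-search, search + 1):
                rr, cc = r + dr, c + dc
                patch = cur[rr - half:rr + half + 1, cc - half:cc + half + 1]
                if patch.shape != template.shape:
                    continue  # window fell off the image boundary
                score = ncc(patch, template)
                if score > best:
                    best, best_rc = score, (rr, cc)
        r, c = best_rc
        path.append((r, c))
    return path
```

Because the template is rebuilt from the most recent frame, appearance changes accumulate gracefully instead of invalidating one fixed template, at the cost of possible slow drift.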
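The second idea, representing time implicitly, can be sketched as state-based matching: a gesture is recognized if the observations visit a sequence of prototype states in order, with dwell time in each state ignored. This is only a hedged illustration of the general approach; the prototype states, the distance threshold, and the helper names (`state_sequence`, `matches_gesture`) are assumptions for the example, not the paper's formulation.

```python
import numpy as np

def state_sequence(obs, prototypes, thresh=0.3):
    """Collapse observations into the ordered list of prototype states
    they pass through; consecutive repeats are merged, so how long a
    state is held carries no weight."""
    seq = []
    for x in obs:
        dists = [np.linalg.norm(x - p) for p in prototypes]
        s = int(np.argmin(dists))
        if dists[s] <= thresh and (not seq or seq[-1] != s):
            seq.append(s)
    return seq

def matches_gesture(obs, prototypes, thresh=0.3):
    """A gesture is recognized when the observations traverse the
    prototype states in order 0, 1, ..., n-1."""
    return state_sequence(obs, prototypes, thresh) == list(range(len(prototypes)))
```

Because only the order of states matters, the same gesture performed quickly or slowly matches equally well, which is the sense in which time is represented implicitly rather than as explicit durations.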
© 1996 Springer-Verlag Berlin Heidelberg
Bobick, A.F. (1996). Video annotation: Computers watching video. In: Li, S.Z., Mital, D.P., Teoh, E.K., Wang, H. (eds) Recent Developments in Computer Vision. ACCV 1995. Lecture Notes in Computer Science, vol 1035. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60793-5_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60793-9
Online ISBN: 978-3-540-49448-5