Indexicality and dynamic attention control in qualitative recognition of assembly actions

  • Yasuo Kuniyoshi
  • Hirochika Inoue
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 588)


Abstract

Visual recognition of physical actions requires temporal segmentation and identification of action types. Action concepts are analyzed into attention, context, and change. Temporal segmentation is defined as a context switch detected by a switching of attention. Actions are identified by detecting "indexical" features, which can be computed quickly from visual features and point directly to action concepts. The validity of an indexical feature depends on the current attention and context, which are maintained by three types of attention control: spatial, temporal, and hierarchical. These are combined by a mechanism called the "attention stack", which extends at important points and winds up elsewhere. An action recognizer built on this framework successfully recognized human assembly action sequences in real time and output qualitative descriptions of the tasks.
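The "attention stack" described above can be illustrated with a minimal sketch. All class names, contexts, and the segmentation logic below are illustrative assumptions, not the authors' implementation: a frame is pushed ("extends") when attention shifts to a sub-context at an important point, and popped ("winds up") when that sub-context completes, with each pop marking a temporal segment boundary in the action stream.

```python
# Illustrative sketch (not the authors' code): an "attention stack" that
# extends at important points and winds up elsewhere, so that each
# attention switch yields a temporal segment of the observed action.

class AttentionFrame:
    def __init__(self, focus, context):
        self.focus = focus        # what is attended (e.g. an object or region)
        self.context = context    # action context in which features are read

class AttentionStack:
    def __init__(self):
        self.frames = []
        self.segments = []        # qualitative descriptions of finished segments

    def push(self, focus, context):
        # "Extend" at an important point: attention shifts into a sub-context.
        self.frames.append(AttentionFrame(focus, context))

    def pop(self):
        # "Wind up": the sub-context ends; this attention switch marks a
        # temporal segment boundary and records a qualitative description.
        frame = self.frames.pop()
        self.segments.append((frame.context, frame.focus))
        return frame

stack = AttentionStack()
stack.push("hand", "reach")       # attend the moving hand
stack.push("block-A", "grasp")    # important point: attend the target object
stack.pop()                       # grasp completes -> segment boundary
stack.pop()                       # reach completes -> segment boundary
print(stack.segments)             # [('grasp', 'block-A'), ('reach', 'hand')]
```

Nesting inner frames inside outer ones gives the hierarchical attention control mentioned in the abstract: the outer "reach" context remains on the stack while the inner "grasp" sub-context is active.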


Keywords: Visual Search · Visual Feature · Target Object · Attention Control · Assembly Action



Copyright information

© Springer-Verlag Berlin Heidelberg 1992

Authors and Affiliations

  • Yasuo Kuniyoshi
    Autonomous Systems Section, Intelligent Systems Division, Electrotechnical Laboratory, Ibaraki, Japan
  • Hirochika Inoue
    Department of Mechano-Informatics, The University of Tokyo, Tokyo, Japan
