Vision-Based Reacquisition for Task-Level Control

  • Matthew R. Walter
  • Yuli Friedman
  • Matthew Antone
  • Seth Teller
Part of the Springer Tracts in Advanced Robotics book series (STAR, volume 79)


We describe a vision-based algorithm that enables a robot to “reacquire” objects previously indicated by a human user through simple image-based stylus gestures. By automatically generating a multiple-view appearance model for each object, the method can reacquire the object and reconstitute the user’s segmentation hints even after the robot has moved long distances or significant time has elapsed since the gesture. We demonstrate that this capability enables novel command and control mechanisms: after a human gives the robot a “guided tour” of named objects and their locations in the environment, he can dispatch the robot to fetch any particular object simply by stating its name. We implement the object reacquisition algorithm on an outdoor mobile manipulation platform and evaluate its performance under challenging conditions that include lighting and viewpoint variation, clutter, and object relocation.
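To make the approach concrete, here is a minimal sketch (not the authors' implementation) of the core idea: each user-indicated object keeps one feature set per observed viewpoint, and reacquisition is nearest-neighbor descriptor matching with a ratio test, declaring a detection when enough features agree. The class name, thresholds, and the use of raw Euclidean distance over toy 2-D descriptors are illustrative assumptions; the paper's system builds its model from SIFT features gathered as the robot moves.

```python
import math

# Illustrative sketch of a multiple-view appearance model for object
# reacquisition. RATIO and MIN_MATCHES are assumed values, not taken
# from the paper.

RATIO = 0.8        # ratio-test threshold (assumption)
MIN_MATCHES = 10   # matches needed to declare a reacquisition (assumption)

class MultiViewAppearanceModel:
    def __init__(self, name):
        self.name = name
        self.views = []  # one descriptor list per observed viewpoint

    def add_view(self, descriptors):
        """Store the features observed from one viewpoint of the object."""
        self.views.append(list(descriptors))

    def _match_view(self, model_desc, scene_desc):
        """Count scene features whose nearest model descriptor is
        distinctly closer than the second nearest (Lowe's ratio test)."""
        matches = 0
        for d in scene_desc:
            dists = sorted(math.dist(d, m) for m in model_desc)
            if len(dists) >= 2 and dists[0] < RATIO * dists[1]:
                matches += 1
        return matches

    def reacquire(self, scene_desc):
        """Return True if any stored view matches the current scene
        strongly enough to re-apply the user's segmentation hint."""
        best = max(self._match_view(v, scene_desc) for v in self.views)
        return best >= MIN_MATCHES
```

A real system would populate `add_view` with SIFT descriptors extracted from camera frames and verify the surviving matches geometrically (e.g., with RANSAC) before trusting a detection; this sketch only shows how multiple stored views let a single object be recognized from different vantage points.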


Keywords: Ground truth · Appearance model · SIFT feature · Multiple instance learning · Viewpoint variation





Copyright information

© Springer-Verlag GmbH Berlin Heidelberg 2014

Authors and Affiliations

  • Matthew R. Walter¹
  • Yuli Friedman²
  • Matthew Antone²
  • Seth Teller¹

  1. MIT CS & AI Lab (CSAIL), Cambridge, USA
  2. BAE Systems, Burlington, USA
