Determining Patch Saliency Using Low-Level Context

  • Devi Parikh
  • C. Lawrence Zitnick
  • Tsuhan Chen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5303)


The use of context for high-level reasoning to improve recognition accuracy has been popular in recent work. In this paper, we consider an orthogonal application of context: we explore its use to determine which low-level appearance cues in an image are salient, i.e., representative of an image's contents. Existing classes of low-level saliency measures for image patches include those based on interest points, as well as supervised discriminative measures. We propose a new class of unsupervised contextual saliency measures based on co-occurrence and spatial information between image patches. For recognition, image patches are sampled either by weighted random sampling based on saliency, or by a sequential approach that maximizes the likelihoods of the image patches. We compare the different classes of saliency measures, along with a baseline uniform measure, on scene and object recognition using the bag-of-features paradigm. In our results, the contextual saliency measures achieve higher accuracies than the previous methods. Moreover, our highest accuracy is achieved using a sparse sampling of the image, unlike previous approaches whose performance increases with the sampling density.
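The saliency-weighted sampling step described above can be sketched as follows. This is an illustrative sketch only: the function name and the example saliency scores are hypothetical, not taken from the paper; the paper's contextual saliency measures would supply the per-patch scores.

```python
import numpy as np

def sample_patches_by_saliency(saliency, n_samples, seed=None):
    """Weighted random sampling of patch indices: each patch is drawn
    (without replacement) with probability proportional to its saliency."""
    rng = np.random.default_rng(seed)
    saliency = np.asarray(saliency, dtype=float)
    probs = saliency / saliency.sum()  # normalize scores to a distribution
    return rng.choice(len(saliency), size=n_samples, replace=False, p=probs)

# Hypothetical contextual saliency scores for 6 image patches.
scores = [0.05, 0.30, 0.10, 0.25, 0.20, 0.10]
idx = sample_patches_by_saliency(scores, n_samples=3, seed=0)
```

High-saliency patches are more likely to be selected, so a sparse sample concentrates on patches the contextual measure deems representative, in line with the paper's observation that sparse saliency-driven sampling can outperform dense uniform sampling.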


Keywords: Object Recognition · Recognition Accuracy · Interest Point · Image Patch · Salient Region



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Devi Parikh (1)
  • C. Lawrence Zitnick (2)
  • Tsuhan Chen (1)

  1. Carnegie Mellon University, Pittsburgh, USA
  2. Microsoft Research, Redmond, USA
