Abstract
We present a framework for vision-assisted tagging of personal photo collections using context. Whereas previous efforts focus mainly on tagging people, we develop a unified approach that tags jointly across multiple domains (specifically people, events, and locations). The heart of our approach is a generic probabilistic model of context that couples the domains through a set of cross-domain relations. Each relation models how likely instances in two domains are to co-occur. Based on this model, we derive an algorithm that simultaneously estimates the cross-domain relations and infers the unknown tags in a semi-supervised manner. We conducted experiments on two well-known datasets and obtained significant performance improvements in both people and location recognition. We also demonstrated the ability to infer event labels when timestamps are missing (i.e., when no event features are available).
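To make the idea of a cross-domain relation concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a single relation between events and people, estimates it as a smoothed co-occurrence table from already tagged photos, and uses it once to rescore ambiguous face-recognition scores. The function names `estimate_relation` and `rescore`, the `smoothing` parameter, and the toy labels are all hypothetical; the joint, semi-supervised estimation described above is replaced here by a single counting pass.

```python
from collections import defaultdict


def estimate_relation(pairs, smoothing=1.0):
    """Estimate a co-occurrence table between two domains.

    pairs: iterable of (event_label, person_label) tuples taken from
    photos whose tags are already known.
    Returns relation[event][person], a smoothed estimate of
    P(person | event). This is an illustrative stand-in for a learned
    cross-domain relation, not the model from the paper.
    """
    counts = defaultdict(lambda: defaultdict(float))
    people = set()
    for event, person in pairs:
        counts[event][person] += 1.0
        people.add(person)

    relation = {}
    for event, row in counts.items():
        # Additive smoothing keeps unseen (event, person) pairs possible.
        total = sum(row.values()) + smoothing * len(people)
        relation[event] = {
            p: (row.get(p, 0.0) + smoothing) / total for p in people
        }
    return relation


def rescore(face_scores, event, relation):
    """Weight appearance-based scores by the event context, then normalize."""
    prior = relation[event]
    joint = {p: face_scores.get(p, 0.0) * prior[p] for p in prior}
    z = sum(joint.values())
    return {p: v / z for p, v in joint.items()} if z > 0 else joint


# Toy labeled data (hypothetical): who appeared at which event.
labeled = [
    ("wedding", "alice"), ("wedding", "alice"), ("wedding", "bob"),
    ("hiking", "bob"), ("hiking", "bob"),
]
relation = estimate_relation(labeled)

# A face the recognizer cannot disambiguate, taken at the hiking event.
posterior = rescore({"alice": 0.5, "bob": 0.5}, "hiking", relation)
best = max(posterior, key=posterior.get)
```

In this toy case the appearance scores are tied, so the event context alone decides the tag. A joint approach would additionally let the inferred people tags feed back into the event and location estimates, which this sketch does not attempt.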
The research described in this paper was conducted when all four authors were affiliated with Microsoft Research Redmond.
© 2010 Springer-Verlag Berlin Heidelberg
Cite this paper
Lin, D., Kapoor, A., Hua, G., Baker, S. (2010). Joint People, Event, and Location Recognition in Personal Photo Collections Using Cross-Domain Context. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15549-9_18
DOI: https://doi.org/10.1007/978-3-642-15549-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15548-2
Online ISBN: 978-3-642-15549-9
eBook Packages: Computer Science (R0)