Joint People, Event, and Location Recognition in Personal Photo Collections Using Cross-Domain Context

Lin, Dahua; Kapoor, Ashish; Hua, Gang; Baker, Simon

doi:10.1007/978-3-642-15549-9_18

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 6311)

Abstract

We present a framework for vision-assisted tagging of personal photo collections using context. Whereas previous efforts mainly focus on tagging people, we develop a unified approach to jointly tag across multiple domains (specifically people, events, and locations). The heart of our approach is a generic probabilistic model of context that couples the domains through a set of cross-domain relations. Each relation models how likely the instances in two domains are to co-occur. Based on this model, we derive an algorithm that simultaneously estimates the cross-domain relations and infers the unknown tags in a semi-supervised manner. We conducted experiments on two well-known datasets and obtained significant performance improvements in both people and location recognition. We also demonstrated the ability to infer event labels with missing timestamps (i.e. with no event features).
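The idea of a cross-domain relation, a table of how likely instances in two domains are to co-occur, can be illustrated with a small sketch. This is not the paper's model or inference algorithm (which estimates the relations and the unknown tags jointly and in a semi-supervised manner); it is a simplified, fully supervised illustration. The function names `estimate_relation` and `rerank`, the smoothing parameter `alpha`, and the toy people, events, and scores are all assumptions made for the example.

```python
from collections import Counter

def estimate_relation(pairs, people, events, alpha=1.0):
    """Estimate P(person | event) from observed (person, event) pairs.

    Add-alpha smoothing keeps unseen combinations at a small nonzero
    probability. Hypothetical helper, not taken from the paper.
    """
    counts = Counter(pairs)
    relation = {}
    for e in events:
        total = sum(counts[(p, e)] for p in people) + alpha * len(people)
        for p in people:
            relation[(p, e)] = (counts[(p, e)] + alpha) / total
    return relation

def rerank(face_scores, event, relation):
    """Combine appearance scores with the co-occurrence prior for a known event.

    Returns a normalized distribution over people.
    """
    unnorm = {p: s * relation[(p, event)] for p, s in face_scores.items()}
    z = sum(unnorm.values())
    return {p: v / z for p, v in unnorm.items()}

# Toy data (assumed): who was tagged at which event in already-labeled photos.
people = ["alice", "bob"]
events = ["wedding", "hiking"]
labeled_pairs = [("alice", "wedding"), ("alice", "wedding"), ("bob", "hiking")]

relation = estimate_relation(labeled_pairs, people, events)

# A face classifier slightly prefers "bob", but the photo is from the wedding.
face_scores = {"alice": 0.45, "bob": 0.55}
posterior = rerank(face_scores, "wedding", relation)
best = max(posterior, key=posterior.get)
print(best, posterior)
```

In this toy setting the event context outweighs a weak appearance preference, which is the kind of coupling between domains that the abstract describes. A joint model would additionally treat the event label itself as uncertain and update it from the people recognized in the photo.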

The research described in this paper was conducted when all four authors were affiliated with Microsoft Research Redmond.

Author information

Authors and Affiliations

Computer Science and Artificial Intelligence Laboratory, MIT
Dahua Lin
Microsoft Research
Dahua Lin, Ashish Kapoor & Simon Baker
Nokia Research Center Hollywood
Gang Hua

Editor information

Editors and Affiliations

GRASP Laboratory, University of Pennsylvania, 3330 Walnut Street, 19104, Philadelphia, PA, USA
Kostas Daniilidis
School of Electrical and Computer Engineering, National Technical University of Athens, 15773, Athens, Greece
Petros Maragos
Department of Applied Mathematics, Ecole Centrale de Paris, Grande Voie des Vignes, 92295, Chatenay-Malabry, France
Nikos Paragios

Cite this paper

Lin, D., Kapoor, A., Hua, G., Baker, S. (2010). Joint People, Event, and Location Recognition in Personal Photo Collections Using Cross-Domain Context. In: Daniilidis, K., Maragos, P., Paragios, N. (eds) Computer Vision – ECCV 2010. ECCV 2010. Lecture Notes in Computer Science, vol 6311. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15549-9_18

DOI: https://doi.org/10.1007/978-3-642-15549-9_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-15548-2
Online ISBN: 978-3-642-15549-9
eBook Packages: Computer Science, Computer Science (R0)
