Multiple Instance Metric Learning from Automatically Labeled Bags of Faces

  • Matthieu Guillaumin
  • Jakob Verbeek
  • Cordelia Schmid
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


Metric learning aims at finding a distance that approximates a task-specific notion of semantic similarity. Typically, a Mahalanobis distance is learned from pairs of data labeled as being semantically similar or not. In this paper, we learn such metrics in a weakly supervised setting where “bags” of instances are labeled with “bags” of labels. We formulate the problem as a multiple instance learning (MIL) problem over pairs of bags. If two bags share at least one label, we label the pair positive, and negative otherwise. We propose to learn a metric using those labeled pairs of bags, leading to MildML, for multiple instance logistic discriminant metric learning. MildML iterates between updates of the metric and selection of putative positive pairs of examples from positive pairs of bags. To evaluate our approach, we introduce a large and challenging data set, Labeled Yahoo! News, which we have manually annotated and which contains 31147 detected faces of 5873 different people in 20071 images. We group the faces detected in an image into a bag, and group the names detected in the caption into a corresponding set of labels. When the labels come from manual annotation, we find that MildML using the bag-level annotation performs as well as fully supervised metric learning using instance-level annotation. We also consider performance in the case of automatically extracted labels for the bags, where some of the bag labels do not correspond to any example in the bag. In this case, MildML works substantially better than relying on noisy instance-level annotations derived from the bag-level annotation by resolving face-name associations in images with their captions.
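The bag-pair labeling rule and the selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the putative positive pair of a positive bag pair is the closest instance pair under the current Mahalanobis metric, a detail the abstract does not spell out. The function names, the toy two-dimensional features, and the identity matrix `M` are all hypothetical, and the metric update step is omitted entirely.

```python
from itertools import product


def bag_pair_label(labels_a, labels_b):
    """Return True (positive pair) if the two bags share at least one label."""
    return bool(set(labels_a) & set(labels_b))


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y) for a square matrix M."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    return sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n))


def closest_pair(bag_a, bag_b, M):
    """Assumed selection rule: pick the closest instance pair across two bags."""
    return min(product(bag_a, bag_b),
               key=lambda pair: mahalanobis_sq(pair[0], pair[1], M))


# Hypothetical toy data: each bag holds 2-D "face descriptors".
bag_a = [(0.0, 0.0), (5.0, 5.0)]
bag_b = [(4.0, 5.0), (9.0, 9.0)]
M = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, i.e. plain Euclidean distance

if bag_pair_label(["Alice", "Bob"], ["Bob"]):  # the bags share the label "Bob"
    putative_positive = closest_pair(bag_a, bag_b, M)
```

In a full method of this kind, the selected pairs would feed the next metric update, after which the selection is repeated with the new `M`.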


Keywords: Semantic Similarity, Multiple Instance, Semantic Distance, Multiple Instance Learning, Instance Label



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Matthieu Guillaumin (1)
  • Jakob Verbeek (1)
  • Cordelia Schmid (1)
  1. Laboratoire Jean Kuntzmann, LEAR, INRIA, Grenoble
