Abstract
Structured-output learning is a challenging problem, particularly because large datasets of fully labelled training instances are difficult to obtain. In this paper we address this difficulty by presenting a multi-utility learning framework for structured prediction that can learn from training instances with different forms of supervision. We propose a unified technique for inferring the loss functions most suitable for quantifying the consistency of solutions with the given weak annotation. We demonstrate the effectiveness of our framework on the challenging problem of semantic image segmentation, for which a wide variety of annotations can be used. For instance, popular training datasets for semantic segmentation are composed of images with hard-to-generate full pixel labellings, as well as images with easy-to-obtain weak annotations, such as bounding boxes around objects or image-level labels that specify which object categories are present in an image. Experimental evaluation shows that the use of annotation-specific loss functions dramatically improves segmentation accuracy compared to the baseline system where only one type of weak annotation is used.
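To make the idea of annotation-specific losses concrete, the following is a minimal illustrative sketch, not the loss functions proposed in the paper. It assumes a predicted segmentation is a grid of class indices and scores its consistency with three annotation types named in the abstract: a full pixel labelling, image-level labels, and a bounding box. All function names, the annotation dictionary layout, the equal 0.5 weighting, and the assumption that a boxed class does not occur outside its box are hypothetical choices made for illustration only.

```python
from typing import Any, Dict, List, Set, Tuple

Labelling = List[List[int]]  # one class index per pixel, stored row by row


def hamming_loss(pred: Labelling, full: Labelling) -> float:
    """Full annotation: fraction of pixels that disagree with the ground truth."""
    pairs = [(p, g) for prow, grow in zip(pred, full) for p, g in zip(prow, grow)]
    return sum(p != g for p, g in pairs) / len(pairs)


def image_level_loss(pred: Labelling, present: Set[int]) -> float:
    """Image-level labels: penalise classes that should not appear and
    annotated classes that are missing from the prediction (toy definition)."""
    flat = [p for row in pred for p in row]
    spurious = sum(p not in present for p in flat) / len(flat)
    missing = len(present - set(flat)) / len(present)
    return 0.5 * (spurious + missing)


def bounding_box_loss(pred: Labelling, label: int,
                      box: Tuple[int, int, int, int]) -> float:
    """Bounding box (top, left, bottom, right), end-exclusive. Toy assumption:
    the class occurs only inside the box and at least once within it."""
    top, left, bottom, right = box
    hits_inside = hits_outside = n_outside = 0
    for r, row in enumerate(pred):
        for c, p in enumerate(row):
            if top <= r < bottom and left <= c < right:
                hits_inside += (p == label)
            else:
                n_outside += 1
                hits_outside += (p == label)
    leak = hits_outside / n_outside if n_outside else 0.0
    empty = 0.0 if hits_inside else 1.0
    return 0.5 * (leak + empty)


def annotation_loss(pred: Labelling, annotation: Dict[str, Any]) -> float:
    """Dispatch to the loss that matches the annotation type."""
    kind = annotation["kind"]
    if kind == "full":
        return hamming_loss(pred, annotation["labels"])
    if kind == "image_level":
        return image_level_loss(pred, annotation["present"])
    if kind == "bbox":
        return bounding_box_loss(pred, annotation["label"], annotation["box"])
    raise ValueError(f"unknown annotation kind: {kind!r}")


if __name__ == "__main__":
    pred = [[0, 0, 1],
            [0, 1, 1]]
    print(annotation_loss(pred, {"kind": "full", "labels": [[0, 0, 1], [0, 0, 1]]}))
    print(annotation_loss(pred, {"kind": "image_level", "present": {0, 1}}))
    print(annotation_loss(pred, {"kind": "bbox", "label": 1, "box": (0, 1, 2, 3)}))
```

In a training loop, each instance would be scored with the loss matching its own annotation type, so fully and weakly labelled images can share one objective. How the paper infers suitable losses is not described in the abstract and is not reproduced here.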
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Shapovalov, R., Vetrov, D., Osokin, A., Kohli, P. (2015). Multi-utility Learning: Structured-Output Learning with Multiple Annotation-Specific Loss Functions. In: Tai, XC., Bae, E., Chan, T.F., Lysaker, M. (eds) Energy Minimization Methods in Computer Vision and Pattern Recognition. EMMCVPR 2015. Lecture Notes in Computer Science, vol 8932. Springer, Cham. https://doi.org/10.1007/978-3-319-14612-6_30
DOI: https://doi.org/10.1007/978-3-319-14612-6_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14611-9
Online ISBN: 978-3-319-14612-6
eBook Packages: Computer Science, Computer Science (R0)