Unsupervised Visual Learning: From Pixels to Seeing

Leordeanu, Marius

doi:10.1007/978-3-030-42128-1_1

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

This book is about unsupervised learning, one of the most challenging puzzles that we must solve, piece by piece, in order to decode the secrets of intelligence. We move closer to that goal by connecting classical computational models to newer deep learning ones, and then building on a set of fundamental and intuitive unsupervised learning principles. We want to reduce the unsupervised learning problem to a set of essential ideas and then develop the computational tools needed to implement them in the real world. Ultimately, we aim to imagine a universal unsupervised learning machine, the Visual Story Network. The book is written for young students as well as experienced researchers, engineers, and professors. It presents computational models and optimization algorithms in sufficient technical detail, while also maintaining a broad, intuitive picture of the main subject. Different tasks, such as graph matching and clustering, feature selection, classifier learning, unsupervised object discovery and segmentation in video, teacher-student learning over multiple generations, and recursive graph neural networks, are brought together, chapter by chapter, under the same umbrella of unsupervised learning. In the current chapter, we introduce the reader to the overall story of the book, which gives a unified view of the topics presented in detail in the chapters that follow. Besides sharing the main goal of learning without human supervision, the problems and tasks presented in the book also share common computational graph models and optimization methods, such as spectral graph matching, spectral clustering, and the integer projected fixed point method. By bringing together similar mathematical formulations across different tasks, all guided by common intuitive principles towards a universal unsupervised learning system, the book invites the reader to absorb and then participate in the creation of the next generation of artificial intelligence.
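
As a rough illustration of the kind of computation that spectral graph matching and spectral clustering have in common, the sketch below finds the principal eigenvector of a small symmetric, non-negative affinity matrix by power iteration and keeps the items with large eigenvector entries. It is not taken from the book: the function name `principal_eigenvector`, the toy matrix `M`, and the 0.5 threshold are all assumptions made for the example.

```python
import math

def principal_eigenvector(M, iters=100):
    """Power iteration for the leading eigenvector of a symmetric,
    non-negative matrix M given as a list of lists (hypothetical helper)."""
    n = len(M)
    x = [1.0 / math.sqrt(n)] * n          # uniform unit-norm start
    for _ in range(iters):
        # y = M x
        y = [sum(M[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]         # renormalize to unit length
    return x

# Toy pairwise affinities (assumed values): items 0-2 agree strongly
# with each other, items 3-4 are weakly connected outliers.
M = [
    [0.0, 0.9, 0.8, 0.1, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.1],
    [0.8, 0.9, 0.0, 0.1, 0.0],
    [0.1, 0.0, 0.1, 0.0, 0.1],
    [0.0, 0.1, 0.0, 0.1, 0.0],
]

x = principal_eigenvector(M)

# Keep items whose eigenvector entry is at least half of the largest one
# (the 0.5 factor is an arbitrary choice for this sketch).
cluster = [i for i, v in enumerate(x) if v > 0.5 * max(x)]
print(cluster)
```

On this toy input, the strongly connected items 0, 1, and 2 receive the largest eigenvector entries and are the ones selected. In a matching setting the items would be candidate correspondences and the final step would also have to enforce one-to-one constraints, which this sketch leaves out.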

Notes

1. Code available at: https://sites.google.com/site/multipleframesmatching/.

Author information

Authors and Affiliations

Computer Science and Engineering Department, Polytechnic University of Bucharest, Bucharest, Romania
Marius Leordeanu

Corresponding author

Correspondence to Marius Leordeanu.

About this chapter

Cite this chapter

Leordeanu, M. (2020). Unsupervised Visual Learning: From Pixels to Seeing. In: Unsupervised Learning in Space and Time. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-42128-1_1

DOI: https://doi.org/10.1007/978-3-030-42128-1_1
Published: 18 April 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-42127-4
Online ISBN: 978-3-030-42128-1
eBook Packages: Computer Science, Computer Science (R0)
