Abstract
This book is about unsupervised learning, one of the most challenging puzzles that we must solve and put together, piece by piece, in order to decode the secrets of intelligence. Here, we move closer to that goal by connecting classical computational models to newer deep learning ones, and then building on a few fundamental and intuitive unsupervised learning principles. We want to reduce the unsupervised learning problem to a set of essential ideas and then develop the computational tools needed to implement them in the real world. Eventually, we aim to imagine a universal unsupervised learning machine, the Visual Story Network. The book is written for young students as well as experienced researchers, engineers, and professors. It presents computational models and optimization algorithms in sufficient technical detail, while also creating and maintaining a big intuitive picture of the main subject. Different tasks, such as graph matching and clustering, feature selection, classifier learning, unsupervised object discovery and segmentation in video, teacher-student learning over multiple generations, and recursive graph neural networks, are brought together, chapter by chapter, under the same umbrella of unsupervised learning. In the current chapter, we introduce the reader to the overall story of the book, which presents a unified picture of the different topics covered in detail in the chapters that follow. Besides sharing the main goal of learning without human supervision, the problems and tasks presented in the book also share common computational graph models and optimization methods, such as spectral graph matching, spectral clustering, and the integer projected fixed point method.
By bringing together similar mathematical formulations across different tasks, all guided by common intuitive principles towards a universal unsupervised learning system, the book invites the reader to absorb and then participate in the creation of the next generation of artificial intelligence.
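Spectral graph matching, one of the shared methods named in the abstract, can be illustrated with a short sketch. The code below is a simplified illustration in the spirit of the spectral technique for correspondence problems by Leordeanu and Hebert (2005), not the book's implementation. It assumes a precomputed, symmetric, non-negative affinity matrix `M` over candidate assignments and one-to-one matching constraints; the function name `spectral_matching` and the toy values are hypothetical. The principal eigenvector of `M` scores each candidate assignment, and a greedy pass turns those scores into a discrete matching.

```python
import numpy as np


def spectral_matching(M, candidates, n_iter=100):
    """Simplified spectral graph matching sketch (hypothetical helper).

    M          -- symmetric, non-negative affinity matrix over candidate
                  assignments; M[a, b] scores how well assignments a and b agree.
    candidates -- list of (i, j) pairs; assignment a maps model node i to data node j.
    Returns accepted (i, j) pairs that respect one-to-one constraints.
    """
    # Principal eigenvector of M by power iteration. Starting from a positive
    # vector keeps every iterate non-negative, since M is non-negative.
    x = np.ones(M.shape[0])
    for _ in range(n_iter):
        y = M @ x
        norm = np.linalg.norm(y)
        if norm == 0:
            break
        x = y / norm

    # Greedy discretization: accept the highest-scoring remaining assignment,
    # then discard every assignment that conflicts with it (same i or same j).
    accepted = []
    alive = np.ones(len(candidates), dtype=bool)
    while alive.any():
        a = int(np.argmax(np.where(alive, x, -np.inf)))
        if x[a] <= 0:
            break
        i, j = candidates[a]
        accepted.append((i, j))
        for b, (bi, bj) in enumerate(candidates):
            if bi == i or bj == j:
                alive[b] = False
    return accepted


# Toy example (hypothetical values): two model nodes, two data nodes,
# four candidate assignments. Conflicting pairs get zero affinity.
candidates = [(0, 0), (0, 1), (1, 0), (1, 1)]
M = np.array([
    [0.1, 0.0, 0.0, 1.0],  # (0,0) strongly agrees with (1,1)
    [0.0, 0.1, 0.2, 0.0],  # (0,1) only weakly agrees with (1,0)
    [0.0, 0.2, 0.1, 0.0],
    [1.0, 0.0, 0.0, 0.1],
])
print(sorted(spectral_matching(M, candidates)))  # prints [(0, 0), (1, 1)]
```

Power iteration is used here only to keep the sketch short and dependency-light; for small matrices, `np.linalg.eigh` would serve equally well, taking the absolute value of its leading eigenvector to remove the sign ambiguity.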
Notes
1. Code available at: https://sites.google.com/site/multipleframesmatching/.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Leordeanu, M. (2020). Unsupervised Visual Learning: From Pixels to Seeing. In: Unsupervised Learning in Space and Time. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-030-42128-1_1
DOI: https://doi.org/10.1007/978-3-030-42128-1_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-42127-4
Online ISBN: 978-3-030-42128-1
eBook Packages: Computer Science, Computer Science (R0)