Learning based on Similarity

Ionescu, Radu Tudor; Popescu, Marius

doi:10.1007/978-3-319-30367-3_2

Part of the book series: Advances in Computer Vision and Pattern Recognition (ACVPR)

Abstract

This chapter describes all the machine learning methods that are employed in this book to obtain results for different applications of computer vision and string processing. The chapter gives an overview of the main concepts of learning based on similarity. Specific machine learning methods that are based on these concepts are then presented. First, nearest neighbor models are discussed. A nonstandard learning formulation based on the notions of similarity and nearest neighbors, known as local learning, is then presented. An overview of kernel methods is also given, since the state-of-the-art methods consistently used in the supervised learning tasks presented throughout this book are kernel methods. This chapter ends with a discussion about cluster analysis. Clustering techniques are used throughout this book in various contexts, from building vocabularies of visual words to phylogenetic analysis.
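As a minimal sketch of learning based on similarity (not the authors' implementation), the k-nearest neighbor classifier below ranks training samples by a similarity function instead of a distance, then takes a majority vote among the k most similar ones. The histogram intersection similarity, the function names `intersection_similarity` and `knn_predict`, and the toy visual-word histograms are illustrative assumptions.

```python
from collections import Counter
from typing import Callable, Hashable, List, Sequence


def intersection_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Histogram intersection: sum of element-wise minima (larger means more similar)."""
    return sum(min(a, b) for a, b in zip(x, y))


def knn_predict(
    train_x: List[Sequence[float]],
    train_y: List[Hashable],
    query: Sequence[float],
    k: int = 3,
    similarity: Callable[[Sequence[float], Sequence[float]], float] = intersection_similarity,
) -> Hashable:
    """Predict the label of `query` by majority vote among its k most similar training samples."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training samples")
    # Rank training indices by decreasing similarity to the query
    # (sorted() is stable, so equally similar samples keep their training order).
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: similarity(train_x[i], query),
        reverse=True,
    )
    neighbors = ranked[:k]
    votes = Counter(train_y[i] for i in neighbors)
    best = max(votes.values())
    # Break ties in favour of the label held by the most similar tied neighbor.
    for i in neighbors:
        if votes[train_y[i]] == best:
            return train_y[i]
    raise AssertionError("unreachable: some neighbor always carries the top vote count")


# Toy "visual word" histograms (hypothetical data, 3-word vocabulary).
train_x = [[5, 1, 0], [4, 2, 0], [0, 1, 6], [1, 0, 5]]
train_y = ["cat", "cat", "car", "car"]
print(knn_predict(train_x, train_y, [4, 1, 1], k=3))  # prints "cat"
```

Swapping in a different similarity, for example a kernel function, changes only the `similarity` argument; the voting rule stays the same.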

References

Achtert E, Bohm C, Kriegel HP, Kroger P, Muller-Gorman I, Zimek A (2006) Detection and visualization of subspace cluster hierarchies. In: Proceedings of DASFAA, pp 152–163
Achtert E, Bohm C, Kriegel HP, Kroger P, Muller-Gorman I, Zimek A (2007) Finding hierarchies of subspace clusters. In: Proceedings of PKDD, pp 446–453
Agrawal R, Gehrke J, Gunopulos D, Raghavan P (1998) Automatic subspace clustering of high dimensional data for data mining applications. SIGMOD Rec 27(2):94–105
Altschul S, Gish W, Miller W, Myers E, Lipman D (1990) Basic local alignment search tool. J Mol Biol 215(3):403–410
Belongie S, Malik J, Puzicha J (2002) Shape matching and object recognition using shape contexts. IEEE Trans Pattern Anal Mach Intell 24(4):509–522
Bohm C, Kailing K, Kroger P, Zimek A (2004) Computing clusters of correlation connected objects. In: Proceedings of the 2004 ACM SIGMOD, pp 455–466
Borg I, Groenen PJF (2005) Modern multidimensional scaling: theory and applications. Springer, Berlin
Bottou L, Vapnik V (1992) Local learning algorithms. Neural Comput 4:888–900
Cazzanti L, Gupta MR, Koppal AJ (2008) Generative models for similarity-based classification. Pattern Recognit 41(7):2289–2297
Cazzanti L, Gupta MR (2007) Local similarity discriminant analysis. In: Proceedings of ICML, pp 137–144
Chen Y, Garcia EK, Gupta MR, Rahimi A, Cazzanti L (2009) Similarity-based classification: concepts and algorithms. J Mach Learn Res 10:747–776
Chimani M, Woste M, Bocker S (2011) A closer look at the closest string and closest substring problem. In: Proceedings of ALENEX, pp 13–24
Cortes C, Mohri M, Rostamizadeh A (2013) Multi-class classification with maximum margin multiple kernel. J Mach Learn Res 28(3):46–54
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27
Cristianini N, Shawe-Taylor J, Elisseeff A, Kandola JS (2001) On kernel-target alignment. In: Proceedings of NIPS, pp 367–373
Dasarathy BV (1991) Nearest neighbor (NN) norms: pattern classification techniques. IEEE Computer Society Press, Los Alamitos
Devroye L, Györfi L, Lugosi G (1996) A probabilistic theory of pattern recognition. Springer, New York
Dinu LP, Ionescu RT (2012a) Clustering based on rank distance with applications on DNA. In: Proceedings of ICONIP, vol 7667, pp 722–729
Dinu LP, Ionescu RT (2012b) An efficient rank based approach for closest string and closest substring. PLoS ONE 7(6):e37576
Faragó A, Linder T, Lugosi G (1993) Fast nearest-neighbor search in dissimilarity spaces. IEEE Trans Pattern Anal Mach Intell 15(9):957–962
Fisher RA (1936) The use of multiple measurements in taxonomic problems. Ann Eugenics 7(2):179–188
Fix E, Hodges J (1951) Discriminatory analysis, non-parametric discrimination: consistency properties. Technical Report 4, USAF School of Aviation Medicine, Randolph Field, TX
Gonen M, Alpaydin E (2011) Multiple kernel learning algorithms. J Mach Learn Res 12:2211–2268
Graepel T, Herbrich R, Bollmann-Sdorra P, Obermayer K (1998) Classification on pairwise proximity data. In: Proceedings of NIPS, pp 438–444
Graepel T, Herbrich R, Scholkopf B, Smola A, Bartlett P, Muller K, Obermayer K, Williamson R (1999) Classification on proximity data with LP-machines. In: Proceedings of ICANN, vol 1, pp 304–309
Grauman K, Darrell T (2005) The pyramid match kernel: discriminative classification with sets of image features. In: Proceedings of ICCV, vol 2, pp 1458–1465
Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann Publishers Inc., San Francisco
Hastie T, Tibshirani R, Friedman J (2003) The elements of statistical learning. Springer, New York. ISBN 0387952845
Huang Z (1998) Extensions to the k-means algorithm for clustering large data sets with categorical values. Data Mining Knowl Discov 2(3):283–304
Ionescu RT, Popescu M (2013) Kernels for visual words histograms. In: Proceedings of ICIAP, vol 8156, pp 81–90
Ionescu RT, Popescu M, Cahill A (2014) Can characters reveal your native language? A language-independent approach to native language identification. In: Proceedings of EMNLP, pp 1363–1373
Kailing K, Kriegel HP, Kroger P (2004) Density-connected subspace clustering for high-dimensional data. In: Proceedings of SDM
Kriegel HP, Kroger P, Zimek A (2009) Clustering high-dimensional data: a survey on subspace clustering, pattern-based clustering, and correlation clustering. ACM Trans Knowl Discov Data 3(1):1:1–1:58
Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: spatial pyramid matching for recognizing natural scene categories. In: Proceedings of CVPR, vol 2, pp 2169–2178
Liao L, Noble WS (2003) Combining pairwise sequence similarity and support vector machines for detecting remote protein evolutionary and structural relationships. J Comput Biol 10(6):857–868
Lipman DJ, Pearson WR (1985) Rapid and sensitive protein similarity searches. Science 227:1435–1441
Lodhi H, Saunders C, Shawe-Taylor J, Cristianini N, Watkins CJCH (2002) Text classification using string kernels. J Mach Learn Res 2:419–444
Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York
McCallum A, Nigam K, Ungar LH (2000) Efficient clustering of high-dimensional data sets with application to reference matching. In: Proceedings of ACM SIGKDD, pp 169–178
Nadaraya EA (1964) On estimating regression. Theory Probab Appl 9:141–142
Ng RT, Han J (2002) CLARANS: a method for clustering objects for spatial data mining. IEEE Trans Knowl Data Eng 14(5):1003–1016
Pekalska E, Duin RPW (2002) Dissimilarity representations allow for building good classifiers. Pattern Recognit Lett 23(8):943–956
Popescu M, Dinu LP (2007) Kernel methods and string kernels for authorship identification: the federalist papers case. In: Proceedings of RANLP
Popescu M, Grozea C (2012) Kernel methods and string kernels for authorship analysis. In: CLEF (Online Working Notes/Labs/Workshop)
Popescu M, Ionescu RT (2013) The story of the characters, the DNA and the native language. In: Proceedings of the Eighth Workshop on Innovative Use of NLP for Building Educational Applications, pp 270–278
Popov YV (2007) Multiple genome rearrangement by swaps and by element duplications. Theoret Comput Sci 385(1–3):115–126
Rubner Y, Tomasi C, Guibas LJ (2000) The Earth Mover’s distance as a metric for image retrieval. Int J Comput Vis 40(2):99–121
Sanderson C, Guenter S (2006) Short text authorship attribution via sequence kernels, Markov chains and author unmasking: an investigation. In: Proceedings of EMNLP, pp 482–491
Shapira D, Storer JA (2003) Large edit distance with multiple block operations. In: Proceedings of SPIRE, vol 2857, pp 369–377
Shawe-Taylor J, Cristianini N (2004) Kernel methods for pattern analysis. Cambridge University Press, Cambridge
Simard P, LeCun Y, Denker JS, Victorri B (1996) Transformation invariance in pattern recognition, tangent distance and tangent propagation. In: Neural Networks: Tricks of the Trade
Vapnik V (2006) Estimation of dependencies based on empirical data (Information Science and Statistics), 2nd edn. Springer, New York
Vedaldi A, Zisserman A (2010) Efficient additive kernels via explicit feature maps. In: Proceedings of CVPR, pp 3539–3546
Vezzi F, Fabbro CD, Tomescu AI, Policriti A (2012) rNA: a fast and accurate short reads numerical aligner. Bioinformatics 28(1):123–124
Vapnik VN, Chervonenkis AY (1971) On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab Appl 16(2):264–280
Zhang T, Ramakrishnan R, Livny M (1996) BIRCH: an efficient data clustering method for very large databases. SIGMOD Rec 25(2):103–114. ISSN 0163-5808
Zhang B, Srihari SN (2004) Fast k-nearest neighbor classification using cluster-based trees. IEEE Trans Pattern Anal Mach Intell 26(4):525–528

Author information

Authors and Affiliations

Department of Computer Science, University of Bucharest, Bucharest, Romania
Radu Tudor Ionescu & Marius Popescu

Corresponding author

Correspondence to Radu Tudor Ionescu.

About this chapter

Cite this chapter

Ionescu, R.T., Popescu, M. (2016). Learning based on Similarity. In: Knowledge Transfer between Computer Vision and Text Mining. Advances in Computer Vision and Pattern Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-30367-3_2

DOI: https://doi.org/10.1007/978-3-319-30367-3_2
Published: 26 April 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-30365-9
Online ISBN: 978-3-319-30367-3
eBook Packages: Computer Science, Computer Science (R0)
