Similarity Estimation Using Bayes Ensembles

Emrich, Tobias; Graf, Franz; Kriegel, Hans-Peter; Schubert, Matthias; Thoma, Marisa

doi:10.1007/978-3-642-13818-8_37

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6187)

Included in the following conference series:

International Conference on Scientific and Statistical Database Management

Abstract

Similarity search and data mining often rely on distance or similarity functions to provide meaningful results and semantically meaningful patterns. However, standard distance measures such as L_p-norms often fail to accurately mirror the expected similarity between two objects. To bridge the so-called semantic gap between feature representation and object similarity, the distance function has to be adjusted to the current application context or user. In this paper, we propose a new probabilistic framework for estimating a similarity value based on a Bayesian setting. In our framework, distance comparisons are modeled based on distribution functions on the difference vectors. To combine these functions, a similarity score is computed by an ensemble of weak Bayesian learners for each dimension in the feature space. To find independent dimensions of maximum meaning, we apply a space transformation based on eigenvalue decomposition. In our experiments, we demonstrate that our new method shows promising results compared to related Mahalanobis learners on several test data sets with respect to nearest-neighbor classification and precision-recall graphs.
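
The abstract gives only the outline of the framework, so the following is a minimal illustrative sketch and not the authors' implementation. It assumes one Gaussian weak learner per dimension, fitted on absolute difference-vector components of same-class ("similar") and different-class ("dissimilar") pairs, and combined in a naive Bayes fashion into a posterior similarity score. The eigenvalue-based space transformation is omitted; the input dimensions are assumed to be already decorrelated. All function names (`fit_gaussian`, `train`, `similarity`) and the toy data are hypothetical.

```python
import math
from itertools import combinations


def fit_gaussian(values):
    """Return (mean, variance) of values, with a small floor on the variance."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, max(var, 1e-9)


def log_gauss(x, mean, var):
    """Log-density of a univariate Gaussian at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def train(points, labels):
    """Fit one weak learner per dimension on absolute difference vectors.

    Assumption: a pair counts as 'similar' when both labels agree.
    Returns (prior of 'similar', per-dimension Gaussian parameters).
    """
    dims = len(points[0])
    diffs = {True: [], False: []}
    for i, j in combinations(range(len(points)), 2):
        d = [abs(a - b) for a, b in zip(points[i], points[j])]
        diffs[labels[i] == labels[j]].append(d)
    prior_sim = len(diffs[True]) / (len(diffs[True]) + len(diffs[False]))
    params = [
        {s: fit_gaussian([d[k] for d in diffs[s]]) for s in (True, False)}
        for k in range(dims)
    ]
    return prior_sim, params


def similarity(x, y, model):
    """Posterior P(similar | difference vector), combining all dimensions."""
    prior_sim, params = model
    log_sim = math.log(prior_sim)
    log_dis = math.log(1.0 - prior_sim)
    for k, (a, b) in enumerate(zip(x, y)):
        d = abs(a - b)
        log_sim += log_gauss(d, *params[k][True])
        log_dis += log_gauss(d, *params[k][False])
    # Numerically stable logistic of the log-odds.
    z = log_sim - log_dis
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Toy data (hypothetical): two well-separated classes in two dimensions.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels = [0, 0, 0, 1, 1, 1]
model = train(points, labels)

print(similarity((0.0, 0.0), (0.1, 0.1), model))  # nearby pair
print(similarity((0.0, 0.0), (5.0, 5.0), model))  # distant pair
```

In this sketch the nearby pair receives a score close to 1 and the distant pair a score close to 0. The Gaussian model and the product-style combination are stand-ins chosen for brevity; the paper may use different distribution functions and a different ensemble rule.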

Author information

Authors and Affiliations

Ludwig-Maximilians-Universität München, Oettingenstr. 67, Munich, Germany
Tobias Emrich, Franz Graf, Hans-Peter Kriegel, Matthias Schubert & Marisa Thoma

Editor information

Editors and Affiliations

Institute of Computer Science, University of Heidelberg, 69120, Heidelberg, Germany
Michael Gertz
Dept. of Computer Science and Genome Center, University of California, One Shields Avenue, 95616, Davis, CA, USA
Bertram Ludäscher

About this paper

Cite this paper

Emrich, T., Graf, F., Kriegel, H.-P., Schubert, M., Thoma, M. (2010). Similarity Estimation Using Bayes Ensembles. In: Gertz, M., Ludäscher, B. (eds) Scientific and Statistical Database Management. SSDBM 2010. Lecture Notes in Computer Science, vol 6187. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13818-8_37

DOI: https://doi.org/10.1007/978-3-642-13818-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13817-1
Online ISBN: 978-3-642-13818-8
eBook Packages: Computer Science, Computer Science (R0)
