Abstract
Similarity search and data mining often rely on distance or similarity functions to provide meaningful results and semantically meaningful patterns. However, standard distance measures such as L_p-norms often cannot accurately mirror the expected similarity between two objects. To bridge the so-called semantic gap between feature representation and object similarity, the distance function has to be adjusted to the current application context or user. In this paper, we propose a new probabilistic framework for estimating a similarity value based on a Bayesian setting. In our framework, distance comparisons are modeled based on distribution functions on the difference vectors. To combine these functions, a similarity score is computed by an ensemble of weak Bayesian learners, one for each dimension in the feature space. To find independent dimensions of maximum meaning, we apply a space transformation based on eigenvalue decomposition. In our experiments, we demonstrate that our new method shows promising results compared to related Mahalanobis learners on several test data sets with respect to nearest-neighbor classification and precision-recall graphs.
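To make the pipeline described in the abstract more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each per-dimension weak learner is a zero-mean Gaussian over difference-vector components, fitted separately for similar (same label) and dissimilar (different label) pairs, and that the learners are combined in a naive-Bayes fashion. The names `fit_bayes_ensemble`, `similarity`, and `log_gauss` are hypothetical, and the Gaussian model is an assumption that the abstract does not confirm.

```python
import numpy as np


def log_gauss(d, var):
    """Log density of a zero-mean Gaussian, evaluated per dimension."""
    return -0.5 * (np.log(2.0 * np.pi * var) + d ** 2 / var)


def fit_bayes_ensemble(X, y, eps=1e-9):
    """Fit one weak learner per transformed dimension (illustrative assumption).

    X: (n, d) feature matrix with d >= 2; y: (n,) class labels.
    The data must contain both same-label and different-label pairs.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Eigenvalue decomposition of the covariance matrix yields decorrelated axes.
    cov = np.cov(X, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)
    Z = X @ eigvecs

    # Difference vectors for all object pairs, split by label agreement.
    i, j = np.triu_indices(len(X), k=1)
    diffs = Z[i] - Z[j]
    same = y[i] == y[j]

    return {
        "eigvecs": eigvecs,
        # Assumed model: zero-mean Gaussian per dimension and per pair type.
        "var_sim": (diffs[same] ** 2).mean(axis=0) + eps,
        "var_dis": (diffs[~same] ** 2).mean(axis=0) + eps,
        "prior_sim": same.mean(),
    }


def similarity(model, a, b):
    """Posterior probability that a and b are similar, under the assumed model."""
    d = (np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) @ model["eigvecs"]
    p = model["prior_sim"]
    # Sum per-dimension log-likelihood ratios (naive-Bayes combination) plus prior odds.
    log_odds = (
        np.sum(log_gauss(d, model["var_sim"]) - log_gauss(d, model["var_dis"]))
        + np.log(p) - np.log1p(-p)
    )
    # Numerically stable logistic function of the log-odds.
    return 0.5 * (1.0 + np.tanh(0.5 * log_odds))
```

In this sketch the returned score lies in [0, 1] and could be used to rank neighbors in place of a fixed L_p-norm. The paper itself may use different distribution functions and a different combination rule.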
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Emrich, T., Graf, F., Kriegel, H.-P., Schubert, M., Thoma, M. (2010). Similarity Estimation Using Bayes Ensembles. In: Gertz, M., Ludäscher, B. (eds) Scientific and Statistical Database Management. SSDBM 2010. Lecture Notes in Computer Science, vol 6187. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13818-8_37
DOI: https://doi.org/10.1007/978-3-642-13818-8_37
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13817-1
Online ISBN: 978-3-642-13818-8
eBook Packages: Computer Science, Computer Science (R0)