# Bayesian object matching

## Abstract

Object matching refers to the problem of inferring unknown co-occurrence or alignment between observations or samples in two data sets. Given two sets with equally many samples, the task is to find for each sample a representative sample in the other set, without prior knowledge of a distance measure between the sets. Given a distance measure, the problem would reduce to a linear assignment problem: finding a permutation that re-orders the samples in one set to minimize the total distance. When no such measure is available, more complex solutions are needed. Typical approaches maximize statistical dependency between the two sets, whereas in this work we present a Bayesian solution that builds a joint model for the two sources. We learn a Bayesian canonical correlation analysis model that includes a permutation parameter for re-ordering the samples in one of the sets. We provide both variational and sampling-based inference for approximate Bayesian analysis, and demonstrate on three data sets that the resulting methods outperform the earlier solutions.
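To make the linear assignment baseline concrete, the following minimal sketch covers the easy case in which a distance measure between the two sets is known. It is an illustration only, not the Bayesian method of this work: the function name `match_by_distance`, the toy data, and the absolute-difference distance are all assumptions introduced here, and the exhaustive search over permutations is feasible only for very small sets.

```python
from itertools import permutations


def match_by_distance(xs, ys, dist):
    """Return (perm, cost) where perm minimizes sum(dist(xs[i], ys[perm[i]])).

    Hypothetical helper for illustration: brute-force search over all
    permutations, so the running time grows factorially with the set size.
    """
    n = len(xs)
    if len(ys) != n:
        raise ValueError("both sets must contain equally many samples")
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        # Total distance when sample i of xs is paired with sample perm[i] of ys.
        cost = sum(dist(xs[i], ys[perm[i]]) for i in range(n))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Toy example (assumed data): ys is a shuffled, slightly noisy copy of xs.
xs = [0.0, 1.0, 2.0]
ys = [2.1, 0.2, 0.9]
perm, cost = match_by_distance(xs, ys, lambda a, b: abs(a - b))
print(perm, cost)
```

For realistic sizes one would replace the exhaustive search with a polynomial-time solver such as the Hungarian method, for example `scipy.optimize.linear_sum_assignment` applied to the matrix of pairwise distances. The setting studied here is harder precisely because no such distance between the two sets is available.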

## Keywords

Canonical correlation analysis, Matching, Permutation, Bayesian analysis

## Notes

### Acknowledgements

The research was funded primarily by TEKES, as part of the TIVIT Data to Intelligence (D2I) Program, and in part by the Academy of Finland (Finnish Center of Excellence for Computational Inference COIN, 251170). We thank Prof. Matej Orešič for providing the data used in the metabolomics experiment, Novi Quadrianto for providing the data for the image matching experiment, and Nemanja Djuric for providing the code for CKS and the data for the document alignment task.
