The Similarity Between Dissimilarities

  • David M. J. Tax
  • Veronika Cheplygina
  • Robert P. W. Duin
  • Jan van de Poll
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10029)


When characterizing teams of people, molecules, or general graphs, it is difficult to encode all information in a single feature vector. For such objects, dissimilarity matrices that capture the interaction or similarity between the sub-elements (people, atoms, nodes) can be used instead. This paper compares several representations of dissimilarity matrices that encode their cluster characteristics, latent dimensionality, or outliers. Both the simple eigenvalue spectrum and the histogram of distances turn out to be quite effective, reaching high classification performance on multiple instance learning (MIL) problems. Finally, an analysis of teams of people is given, illustrating the potential use of dissimilarity matrix characterization in business consultancy.
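As a rough illustration of the two simple representations named in the abstract, the sketch below turns a square dissimilarity matrix into an eigenvalue-spectrum vector and a distance-histogram vector. It is not the authors' implementation: the function names, the symmetrization step, the choice of the `k` largest-magnitude eigenvalues, and the binning parameters are all assumptions made for this example.

```python
import numpy as np

def eigenvalue_spectrum(D, k=5):
    """Hypothetical helper: k largest-magnitude eigenvalues of a dissimilarity matrix."""
    D = np.asarray(D, dtype=float)
    S = 0.5 * (D + D.T)                    # assumption: enforce symmetry first
    eigvals = np.linalg.eigvalsh(S)        # real eigenvalues of a symmetric matrix
    order = np.argsort(-np.abs(eigvals))   # sort by decreasing magnitude
    top = eigvals[order][:k]
    out = np.zeros(k)                      # zero-pad when the matrix is smaller than k
    out[:len(top)] = top
    return out

def distance_histogram(D, bins=10, max_dist=None):
    """Hypothetical helper: normalized histogram of the off-diagonal dissimilarities."""
    D = np.asarray(D, dtype=float)
    d = D[np.triu_indices(D.shape[0], k=1)]  # upper triangle, diagonal excluded
    if max_dist is None:
        max_dist = d.max() if d.size and d.max() > 0 else 1.0
    counts, _ = np.histogram(d, bins=bins, range=(0.0, max_dist))
    return counts / max(counts.sum(), 1)     # fractions over the counted values

# Toy example: Euclidean distances between three points on a line in 2-D.
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
spec = eigenvalue_spectrum(D, k=3)
hist = distance_histogram(D, bins=2, max_dist=10.0)
```

Both helpers return a vector whose length depends only on `k` or `bins`, so in this sketch matrices of different sizes (for example, teams with different numbers of members) map to vectors of the same length that a standard classifier could consume.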


Keywords: Feature vector; Dissimilarity matrix; Multiple instance learning; Graph kernel



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • David M. J. Tax (1), email author
  • Veronika Cheplygina (1, 2)
  • Robert P. W. Duin (1)
  • Jan van de Poll (3)

  1. Pattern Recognition Laboratory, Delft University of Technology, Delft, The Netherlands
  2. Biomedical Imaging Group Rotterdam, Erasmus Medical Center, Rotterdam, The Netherlands
  3. Transparency Lab, Amsterdam, The Netherlands
