Abstract
Both ‘distance’ and ‘similarity’ measures have been proposed for the comparison of sequences and for the comparison of trees, based on scoring mappings. For a given alphabet of node labels, the measures are parameterised by a table giving label-dependent values for swaps, deletions and insertions. The paper addresses the question of whether an ordering induced by a ‘distance’ measure, under some parameter setting, can also be expressed by a ‘similarity’ measure, under some other parameter setting, and vice versa. Three kinds of ordering are considered: alignment-orderings, where for a fixed source \(S\) and target \(T\) the alignments between them are ranked; neighbour-orderings, where for a fixed \(S\) the varying candidate neighbours \(T_i\) are ranked; and pair-orderings, where for varying \(S_i\) and varying \(T_j\) the pairings \(\langle S_i, T_j\rangle\) are ranked. We show that (1) any alignment-ordering expressed by a ‘distance’ setting can be re-expressed by a ‘similarity’ setting, and vice versa; (2) any neighbour-ordering and pair-ordering expressed by a ‘distance’ setting can be re-expressed by a ‘similarity’ setting; and (3) there are neighbour-orderings and pair-orderings expressed by a ‘similarity’ setting which cannot be expressed by a ‘distance’ setting. A consequence of this is that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via distance.
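For the alignment-orderings in claim (1), one standard construction (assumed here for illustration; it is not necessarily the proof used in the paper) turns a distance cost table \(C\) into a similarity table \(\theta\) with \(\theta(x,y) = C(x,\lambda) + C(\lambda,y) - C(x,y)\) and zero scores for deletions and insertions. For every alignment \(A\) of a fixed \(S\) and \(T\) this gives \(\Theta(A) = K(S,T) - \Delta(A)\), where \(K(S,T) = \sum_{x \in S} C(x,\lambda) + \sum_{y \in T} C(\lambda,y)\), so the alignment-ordering is exactly reversed. The sketch below checks this by brute force for sequences. The names `GAP`, `COST`, `THETA`, `alignments`, `score`, `distance_to_similarity` and `best`, and the cost values themselves, are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, Iterator, List, Sequence, Tuple

GAP = "-"  # hypothetical stand-in for the empty label (lambda)
Table = Dict[Tuple[str, str], float]
Alignment = List[Tuple[int, int]]


def alignments(n: int, m: int) -> Iterator[Alignment]:
    """Every order-preserving partial matching of positions 0..n-1 with 0..m-1."""
    for k in range(min(n, m) + 1):
        for src in combinations(range(n), k):
            for tgt in combinations(range(m), k):
                yield list(zip(src, tgt))


def score(s: Sequence[str], t: Sequence[str], a: Alignment, table: Table) -> float:
    """Sum table values over swaps, deletions (unmatched s) and insertions (unmatched t)."""
    matched_s = {i for i, _ in a}
    matched_t = {j for _, j in a}
    total = sum(table[(s[i], t[j])] for i, j in a)
    total += sum(table[(x, GAP)] for i, x in enumerate(s) if i not in matched_s)
    total += sum(table[(GAP, y)] for j, y in enumerate(t) if j not in matched_t)
    return total


def distance_to_similarity(cost: Table, alphabet: str) -> Table:
    """theta(x, y) = C(x, -) + C(-, y) - C(x, y); deletions and insertions score 0."""
    theta: Table = {}
    for x in alphabet:
        theta[(x, GAP)] = 0.0
        theta[(GAP, x)] = 0.0
        for y in alphabet:
            theta[(x, y)] = cost[(x, GAP)] + cost[(GAP, y)] - cost[(x, y)]
    return theta


def best(s: str, t: str, table: Table, pick: Callable[..., float]) -> float:
    """Wagner-Fischer style DP: pick=min gives the distance, pick=max the similarity."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + table[(s[i - 1], GAP)]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + table[(GAP, t[j - 1])]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = pick(
                d[i - 1][j - 1] + table[(s[i - 1], t[j - 1])],  # swap
                d[i - 1][j] + table[(s[i - 1], GAP)],           # deletion
                d[i][j - 1] + table[(GAP, t[j - 1])],           # insertion
            )
    return d[n][m]


# Hypothetical non-uniform cost table over the alphabet {a, b}.
COST: Table = {
    ("a", "a"): 0.0, ("b", "b"): 0.0, ("a", "b"): 1.5, ("b", "a"): 1.5,
    ("a", GAP): 1.0, (GAP, "a"): 1.0, ("b", GAP): 2.0, (GAP, "b"): 2.0,
}
THETA = distance_to_similarity(COST, "ab")

S, T = "ab", "ba"
K = sum(COST[(x, GAP)] for x in S) + sum(COST[(GAP, y)] for y in T)
for a in alignments(len(S), len(T)):
    dist, sim = score(S, T, a, COST), score(S, T, a, THETA)
    assert sim == K - dist  # similarity is the distance reflected about K(S, T)
    print(a, dist, sim)
print(best(S, T, COST, min), best(S, T, THETA, max), K)

# K depends on the pair, so this reflection alone need not preserve neighbour-orderings:
print(best("a", "b", COST, min), best("a", "bb", COST, min))    # distances differ
print(best("a", "b", THETA, max), best("a", "bb", THETA, max))  # similarities tie
```

With this made-up table the script reports a minimum distance of 2.0, a maximum similarity of 4.0 and \(K(S,T) = 6.0\) for \(S\) = ab, \(T\) = ba. The last two lines show why neighbour-orderings are a different question: by distance, a is closer to b (1.5) than to bb (3.5), yet the reflected similarity scores tie at 1.5 because \(K\) changes with the target. Claim (2) is about the existence of some suitable similarity setting, not about this particular reflection.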
Notes
- 1.
So if (i, j) and (i′, j′) are in the mapping then (T1) left(i, i′) iff left(j, j′) and (T2) anc(i, i′) iff anc(j, j′).
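Note 1's conditions (T1) and (T2) can be checked mechanically. The sketch below is a minimal checker under several assumptions of ours: a tree is a nested `(label, children)` tuple, nodes are identified by preorder index, left(i, i′) is read as "i lies to the left of i′" and anc(i, i′) as "i is a proper ancestor of i′", and both relations are derived from preorder and postorder ranks. The one-to-one test is the usual requirement for Tai-style mappings and is assumed here, since the note does not state it. All function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

# Hypothetical representation: a tree is a (label, children) pair and each
# node is identified by its preorder index (root = 0).
Tree = Tuple[str, list]


def postorder_ranks(tree: Tree) -> List[int]:
    """ranks[k] is the postorder position of the node with preorder index k."""
    ranks: List[int] = []
    next_rank = 0

    def visit(node: Tree) -> None:
        nonlocal next_rank
        k = len(ranks)      # preorder index of this node
        ranks.append(-1)    # filled in once all descendants are ranked
        for child in node[1]:
            visit(child)
        ranks[k] = next_rank
        next_rank += 1

    visit(tree)
    return ranks


def anc(post: List[int], i: int, k: int) -> bool:
    """Node i is a proper ancestor of node k: earlier in preorder, later in postorder."""
    return i < k and post[i] > post[k]


def left(post: List[int], i: int, k: int) -> bool:
    """Node i lies to the left of node k: earlier in both preorder and postorder."""
    return i < k and post[i] < post[k]


def is_tai_mapping(src: Tree, tgt: Tree, mapping: Sequence[Tuple[int, int]]) -> bool:
    """Check one-to-one-ness plus conditions (T1) and (T2) for every ordered pair."""
    ps, pt = postorder_ranks(src), postorder_ranks(tgt)
    for (i, j), (i2, j2) in permutations(mapping, 2):
        if (i == i2) != (j == j2):               # one-to-one (assumed)
            return False
        if left(ps, i, i2) != left(pt, j, j2):   # (T1)
            return False
        if anc(ps, i, i2) != anc(pt, j, j2):     # (T2)
            return False
    return True


FLAT: Tree = ("a", [("b", []), ("c", [])])  # a(b, c)
CHAIN: Tree = ("a", [("b", [("c", [])])])   # a(b(c))

print(is_tai_mapping(FLAT, FLAT, [(0, 0), (1, 1), (2, 2)]))  # identity -> True
print(is_tai_mapping(FLAT, FLAT, [(1, 2), (2, 1)]))          # crossing -> False
print(is_tai_mapping(CHAIN, FLAT, [(1, 1), (2, 2)]))         # ancestor vs sibling -> False
```

Using preorder and postorder ranks means each relation test is a constant-time comparison, so checking a mapping of size \(k\) costs \(O(k^2)\) after two linear traversals.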
- 2.
Note that in this general setting even a pairing of two nodes with identical labels can in principle make a nonzero cost contribution.
- 3.
The literature contains quite a number of inequivalent notions, all referred to as ‘tree distance’; in this article Definition 2 will be understood to define the term.
- 4.
Or a subtree.
- 5.
- 6.
See Sect. 3 of [2].
- 7.
See Sect. 4, Corollary 4.7 of their paper.
- 8.
The proofs in [15] do require that some conditions be imposed on the input similarity table \(C^{\Theta}\).
References
Batagelj, V., Bren, M.: Comparing resemblance measures. J. Classif. 12(1), 73–90 (1995)
Chen, S., Ma, B., Zhang, K.: On the similarity metric and the distance metric. Theoret. Comput. Sci. 410(24–25), 2365–2376 (2009)
Emms, M.: On stochastic tree distances and their training via expectation-maximisation. In: Proceedings of ICPRAM 2012 International Conference on Pattern Recognition Application and Methods. SciTePress (2012)
Emms, M., Franco-Penya, H.: Data-set used in Kendall-Tau experiments. http://www.scss.tcd.ie/Martin.Emms/SimVsDistData September 8th (2011)
Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)
Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Meyers, A., Nivre, J., Surdeanu, M., Xue, N., Zhang, Y.: The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In: Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL-2009). OmniPress (2009)
Herrbach, C., Denise, A., Dulucq, S., Touzet, H.: Alignment of RNA secondary structures using a full set of operations. Technical Report 145, LRI (2006)
Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33(3), 239–251 (1945)
Kuboyama, T.: Matching and learning in trees. PhD thesis, Graduate School of Engineering, University of Tokyo (2007)
Lesot, M.J., Rifqi, M.: Order-based equivalence degrees for similarity and distance measures. In: Proceedings of the Computational Intelligence for Knowledge-Based Systems Design, and 13th International Conference on Information Processing and Management of Uncertainty. IPMU’10, pp. 19–28. Springer, Berlin (2010)
Omhover, J.F., Rifqi, M., Detyniecki, M.: Ranking invariance based on similarity measures in document retrieval. In: Adaptive Multimedia Retrieval, pp. 55–64. Elsevier (2005)
Ristad, E.S., Yianilos, P.N.: Learning string edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 20(5), 522–532 (1998)
Smith, T.F., Waterman, M.S.: Comparison of biosequences. Adv. Appl. Math. 2(4), 482–489 (1981)
Spiro, P.A., Macura, N.: A local alignment metric for accelerating biosequence database search. J. Comput. Biol. 11(1), 61–82 (2004)
Stojmirovic, A., Yu, Y.K.: Geometric aspects of biological sequence comparison. J. Comput. Biol. 16, 579–610 (2009)
Tai, K.C.: The tree-to-tree correction problem. J. ACM 26(3), 422–433 (1979)
Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. Assoc. Comput. Mach. 21(1), 168–173 (1974)
Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18, 1245–1262 (1989)
Acknowledgements
This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (http://www.cngl.ie) at Trinity College Dublin.
Copyright information
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Emms, M., Franco-Penya, HH. (2013). On the Expressivity of Alignment-Based Distance and Similarity Measures on Sequences and Trees in Inducing Orderings. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds) Mathematical Methodologies in Pattern Recognition and Machine Learning. Springer Proceedings in Mathematics & Statistics, vol 30. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5076-4_1
DOI: https://doi.org/10.1007/978-1-4614-5076-4_1
Published:
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5075-7
Online ISBN: 978-1-4614-5076-4
eBook Packages: Mathematics and Statistics (R0)