On the Expressivity of Alignment-Based Distance and Similarity Measures on Sequences and Trees in Inducing Orderings

Emms, Martin; Franco-Penya, Hector-Hugo

doi:10.1007/978-1-4614-5076-4_1

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 30)

Abstract

Both ‘distance’ and ‘similarity’ measures have been proposed for the comparison of sequences and for the comparison of trees, based on scoring mappings. For a given alphabet of node-labels, the measures are parameterised by a table giving label-dependent values for swaps, deletions and insertions. The paper addresses the question of whether an ordering by a ‘distance’ measure, with some parameter setting, can also be expressed by a ‘similarity’ measure, with some other parameter setting, and vice versa. Orderings of three kinds are considered: alignment-orderings, for a fixed source S and target T; neighbour-orderings, where for a fixed S, varying candidate neighbours T_i are ranked; and pair-orderings, where for varying S_i and varying T_j, the pairings ⟨S_i, T_j⟩ are ranked. We show that (1) any alignment-ordering expressed by a ‘distance’ setting can be re-expressed by a ‘similarity’ setting, and vice versa; (2) any neighbour-ordering and pair-ordering expressed by a ‘distance’ setting can be re-expressed by a ‘similarity’ setting; (3) there are neighbour-orderings and pair-orderings expressed by a ‘similarity’ setting which cannot be expressed by a ‘distance’ setting. A consequence of this is that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via distance.
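
To make the setting concrete, the following is a minimal sketch for the sequence case only, not the chapter's own code. It assumes a table keyed by label pairs, with a hypothetical EMPTY marker standing in for the missing label in deletion and insertion entries, and it assumes the subtractive treatment of deletions and insertions for similarity. All function names (make_table, distance, similarity, rank_by_distance, rank_by_similarity) are illustrative.

```python
from typing import Callable, Dict, List, Tuple

EMPTY = ""  # hypothetical marker for the missing label in deletion/insertion entries
Table = Dict[Tuple[str, str], float]


def make_table(alphabet: str, same: float, diff: float, gap: float) -> Table:
    """Build a uniform table: (x, y) swaps, (x, EMPTY) deletions, (EMPTY, y) insertions."""
    table: Table = {}
    for x in alphabet:
        table[(x, EMPTY)] = gap
        table[(EMPTY, x)] = gap
        for y in alphabet:
            table[(x, y)] = same if x == y else diff
    return table


def _best(s: str, t: str, table: Table, gap_sign: float,
          pick: Callable[..., float]) -> float:
    """Dynamic programme over alignments; gap entries are scaled by gap_sign."""
    n, m = len(s), len(t)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + gap_sign * table[(s[i - 1], EMPTY)]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + gap_sign * table[(EMPTY, t[j - 1])]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = pick(
                d[i - 1][j - 1] + table[(s[i - 1], t[j - 1])],        # swap
                d[i - 1][j] + gap_sign * table[(s[i - 1], EMPTY)],    # deletion
                d[i][j - 1] + gap_sign * table[(EMPTY, t[j - 1])],    # insertion
            )
    return d[n][m]


def distance(s: str, t: str, cost: Table) -> float:
    """Minimum over alignments of summed swap, deletion and insertion costs."""
    return _best(s, t, cost, +1.0, min)


def similarity(s: str, t: str, sim: Table) -> float:
    """Maximum over alignments of swap scores minus deletion/insertion entries."""
    return _best(s, t, sim, -1.0, max)


def rank_by_distance(s: str, candidates: List[str], cost: Table) -> List[str]:
    """Neighbour-ordering: closest candidate first."""
    return sorted(candidates, key=lambda t: distance(s, t, cost))


def rank_by_similarity(s: str, candidates: List[str], sim: Table) -> List[str]:
    """Neighbour-ordering: most similar candidate first."""
    return sorted(candidates, key=lambda t: similarity(s, t, sim), reverse=True)


cost = make_table("abc", same=0.0, diff=1.0, gap=1.0)
sim = make_table("abc", same=2.0, diff=-1.0, gap=1.0)
print(rank_by_distance("abc", ["bca", "ac", "abc"], cost))
print(rank_by_similarity("abc", ["bca", "ac", "abc"], sim))
```

With these two particular settings the distance and similarity neighbour-orderings of the three candidates happen to coincide. Result (3) of the abstract concerns similarity settings for which no distance setting yields the same ordering; this sketch does not construct such a case.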

Notes

1. So if (i, j) and (i′, j′) are in the mapping then (T1) left(i, i′) iff left(j, j′) and (T2) anc(i, i′) iff anc(j, j′).
2. Note that in this general setting even a pairing of two nodes with identical labels can in principle make a nonzero cost contribution.
3. The literature contains quite a number of inequivalent notions, all referred to as ‘tree distance’; in this article Definition 2 will be understood to define the term.
4. Or a subtree.
5. While Definition 3 formulates Θ with deletion/insertion contributions subtracted, as is often done [13, 15], an alternative formulation has these treated additively [5]. With the additive formulation, the same consideration suggests making deletions/insertions non-positive.
6. See Sect. 3 of [2].
7. See Sect. 4, Corollary 4.7 of their paper.
8. The proofs in [15] do require that some conditions be imposed on the input similarity table C^Θ.
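
Conditions (T1) and (T2) of Note 1 can be checked mechanically. The sketch below is illustrative only and rests on assumptions not stated on this page: each tree is given as a list of parent indices with nodes numbered in post-order (None for the root), left(i, i′) is read as "i lies to the left of i′", and anc(i, i′) as "i is a proper ancestor of i′". The name is_tai_mapping is hypothetical.

```python
from itertools import permutations
from typing import List, Optional, Set, Tuple

# Assumed representation: parent[k] is the post-order index of node k's parent,
# or None if k is the root.
Parents = List[Optional[int]]


def anc(parent: Parents, i: int, k: int) -> bool:
    """True if node i is a proper ancestor of node k."""
    p = parent[k]
    while p is not None:
        if p == i:
            return True
        p = parent[p]
    return False


def left(parent: Parents, i: int, k: int) -> bool:
    """True if i is to the left of k: earlier in post-order and not below k."""
    return i < k and not anc(parent, k, i)


def is_tai_mapping(ps: Parents, pt: Parents, mapping: Set[Tuple[int, int]]) -> bool:
    """Check (T1) and (T2) for every ordered pair of distinct mapping entries."""
    for (i, j), (i2, j2) in permutations(mapping, 2):
        if left(ps, i, i2) != left(pt, j, j2):   # (T1) violated
            return False
        if anc(ps, i, i2) != anc(pt, j, j2):     # (T2) violated
            return False
    return True


# Source tree a(b, c): post-order b=0, c=1, a=2.
source = [2, 2, None]
# Target tree a(b(c)): post-order c=0, b=1, a=2.
target = [1, 2, None]

print(is_tai_mapping(source, target, {(0, 1), (2, 2)}))          # b->b, a->a
print(is_tai_mapping(source, target, {(0, 1), (1, 0), (2, 2)}))  # also c->c
```

In the second mapping b is to the left of c in the source, while b is an ancestor of c in the target, so both conditions fail for that pair.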

References

1. Batagelj, V., Bren, M.: Comparing resemblance measures. J. Classif. 12(1), 73–90 (1995)
2. Chen, S., Ma, B., Zhang, K.: On the similarity metric and the distance metric. Theoret. Comput. Sci. 410(24–25), 2365–2376 (2009)
3. Emms, M.: On stochastic tree distances and their training via expectation-maximisation. In: Proceedings of ICPRAM 2012 International Conference on Pattern Recognition Application and Methods. SciTePress (2012)
4. Emms, M., Franco-Penya, H.: Data-set used in Kendall-Tau experiments. http://www.scss.tcd.ie/Martin.Emms/SimVsDistData (8 September 2011)
5. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)
6. Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Meyers, A., Nivre, J., Surdeanu, M., Xue, N., Zhang, Y.: The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In: Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL-2009). OmniPress (2009)
7. Herrbach, C., Denise, A., Dulucq, S., Touzet, H.: Alignment of RNA secondary structures using a full set of operations. Technical Report 145, LRI (2006)
8. Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33(3), 239–251 (1945)
9. Kuboyama, T.: Matching and learning in trees. PhD thesis, Graduate School of Engineering, University of Tokyo (2007)
10. Lesot, M.J., Rifqi, M.: Order-based equivalence degrees for similarity and distance measures. In: Proceedings of the Computational Intelligence for Knowledge-Based Systems Design, and 13th International Conference on Information Processing and Management of Uncertainty. IPMU’10, pp. 19–28. Springer, Berlin (2010)
11. Omhover, J.F., Rifqi, M., Detyniecki, M.: Ranking invariance based on similarity measures in document retrieval. In: Adaptive Multimedia Retrieval, pp. 55–64. Elsevier (2005)
12. Ristad, E.S., Yianilos, P.N.: Learning string edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 20(5), 522–532 (1998)
13. Smith, T.F., Waterman, M.S.: Comparison of biosequences. Adv. Appl. Math. 2(4), 482–489 (1981)
14. Spiro, P.A., Macura, N.: A local alignment metric for accelerating biosequence database search. J. Comput. Biol. 11(1), 61–82 (2004)
15. Stojmirovic, A., Yu, Y.K.: Geometric aspects of biological sequence comparison. J. Comput. Biol. 16, 579–610 (2009)
16. Tai, K.C.: The tree-to-tree correction problem. J. ACM (JACM) 26(3), 433 (1979)
17. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. Assoc. Comput. Mach. 21(1), 168–173 (1974)
18. Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18, 1245–1262 (1989)

Acknowledgements

This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (http://www.cngl.ie) at Trinity College Dublin.

Author information

Authors and Affiliations

School of Computer Science and Statistics, Trinity College, Dublin, Ireland
Martin Emms & Hector-Hugo Franco-Penya

Corresponding author

Correspondence to Martin Emms.

Editor information

Editors and Affiliations

Dpt. Lenguajes y Sistemas Informáticos, Jaume I University, Campus del Riu Sec s/n, Castellón de la Plana, 12071, Spain
Pedro Latorre Carmona
Dpt. Lenguajes y Sistemas Informáticos, Jaume I University, Campus del Riu Sec s/n, Castellón de la Plana, 12071, Spain
J. Salvador Sánchez
Technical University of Lisbon, Av. Rovisco Pais, Torre Norte, piso 10, Lisbon, 1049-001, Portugal
Ana L.N. Fred

About this paper

Cite this paper

Emms, M., Franco-Penya, HH. (2013). On the Expressivity of Alignment-Based Distance and Similarity Measures on Sequences and Trees in Inducing Orderings. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds) Mathematical Methodologies in Pattern Recognition and Machine Learning. Springer Proceedings in Mathematics & Statistics, vol 30. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5076-4_1

DOI: https://doi.org/10.1007/978-1-4614-5076-4_1
Published: 16 October 2012
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-5075-7
Online ISBN: 978-1-4614-5076-4
eBook Packages: Mathematics and Statistics (R0)
