On the Expressivity of Alignment-Based Distance and Similarity Measures on Sequences and Trees in Inducing Orderings

  • Conference paper
Mathematical Methodologies in Pattern Recognition and Machine Learning

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 30)

Abstract

Both ‘distance’ and ‘similarity’ measures have been proposed for the comparison of sequences and for the comparison of trees, based on scoring mappings. For a given alphabet of node-labels, the measures are parameterised by a table giving label-dependent values for swaps, deletions and insertions. The paper addresses the question of whether an ordering induced by a ‘distance’ measure, with some parameter setting, can also be expressed by a ‘similarity’ measure, with some other parameter setting, and vice versa. Orderings of three kinds are considered: alignment-orderings, for a fixed source \(S\) and target \(T\); neighbour-orderings, where for a fixed \(S\), varying candidate neighbours \(T_{i}\) are ranked; and pair-orderings, where for varying \(S_{i}\) and varying \(T_{j}\), the pairings \(\langle {S}_{i},{T}_{j}\rangle\) are ranked. We show that (1) any alignment-ordering expressed by a ‘distance’ setting can be re-expressed by a ‘similarity’ setting, and vice versa; (2) any neighbour-ordering or pair-ordering expressed by a ‘distance’ setting can be re-expressed by a ‘similarity’ setting; (3) there are neighbour-orderings and pair-orderings expressed by a ‘similarity’ setting which cannot be expressed by a ‘distance’ setting. A consequence of this is that there are categorisation and hierarchical clustering outcomes which can be achieved via similarity but not via distance.
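To make the notion of a neighbour-ordering concrete, the following is a minimal Python sketch for the sequence case only. It is not the paper's construction: the function names, the two toy parameter settings, and the use of a single label-independent value for both deletion and insertion are all assumptions made for illustration. The distance minimises a sum of costs over alignments, while the similarity maximises a sum of swap scores with deletion and insertion contributions subtracted.

```python
def best_alignment_value(s, t, pair, gap, pick):
    """Dynamic programme over global alignments of sequences s and t.

    pair(x, y) is the contribution of aligning x with y (a 'swap'),
    gap(x) is the signed contribution of deleting or inserting x
    (assumption: one table entry serves both operations),
    pick is min for a distance and max for a similarity.
    """
    m, n = len(s), len(t)
    v = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        v[i][0] = v[i - 1][0] + gap(s[i - 1])
    for j in range(1, n + 1):
        v[0][j] = v[0][j - 1] + gap(t[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            v[i][j] = pick(
                v[i - 1][j - 1] + pair(s[i - 1], t[j - 1]),  # swap or match
                v[i - 1][j] + gap(s[i - 1]),                 # deletion
                v[i][j - 1] + gap(t[j - 1]),                 # insertion
            )
    return v[m][n]


def distance(s, t):
    # Hypothetical 'distance' setting: unit costs, identical labels cost 0.
    return best_alignment_value(
        s, t, pair=lambda x, y: 0 if x == y else 1, gap=lambda x: 1, pick=min
    )


GAP_PENALTY = 0  # hypothetical value, subtracted per deletion or insertion


def similarity(s, t):
    # Hypothetical 'similarity' setting: 1 per identical pair, 0 otherwise.
    return best_alignment_value(
        s, t, pair=lambda x, y: 1 if x == y else 0,
        gap=lambda x: -GAP_PENALTY, pick=max
    )


source = "abc"
candidates = ["abd", "abcxx", "a"]

# Neighbour-orderings: closest first under distance, most similar first
# under similarity. sorted() is stable, so ties keep their input order.
by_distance = sorted(candidates, key=lambda t: distance(source, t))
by_similarity = sorted(candidates, key=lambda t: similarity(source, t), reverse=True)

print(by_distance)    # ['abd', 'abcxx', 'a']  (distances 1, 2, 2)
print(by_similarity)  # ['abcxx', 'abd', 'a']  (similarities 3, 2, 1)
```

These two particular settings rank the same candidates differently. That alone says nothing about expressivity; the question studied in the paper is whether some other setting of the opposite kind could reproduce a given ordering.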

Notes

  1. So if (i, j) and (i′, j′) are in the mapping then (T1) left(i, i′) iff left(j, j′) and (T2) anc(i, i′) iff anc(j, j′).

  2. Note that in this general setting even a pairing of two nodes with identical labels can in principle make a nonzero cost contribution.

  3. The literature contains quite a number of inequivalent notions, all referred to as ‘tree distance’; in this article Definition 2 will be understood to define the term.

  4. Or a subtree.

  5. While Definition 3 formulates Θ with deletion/insertion contributions subtracted, as is often done [13, 15], an alternative formulation has these treated additively [5]. With the additive formulation, the same consideration suggests making deletion/insertion contributions non-positive.

  6. See Sect. 3 of [2].

  7. See Sect. 4, Corollary 4.7 of their paper.

  8. The proofs in [15] do require that some conditions be imposed on the input similarity table C Θ.
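Conditions (T1) and (T2) in Note 1 can be checked mechanically. The Python sketch below is an illustration under stated assumptions, not code from the paper: trees are written as hypothetical nested `(label, [children])` tuples, nodes are identified by zero-based postorder position, and `left` and `anc` are derived from preorder and postorder numbers. Only (T1) and (T2) are tested; any further requirement on mappings, such as being one-to-one, is not covered here.

```python
from itertools import permutations


def number_nodes(tree):
    """Map each node's postorder id to its (preorder, postorder) numbers."""
    spans = {}
    counter = {"pre": 0, "post": 0}

    def visit(node):
        _label, children = node
        pre = counter["pre"]
        counter["pre"] += 1
        for child in children:
            visit(child)
        post = counter["post"]
        counter["post"] += 1
        spans[post] = (pre, post)

    visit(tree)
    return spans


def anc(spans, i, k):
    """True if node i is a proper ancestor of node k."""
    return spans[i][0] < spans[k][0] and spans[i][1] > spans[k][1]


def left(spans, i, k):
    """True if node i is strictly to the left of node k."""
    return spans[i][0] < spans[k][0] and spans[i][1] < spans[k][1]


def satisfies_t1_t2(source, target, mapping):
    """Check (T1) and (T2) for every ordered pair of mapping entries."""
    s, t = number_nodes(source), number_nodes(target)
    for (i, j), (i2, j2) in permutations(mapping, 2):
        if left(s, i, i2) != left(t, j, j2):  # (T1) violated
            return False
        if anc(s, i, i2) != anc(t, j, j2):    # (T2) violated
            return False
    return True


# Postorder ids: source has b=0, c=1, a=2; target has c=0, b=1, a=2.
source = ("a", [("b", []), ("c", [])])
target = ("a", [("c", []), ("b", [])])

print(satisfies_t1_t2(source, target, [(0, 0), (1, 1), (2, 2)]))  # True
print(satisfies_t1_t2(source, target, [(0, 1), (1, 0)]))          # False
```

The second mapping pairs each leaf with the identically labelled leaf, which crosses the left-to-right order and so violates (T1).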

References

  1. Batagelj, V., Bren, M.: Comparing resemblance measures. J. Classif. 12(1), 73–90 (1995)

  2. Chen, S., Ma, B., Zhang, K.: On the similarity metric and the distance metric. Theoret. Comput. Sci. 410(24–25), 2365–2376 (2009)

  3. Emms, M.: On stochastic tree distances and their training via expectation-maximisation. In: Proceedings of ICPRAM 2012 International Conference on Pattern Recognition Application and Methods. SciTePress (2012)

  4. Emms, M., Franco-Penya, H.: Data-set used in Kendall-Tau experiments. http://www.scss.tcd.ie/Martin.Emms/SimVsDistData September 8th (2011)

  5. Gusfield, D.: Algorithms on Strings, Trees, and Sequences. Cambridge University Press, Cambridge (1997)

  6. Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Meyers, A., Nivre, J., Surdeanu, M., Xue, N., Zhang, Y.: The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In: Proceedings of the 13th Conference on Computational Natural Language Learning (CoNLL-2009). OmniPress (2009)

  7. Herrbach, C., Denise, A., Dulucq, S., Touzet, H.: Alignment of RNA secondary structures using a full set of operations. Technical Report 145, LRI (2006)

  8. Kendall, M.G.: The treatment of ties in ranking problems. Biometrika 33(3), 239–251 (1945)

  9. Kuboyama, T.: Matching and learning in trees. PhD thesis, Graduate School of Engineering, University of Tokyo (2007)

  10. Lesot, M.J., Rifqi, M.: Order-based equivalence degrees for similarity and distance measures. In: Proceedings of the Computational Intelligence for Knowledge-Based Systems Design, and 13th International Conference on Information Processing and Management of Uncertainty. IPMU’10, pp. 19–28. Springer, Berlin (2010)

  11. Omhover, J.F., Rifqi, M., Detyniecki, M.: Ranking invariance based on similarity measures in document retrieval. In: Adaptive Multimedia Retrieval, pp. 55–64. Elsevier (2005)

  12. Ristad, E.S., Yianilos, P.N.: Learning string edit distance. IEEE Trans. Pattern Anal. Mach. Intell. 20(5), 522–532 (1998)

  13. Smith, T.F., Waterman, M.S.: Comparison of biosequences. Adv. Appl. Math. 2(4), 482–489 (1981)

  14. Spiro, P.A., Macura, N.: A local alignment metric for accelerating biosequence database search. J. Comput. Biol. 11(1), 61–82 (2004)

  15. Stojmirovic, A., Yu, Y.K.: Geometric aspects of biological sequence comparison. J. Comput. Biol. 16, 579–610 (2009)

  16. Tai, K.C.: The tree-to-tree correction problem. J. ACM 26(3), 422–433 (1979)

  17. Wagner, R.A., Fischer, M.J.: The string-to-string correction problem. J. Assoc. Comput. Mach. 21(1), 168–173 (1974)

  18. Zhang, K., Shasha, D.: Simple fast algorithms for the editing distance between trees and related problems. SIAM J. Comput. 18, 1245–1262 (1989)

Acknowledgements

This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (http://www.cngl.ie) at Trinity College Dublin.

Author information

Corresponding author

Correspondence to Martin Emms.

Copyright information

© 2013 Springer Science+Business Media New York

About this paper

Cite this paper

Emms, M., Franco-Penya, HH. (2013). On the Expressivity of Alignment-Based Distance and Similarity Measures on Sequences and Trees in Inducing Orderings. In: Latorre Carmona, P., Sánchez, J., Fred, A. (eds) Mathematical Methodologies in Pattern Recognition and Machine Learning. Springer Proceedings in Mathematics & Statistics, vol 30. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-5076-4_1
