Phrase Similarity through the Edit Distance

  • Manuel Vilares
  • Francisco J. Ribadas
  • Jesús Vilares
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3180)


This work aims to capture the notion of similarity between phrases. The algorithm follows a dynamic programming approach that integrates the edit distance between parse trees with single-term similarity. We stress the use of the underlying grammatical structure, which guides the computation of semantic similarity between words. The proposal yields a more accurate notion of semantic proximity at the sentence level, without increasing the complexity of the pattern-matching algorithm on which it is based.
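To make the idea concrete, the following is a minimal sketch of how a tree edit distance and a word-similarity score can be combined in one dynamic program. It is not the authors' algorithm: it uses a restricted top-down variant in which only whole subtrees are inserted or deleted, which is simpler than the general tree-to-tree formulation. The `Node` class, the `toy_similarity` function, and its similarity table are hypothetical stand-ins; a real system would presumably take term similarity from a lexical resource.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """A parse-tree node: a grammatical category or a word (hypothetical type)."""
    label: str
    children: List["Node"] = field(default_factory=list)


def size(t: Node) -> int:
    """Number of nodes in the subtree; used as its insertion/deletion cost."""
    return 1 + sum(size(c) for c in t.children)


# Assumed toy scores in [0, 1]; these values are invented for illustration.
TOY_SIM = {frozenset(("dog", "cat")): 0.6}


def toy_similarity(a: str, b: str) -> float:
    """Single-term similarity: 1.0 for identical labels, table lookup otherwise."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset((a, b)), 0.0)


def tree_distance(t1: Node, t2: Node,
                  sim: Callable[[str, str], float] = toy_similarity) -> float:
    """Top-down tree edit distance with similarity-weighted relabeling."""
    # Relabeling is cheap when the two labels are semantically close.
    relabel = 1.0 - sim(t1.label, t2.label)
    a, b = t1.children, t2.children
    # d[i][j]: cost of aligning the first i children of t1 with the first j of t2.
    d = [[0.0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        d[i][0] = d[i - 1][0] + size(a[i - 1])      # delete whole subtrees
    for j in range(1, len(b) + 1):
        d[0][j] = d[0][j - 1] + size(b[j - 1])      # insert whole subtrees
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            d[i][j] = min(
                d[i - 1][j] + size(a[i - 1]),
                d[i][j - 1] + size(b[j - 1]),
                d[i - 1][j - 1] + tree_distance(a[i - 1], b[j - 1], sim),
            )
    return relabel + d[len(a)][len(b)]


def phrase(noun: str) -> Node:
    """Build the toy parse tree S(NP(the, <noun>), VP(barks))."""
    return Node("S", [Node("NP", [Node("the"), Node(noun)]),
                      Node("VP", [Node("barks")])])
```

Under these assumed scores, `tree_distance(phrase("dog"), phrase("cat"))` is smaller than `tree_distance(phrase("dog"), phrase("car"))`, because the grammatical structure aligns the two nouns and the substitution is then weighted by their similarity.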





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Manuel Vilares (1)
  • Francisco J. Ribadas (1)
  • Jesús Vilares (2)

  1. Computer Science Dept., Univ. of Vigo, Orense, Spain
  2. Computer Science Dept., Univ. of A Coruña, A Coruña, Spain
