Complexities of the Centre and Median String Problems

  • François Nicolas
  • Eric Rivals
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2676)

Abstract

Given a finite set of strings, the median string problem consists in finding a string that minimizes the sum of the distances to the strings in the set. Approximations of the median string are used in a very broad range of applications where one needs a representative string that summarizes the information common to the strings of the set. This is the case in Classification, in Speech and Pattern Recognition, and in Computational Biology. In the latter, Median String is related to the key problem of Multiple Alignment. In the recent literature, one finds a theorem stating the NP-completeness of the median string problem for unbounded alphabets. However, in the above-mentioned areas, the alphabet is often finite. Thus, it remains a crucial question whether the median string problem is NP-complete for finite and even binary alphabets. In this work, we provide an answer to this question and also give the complexity of the related centre string problem. Moreover, we study the parameterized complexity of both problems with respect to the number of input strings.
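
As an illustration of the two objectives, the following minimal Python sketch evaluates candidate strings by brute force. It is not the authors' method. It assumes the Levenshtein (edit) distance listed in the keywords, and it assumes the usual definition of the centre string as a string minimizing the maximum distance to the input strings, which the abstract does not spell out. The function names and the `max_len` search bound are hypothetical; the search is exact only among candidates of length at most `max_len` and takes exponential time.

```python
from itertools import product


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # match or substitute
        prev = curr
    return prev[-1]


def sum_of_distances(candidate, strings):
    """Median objective: sum of distances to the input strings."""
    return sum(levenshtein(candidate, s) for s in strings)


def radius(candidate, strings):
    """Centre objective (assumed definition): largest distance to an input string."""
    return max(levenshtein(candidate, s) for s in strings)


def brute_force(strings, alphabet="01", objective=sum_of_distances, max_len=None):
    """Return (string, cost) minimizing the objective among all strings
    over `alphabet` of length at most `max_len` (hypothetical bound,
    defaulting to the length of the longest input string)."""
    if max_len is None:
        max_len = max(len(s) for s in strings)
    best = None
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            candidate = "".join(letters)
            cost = objective(candidate, strings)
            if best is None or cost < best[1]:
                best = (candidate, cost)
    return best
```

For example, `brute_force(["000", "000", "111"])` searches all binary strings of length at most 3 for the median objective, and passing `objective=radius` switches to the centre objective. The number of candidates grows exponentially with `max_len`, so the sketch is only usable on very small instances.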

Keywords

Edit Distance, Input String, Polynomial Time Approximation Scheme, Levenshtein Distance, Binary Alphabet
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • François Nicolas (1)
  • Eric Rivals (1)
  1. L.I.R.M.M., CNRS U.M.R. 5506, Montpellier Cedex 5, France