Faster Algorithm for the Set Variant of the String Barcoding Problem

  • Leszek Gąsieniec
  • Cindy Y. Li
  • Meng Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5029)


The string barcoding problem asks for a minimum set of substrings that distinguishes between all strings in a given set of strings \({\cal S}\). In a biological setting, the given strings represent a set of genomic sequences and the substrings serve as probes in a hybridisation experiment. In this paper, we study a variant of the string barcoding problem in which the substrings have to be chosen from a particular set of substrings of cardinality n. This variant can also be obtained from the more general test set problem (see, e.g., [1]) by fixing appropriate parameters. We present an almost optimal \(O(n|{\cal S}|\log^3 n)\)-time approximation algorithm for the considered problem. Our approximation procedure is a modification of the algorithm due to Berman et al. [1], which obtains the best possible approximation ratio \(1 + \ln n\), provided that \(NP\not\subseteq DTIME(n^{\log\log n})\). The improved time complexity is a direct consequence of more careful management of processed sets, the use of several specialised graph and string data structures, and a tighter time complexity analysis based on an amortised argument.
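
To make the problem statement concrete, the following is a minimal sketch of the set variant, assuming a plain greedy rule that repeatedly picks the candidate substring separating the most still-undistinguished pairs of strings. It is not the algorithm of this paper nor that of Berman et al. [1], and it makes no attempt to reach the \(O(n|{\cal S}|\log^3 n)\) bound; the function name `greedy_barcode` and the sample inputs are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def greedy_barcode(strings, candidates):
    """Pick candidate substrings until every pair of strings is distinguished.

    A candidate distinguishes a pair when it occurs in exactly one of the two
    strings. Returns the chosen candidates in the order they were picked.
    Naive illustrative baseline, not the paper's procedure.
    """
    # Signature of each candidate: which of the input strings contain it.
    contains = {c: tuple(c in s for s in strings) for c in candidates}
    # All pairs of string indices that still need to be told apart.
    undistinguished = set(combinations(range(len(strings)), 2))
    chosen = []

    def gain(c):
        # Number of remaining pairs that candidate c would separate.
        sig = contains[c]
        return sum(1 for i, j in undistinguished if sig[i] != sig[j])

    while undistinguished:
        best = max(candidates, key=gain)
        if gain(best) == 0:
            raise ValueError("candidates cannot distinguish all strings")
        sig = contains[best]
        # Keep only the pairs that the chosen candidate fails to separate.
        undistinguished = {(i, j) for i, j in undistinguished if sig[i] == sig[j]}
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    sequences = ["acgt", "acga", "ttga", "ccgt"]
    probes = ["ac", "tt", "gt", "ga", "cc"]
    print(greedy_barcode(sequences, probes))
```

With the sample inputs above, the sketch selects the two probes `ac` and `gt`, which give each of the four sequences a distinct presence/absence pattern. Each round rescans every candidate against every remaining pair, so the sketch is far slower than the bound stated in the abstract.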


Keywords: Equivalence Class, Time Complexity, Equivalence Relation, Span Tree, Approximation Ratio
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Berman, P., DasGupta, B., Kao, M.Y.: Tight approximability results for test set problems in bioinformatics. Journal of Computer and System Sciences 71(2), 145–162 (2005)
  2. Borneman, J., Chrobak, M., Vedova, G.D., Figueroa, A., Jiang, T.: Probe selection algorithms with applications in the analysis of microbial communities. Bioinformatics 17, 39–48 (2001)
  3. DasGupta, B., Konwar, K.M., Mandoiu, I.I., Shvartsman, A.A.: DNA-BAR: distinguisher selection for DNA barcoding. Bioinformatics 21(16), 3424–3426 (2005)
  4. DasGupta, B., Konwar, K.M., Mandoiu, I.I., Shvartsman, A.A.: Highly scalable algorithms for robust string barcoding. International Journal of Bioinformatics Research and Applications 1(2), 145–161 (2005)
  5. Gerhold, D., Rushmore, T., Caskey, C.T.: DNA chips: promising toys have become powerful tools. Trends Biochem. Sci. 24(5), 168–173 (1999)
  6. Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proc. 4th Symposium on Theory of Computing (STOC), pp. 125–136 (1972)
  7. Klau, G.W., Rahmann, S., Schliep, A., Vingron, M., Reinert, K.: Optimal robust non-unique probe selection using Integer Linear Programming. Bioinformatics 20, 186–193 (2004)
  8. Lancia, G., Rizzi, R.: The approximability of the string barcoding problem. Algorithms for Molecular Biology 1(12), 1–7 (2006)
  9. Rash, S., Gusfield, D.: String Barcoding: Uncovering Optimal Virus Signatures. In: Proc. 6th Annual International Conference on Research in Computational Molecular Biology (RECOMB), pp. 254–261 (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Leszek Gąsieniec (1)
  • Cindy Y. Li (2)
  • Meng Zhang (3)

  1. Department of Computer Science, University of Liverpool, Liverpool, UK
  2. Histocompatibility and Immunogenetics Laboratory, National Blood Service, Bristol, UK
  3. College of Computer Science and Technology, Jilin University, Changchun, China
