Noisy Data Make the Partial Digest Problem NP-hard

  • Mark Cieliebak
  • Stephan Eidenbenz
  • Paolo Penna
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)


The problem of finding the coordinates of n points on a line such that the pairwise distances of the points form a given multi-set of \(\binom{n}{2}\) distances is known as the Partial Digest problem. It occurs, for instance, in DNA physical mapping and de novo sequencing of proteins. Although Partial Digest was already proposed as a combinatorial problem in the 1930s, its computational complexity is still unknown.
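As a small illustration of the problem statement above, the following Python sketch computes the distance multi-set of a point set and checks whether a candidate point set realises a given multi-set. The function names `distance_multiset` and `is_partial_digest_solution` and the example points are illustrative choices, not taken from the paper, and integer coordinates are assumed for simplicity.

```python
from collections import Counter
from itertools import combinations


def distance_multiset(points):
    """Return the multi-set of all pairwise distances as a Counter."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def is_partial_digest_solution(points, distances):
    """Check that the pairwise distances of `points` equal `distances` exactly,
    including multiplicities (hypothetical helper, integer coordinates assumed)."""
    return distance_multiset(points) == Counter(distances)


if __name__ == "__main__":
    points = [0, 2, 4, 7, 10]  # n = 5 points, so 10 pairwise distances
    D = sorted(distance_multiset(points).elements())
    print(D)
    # The mirror image of a solution realises the same multi-set.
    print(is_partial_digest_solution([0, 3, 6, 8, 10], D))
```

Checking a candidate is easy, as the sketch shows; the open question concerns finding the points when only the distances are given.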

To model real-life data, we introduce two optimization variations of Partial Digest, each capturing a different type of error that occurs in practice. First, we study the computational complexity of a minimization version of Partial Digest in which only a subset of all pairwise distances is given and the rest are missing due to experimental errors. We show that this variation is NP-hard to solve exactly. This result answers an open question posed by Pevzner (2000). We then study a maximization version of Partial Digest in which a superset of all pairwise distances is given, with some additional distances due to inaccurate measurements. We show that this maximization version is NP-hard to approximate to within a factor of \(|D|^{\frac{1}{2} -\varepsilon}\) for any \(\varepsilon > 0\), where \(|D|\) is the number of input distances. This inapproximability result is tight up to low-order terms, as we give a trivial approximation algorithm that achieves a matching approximation ratio.
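The two error models can be made concrete with a pair of feasibility checks. This is a minimal sketch under stated assumptions: the abstract does not spell out the objective functions, so the code only tests feasibility of a candidate point set, and it assumes multi-set semantics and integer coordinates. The names `covers_given_distances` and `uses_only_given_distances` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise(points):
    """Multi-set of pairwise distances of a point set."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def covers_given_distances(points, given):
    """Missing-distances model (assumed reading): every given distance
    must be realised by the points; extra realised distances are allowed."""
    missing = Counter(given) - pairwise(points)  # keeps positive counts only
    return not missing


def uses_only_given_distances(points, given):
    """Additional-distances model (assumed reading): every realised distance
    must appear in the input; unused input distances are allowed."""
    extra = pairwise(points) - Counter(given)
    return not extra


if __name__ == "__main__":
    subset = [2, 3, 5, 7, 10]  # some true distances were lost
    superset = [1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10]  # 1 and 9 are noise
    print(covers_given_distances([0, 2, 4, 7, 10], subset))
    print(uses_only_given_distances([0, 2, 4, 7, 10], superset))
```

Under these assumptions, a feasible point set in the second model with k points uses \(\binom{k}{2}\) of the input distances, so k grows at most like the square root of \(|D|\), which is consistent with the ratio stated in the abstract.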


Keywords: Approximation Ratio; Collision-Induced Dissociation; Pairwise Distance; Maximum Clique; Hardness Result




References

  1. Alizadeh, F., Karp, R.M., Newberg, L.A., Weisser, D.K.: Physical mapping of chromosomes: A combinatorial problem in molecular biology. In: Symposium on Discrete Algorithms, pp. 371–381 (1993)
  2. Arora, S., Lund, C.: Hardness of approximations. In: Hochbaum, D. (ed.) Approximation Algorithms for NP-Hard Problems, pp. 399–446. PWS Publishing Company (1996)
  3. Bafna, V., Edwards, N.: On de novo interpretation of tandem mass spectra for peptide identification. In: 7th Annual International Conference on Computational Biology (RECOMB 2003), pp. 9–18 (2003)
  4. Baginsky, S.: Personal communication (2003)
  5. Błażewicz, J., Formanowicz, P., Kasprzak, M., Jaroszewski, M., Markiewicz, W.T.: Construction of DNA restriction maps based on a simplified experiment. Bioinformatics 17(5), 398–404 (2001)
  6. Chen, T., Kao, M., Tepel, M., Rush, J., Church, G.M.: A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. In: 11th SIAM-ACM Symposium on Discrete Algorithms (SODA), pp. 389–398 (2000)
  7. Cieliebak, M., Eidenbenz, S.: Measurement errors make the partial digest problem NP-hard. Manuscript, to be published (2003)
  8. Dakić, T.: On the turnpike problem. PhD thesis, Simon Fraser University (2000)
  9. Dix, T.I., Kieronska, D.H.: Errors between sites in restriction site mapping. Computer Applications in the Biosciences (CABIOS) 4(1), 117–123 (1988)
  10. Fasulo, D.: Algorithms for DNA Restriction Mapping. PhD thesis, University of Washington (2000)
  11. Fütterer, J.: Personal communication (2002)
  12. Håstad, J.: Clique is hard to approximate within \(n^{1-\varepsilon}\). In: Proc. of the Symposium on Foundations of Computer Science (1996)
  13. Inglehart, J., Nelson, P.C.: On the limitations of automated restriction mapping. Computer Applications in the Biosciences (CABIOS) 10(3), 249–261 (1994)
  14. James, P.: Proteome Research: Mass Spectrometry. Springer, Heidelberg (2001)
  15. Lemke, P., Werman, M.: On the complexity of inverting the autocorrelation function of a finite integer sequence, and the problem of locating n points on a line, given the \(\binom{n}{2}\) unlabelled distances between them. Preprint 453, Institute for Mathematics and its Applications (IMA) (1988)
  16. Newberg, L., Naor, D.: A lower bound on the number of solutions to the probed partial digest problem. Advances in Applied Mathematics (ADVAM) 14, 172–183 (1993)
  17. Pandurangan, G., Ramesh, H.: The restriction mapping problem revisited. Journal of Computer and System Sciences (JCSS), special issue on Computational Biology (2002) (to appear)
  18. Pevzner, P.: Computational Molecular Biology. MIT Press, Cambridge (2000)
  19. Pevzner, P.A., Waterman, M.S.: Open combinatorial problems in computational molecular biology. In: Proc. of the Third Israel Symposium on Theory of Computing and Systems (ISTCS), pp. 158–173. IEEE Computer Society Press, Los Alamitos (1995)
  20. Rosenblatt, J., Seymour, P.: The structure of homometric sets. SIAM Journal of Algorithms and Discrete Mathematics 3(3), 343–350 (1982)
  21. Searls, D.B.: Formal grammars for intermolecular structure. In: Proceedings of the International IEEE Symposium on Intelligence in Neural and Biological Systems (1995)
  22. Setubal, J., Meidanis, J.: Introduction to Computational Molecular Biology. PWS, Boston (1997)
  23. Skiena, S.S., Smith, W., Lemke, P.: Reconstructing sets from interpoint distances. In: Sixth ACM Symposium on Computational Geometry, pp. 332–339 (1990)
  24. Skiena, S.S., Sundaram, G.: A partial digest approach to restriction site mapping. Bulletin of Mathematical Biology 56, 275–294 (1994)
  25. Woeginger, G.J., Yu, Z.L.: On the equal-subset-sum problem. Information Processing Letters 42, 299–302 (1992)
  26. Wright, L.W., Lichter, J.B., Reinitz, J., Shifman, M.A., Kidd, K.K., Miller, P.L.: Computer-assisted restriction mapping: an integrated approach to handling experimental uncertainty. Computer Applications in the Biosciences (CABIOS) 10(4), 435–442 (1994)
  27. Zhang, Z.: An Exponential Example for a Partial Digest Mapping Algorithm. Journal of Computational Biology 1(3), 235–239 (1994)

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Mark Cieliebak (1)
  • Stephan Eidenbenz (2)
  • Paolo Penna (1)

  1. Institute of Theoretical Computer Science, ETH Zurich
  2. Los Alamos National Laboratory
