Noisy Data Make the Partial Digest Problem NP-hard

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 2812)

Abstract

The problem of finding the coordinates of n points on a line such that the pairwise distances of the points form a given multiset of \(\binom{n}{2}\) distances is known as the Partial Digest problem. It occurs, for instance, in DNA physical mapping and de novo sequencing of proteins. Although Partial Digest was already proposed as a combinatorial problem in the 1930s, its computational complexity is still unknown.
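To make the definition concrete, the following is a minimal Python sketch of the noise-free problem. It is not taken from this paper: the names `pairwise_distances` and `partial_digest` are hypothetical, and the solver uses the standard backtracking idea usually attributed to Skiena, Smith, and Lemke (reference 23), which can take exponential time on some inputs (compare reference 27). It assumes positive integer distances and fixes the leftmost point at 0.

```python
from collections import Counter
from itertools import combinations


def pairwise_distances(points):
    """Multiset of the C(n, 2) pairwise distances of points on a line."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def partial_digest(distances):
    """Return one sorted point set realizing `distances` exactly, or None.

    Assumption: distances are positive integers; the leftmost point is 0.
    """
    remaining = Counter(distances)
    if not remaining:
        return [0]
    width = max(remaining)          # the largest distance spans the whole set
    remaining[width] -= 1
    points = [0, width]

    def place():
        if sum(remaining.values()) == 0:
            return True             # every distance has been explained
        y = max(d for d, c in remaining.items() if c > 0)
        # The largest unexplained distance must touch one of the two endpoints.
        for candidate in (y, width - y):
            needed = Counter(abs(candidate - p) for p in points)
            if all(remaining[d] >= c for d, c in needed.items()):
                remaining.subtract(needed)
                points.append(candidate)
                if place():
                    return True
                points.pop()        # undo the choice and try the other side
                remaining.update(needed)
        return False

    return sorted(points) if place() else None
```

For example, `partial_digest([2, 2, 3, 3, 4, 5, 6, 7, 8, 10])` returns a five-point set whose pairwise distances reproduce the input multiset. Solutions are only determined up to translation and reflection, so the sketch may return the mirror image of the point set one had in mind.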

To model real-life data, we introduce two optimization variants of Partial Digest, each capturing a different type of error that occurs in practice. First, we study the computational complexity of a minimization version of Partial Digest in which only a subset of all pairwise distances is given and the rest are missing due to experimental errors. We show that this variant is NP-hard to solve exactly. This result answers an open question posed by Pevzner (2000). We then study a maximization version of Partial Digest in which a superset of all pairwise distances is given, with some additional distances due to inaccurate measurements. We show that this maximization version is NP-hard to approximate to within a factor of \(|D|^{\frac{1}{2} -\varepsilon}\) for any \(\varepsilon > 0\), where \(|D|\) is the number of input distances. This inapproximability result is tight up to low-order terms, as we give a trivial approximation algorithm that achieves a matching approximation ratio.
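The two variants can be illustrated with a short Python sketch. The abstract does not spell out the formal definitions or the trivial algorithm, so everything here is an assumption: the minimization variant is read as "find the fewest points whose pairwise distances contain the given multiset", the maximization variant as "find the most points whose pairwise distances are contained in the given multiset", and the trivial algorithm as "output two points at any given distance". All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import comb


def pairwise_distances(points):
    """Multiset of all pairwise distances of points on a line."""
    return Counter(abs(a - b) for a, b in combinations(points, 2))


def is_sub_multiset(small, large):
    """True if every element of `small` occurs at least as often in `large`."""
    return all(large[d] >= c for d, c in small.items())


def feasible_with_missing_distances(points, given):
    """Assumed minimization variant: `points` must produce every given distance."""
    return is_sub_multiset(Counter(given), pairwise_distances(points))


def feasible_with_extra_distances(points, given):
    """Assumed maximization variant: `points` may only produce given distances."""
    return is_sub_multiset(pairwise_distances(points), Counter(given))


def trivial_two_point_solution(given):
    """Assumed trivial algorithm: two points at one of the given distances."""
    return [0, max(given)]


def max_points_upper_bound(num_distances):
    """Largest m (at least 2) with C(m, 2) <= num_distances."""
    m = 2
    while comb(m + 1, 2) <= num_distances:
        m += 1
    return m
```

Under this reading, any solution with m points to the maximization variant uses \(\binom{m}{2} \le |D|\) distances, so m is roughly at most \(\sqrt{2|D|}\). A two-point answer is therefore within a factor of order \(|D|^{\frac{1}{2}}\) of the optimum, which is consistent with the matching ratio claimed in the abstract.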

A preliminary version of this paper was published as Technical Report 381, ETH Zurich, Department of Computer Science, October 2002.

References

  1. Alizadeh, F., Karp, R.M., Newberg, L.A., Weisser, D.K.: Physical mapping of chromosomes: A combinatorial problem in molecular biology. In: Symposium on Discrete Algorithms, pp. 371–381 (1993)

  2. Arora, S., Lund, C.: Hardness of approximations. In: Hochbaum, D. (ed.) Approximation Algorithms for NP-Hard Problems, pp. 399–446. PWS Publishing Company (1996)

  3. Bafna, V., Edwards, N.: On de novo interpretation of tandem mass spectra for peptide identification. In: 7th Annual International Conference on Computational Biology (RECOMB 2003), pp. 9–18 (2003)

  4. Baginsky, S.: Personal communication (2003)

  5. Błażewicz, J., Formanowicz, P., Kasprzak, M., Jaroszewski, M., Markiewicz, W.T.: Construction of DNA restriction maps based on a simplified experiment. Bioinformatics 17(5), 398–404 (2001)

  6. Chen, T., Kao, M., Tepel, M., Rush, J., Church, G.M.: A dynamic programming approach to de novo peptide sequencing via tandem mass spectrometry. In: 11th SIAM-ACM Symposium on Discrete Algorithms (SODA), pp. 389–398 (2000)

  7. Cieliebak, M., Eidenbenz, S.: Measurement errors make the partial digest problem NP-hard. Manuscript, to be published (2003)

  8. Dakić, T.: On the turnpike problem. PhD thesis, Simon Fraser University (2000)

  9. Dix, T.I., Kieronska, D.H.: Errors between sites in restriction site mapping. Computer Applications in the Biosciences (CABIOS) 4(1), 117–123 (1988)

  10. Fasulo, D.: Algorithms for DNA Restriction Mapping. PhD thesis, University of Washington (2000)

  11. Fütterer, J.: Personal communication (2002)

  12. Håstad, J.: Clique is hard to approximate within \(n^{1-\varepsilon}\). In: Proc. of the Symposium on Foundations of Computer Science (1996)

  13. Inglehart, J., Nelson, P.C.: On the limitations of automated restriction mapping. Computer Applications in the Biosciences (CABIOS) 10(3), 249–261 (1994)

  14. James, P.: Proteome Research: Mass Spectrometry. Springer, Heidelberg (2001)

  15. Lemke, P., Werman, M.: On the complexity of inverting the autocorrelation function of a finite integer sequence, and the problem of locating n points on a line, given the \(\binom{n}{2}\) unlabelled distances between them. Preprint 453, Institute for Mathematics and its Applications (IMA) (1988)

  16. Newberg, L., Naor, D.: A lower bound on the number of solutions to the probed partial digest problem. Advances in Applied Mathematics (ADVAM) 14, 172–183 (1993)

  17. Pandurangan, G., Ramesh, H.: The restriction mapping problem revisited. Journal of Computer and System Sciences (JCSS) (2002) (to appear); Special issue on Computational Biology

  18. Pevzner, P.: Computational Molecular Biology. MIT Press, Cambridge (2000)

  19. Pevzner, P.A., Waterman, M.S.: Open combinatorial problems in computational molecular biology. In: Proc. of the Third Israel Symposium on Theory of Computing and Systems ISTCS, pp. 158–173. IEEE Computer Society Press, Los Alamitos (1995)

  20. Rosenblatt, J., Seymour, P.: The structure of homometric sets. SIAM Journal of Algorithms and Discrete Mathematics 3(3), 343–350 (1982)

  21. Searls, D.B.: Formal grammars for intermolecular structure. In: Proceedings of the International IEEE Symposium on Intelligence in Neural and Biological Systems (1995)

  22. Setubal, J., Meidanis, J.: Introduction to Computational Molecular Biology. PWS, Boston (1997)

  23. Skiena, S.S., Smith, W., Lemke, P.: Reconstructing sets from interpoint distances. In: Sixth ACM Symposium on Computational Geometry, pp. 332–339 (1990)

  24. Skiena, S.S., Sundaram, G.: A partial digest approach to restriction site mapping. Bulletin of Mathematical Biology 56, 275–294 (1994)

  25. Woeginger, G.J., Yu, Z.L.: On the equal-subset-sum problem. Information Processing Letters 42, 299–302 (1992)

  26. Wright, L.W., Lichter, J.B., Reinitz, J., Shifman, M.A., Kidd, K.K., Miller, P.L.: Computer-assisted restriction mapping: an integrated approach to handling experimental uncertainty. Computer Applications in the Biosciences (CABIOS) 10(4), 435–442 (1994)

  27. Zhang, Z.: An Exponential Example for a Partial Digest Mapping Algorithm. Journal of Computational Biology 1(3), 235–239 (1994)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Cieliebak, M., Eidenbenz, S., Penna, P. (2003). Noisy Data Make the Partial Digest Problem NP-hard. In: Benson, G., Page, R.D.M. (eds) Algorithms in Bioinformatics. WABI 2003. Lecture Notes in Computer Science, vol 2812. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39763-2_9

  • DOI: https://doi.org/10.1007/978-3-540-39763-2_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20076-5

  • Online ISBN: 978-3-540-39763-2

  • eBook Packages: Springer Book Archive
