Sequencing from Compomers: Using Mass Spectrometry for DNA De-Novo Sequencing of 200+ nt

  • Sebastian Böcker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2812)


One of the main endeavors in today’s Life Science remains the efficient sequencing of long DNA molecules. Today, most de-novo sequencing of DNA is still performed using electrophoresis-based Sanger Sequencing, based on the Sanger concept of 1977. Methods using mass spectrometry to acquire the Sanger Sequencing data are limited by short sequencing lengths of 15–25 nt.

We propose a new method for DNA sequencing using base-specific cleavage and mass spectrometry, which appears to be a promising alternative to classical DNA sequencing approaches. A single-stranded DNA or RNA molecule is cleaved by a base-specific (bio-)chemical reaction using, for example, RNases. The cleavage reaction is modified so that only a certain percentage of the occurrences of that base are cleaved, not all of them. The resulting mixture of fragments is then analyzed using MALDI-TOF mass spectrometry, whereby we acquire the molecular masses of the fragments. For every peak in the mass spectrum, we calculate those base compositions (compomers) that could potentially create a peak of the observed mass. Repeating the cleavage reaction for all four bases, we finally try to uniquely reconstruct the underlying sequence from these observed spectra. This leads us to the combinatorial problem of Sequencing From Compomers and, finally, to the graph-theoretical problem of finding a walk in a subgraph of the de Bruijn graph. Application of this method to simulated data indicates that it might be capable of sequencing DNA molecules with 200+ nt.
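The pipeline described above can be illustrated with a small, self-contained Python sketch. It is not the paper's algorithm, and it rests on several assumptions. The residue masses are approximate average DNA nucleotide masses that ignore terminal groups. The cut is assumed to fall directly 3' of the cleaved base. All function names (`cleavage_fragments`, `compomer`, `compomers_for_mass`, `compomer_spectrum`, `brute_force_reconstruct`) are hypothetical. Peaks are decomposed into compomers by brute-force enumeration. The reconstruction step does not walk a de Bruijn subgraph. Instead, it tests every sequence of a given length against exact compomer sets, so it only scales to toy lengths.

```python
from collections import Counter
from itertools import product

# Approximate average residue masses (Da) of DNA nucleotides, as used in common
# oligonucleotide molecular-weight estimates. Assumption: terminal groups,
# cleavage chemistry and RNA vs. DNA differences are ignored in this sketch.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}
BASES = "ACGT"


def cleavage_fragments(seq: str, base: str) -> set[str]:
    """All fragments of a partial cleavage after occurrences of `base`.

    Assumption: the cut lies directly 3' of `base`, so the cleaved base stays
    at the end of the upstream fragment. Because cleavage is partial, any
    subset of cut sites may be used. A fragment therefore starts at the
    sequence start or right after a cut site, and ends at a cut site or at
    the sequence end.
    """
    cuts = [i + 1 for i, b in enumerate(seq) if b == base and i + 1 < len(seq)]
    starts = [0] + cuts
    ends = cuts + [len(seq)]
    return {seq[s:e] for s in starts for e in ends if s < e}


def compomer(fragment: str) -> tuple[int, int, int, int]:
    """Base composition (#A, #C, #G, #T). The order of bases is lost."""
    counts = Counter(fragment)
    return tuple(counts[b] for b in BASES)


def compomer_mass(comp: tuple[int, int, int, int]) -> float:
    """Sum of the (approximate) residue masses of a compomer."""
    return sum(n * RESIDUE_MASS[b] for n, b in zip(comp, BASES))


def compomers_for_mass(mass: float, tol: float = 0.5, max_len: int = 30) -> list:
    """Brute-force every compomer whose mass lies within `tol` Da of a peak."""
    hits = []
    for a in range(max_len + 1):
        for c in range(max_len + 1 - a):
            for g in range(max_len + 1 - a - c):
                for t in range(max_len + 1 - a - c - g):
                    comp = (a, c, g, t)
                    if abs(compomer_mass(comp) - mass) <= tol:
                        hits.append(comp)
    return hits


def compomer_spectrum(seq: str) -> dict[str, set]:
    """Compomer sets of the four base-specific cleavage reactions."""
    return {b: {compomer(f) for f in cleavage_fragments(seq, b)} for b in BASES}


def brute_force_reconstruct(spectra: dict[str, set], length: int) -> list[str]:
    """All sequences of `length` whose four compomer sets equal `spectra`.

    Exponential (4**length candidates), so only usable on toy inputs. This is
    NOT the de Bruijn graph walk described in the abstract.
    """
    candidates = ("".join(p) for p in product(BASES, repeat=length))
    return [s for s in candidates if compomer_spectrum(s) == spectra]


if __name__ == "__main__":
    seq = "ACGTTGCA"
    print(sorted(cleavage_fragments(seq, "G")))
    # ['ACG', 'ACGTTG', 'ACGTTGCA', 'CA', 'TTG', 'TTGCA']
    peak = compomer_mass(compomer("ACG"))
    print(compomers_for_mass(peak))
    # [(1, 1, 1, 0)]
    target = compomer_spectrum("ACGTTG")
    print(brute_force_reconstruct(target, 6))
```

The reconstruction compares exact compomer sets, so it avoids the mass ambiguity that `compomers_for_mass` exposes. With a realistic tolerance and longer fragments, a single peak can correspond to several compomers. Several sequences may also share identical spectra, which is why reconstruction is only *tried* uniquely.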

Copyright information

© Springer-Verlag Berlin Heidelberg 2003

Authors and Affiliations

  • Sebastian Böcker, AG Genominformatik, Technische Fakultät, Universität Bielefeld, Bielefeld, Germany