Efficient Algorithms for Finding Submasses in Weighted Strings

  • Conference paper
Combinatorial Pattern Matching (CPM 2004)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Abstract

We study the Submass Finding Problem: Given a string s over a weighted alphabet, i.e., an alphabet Σ with a weight function \(\mu:\Sigma \to {\mathbb N}\), decide for an input mass M whether s has a substring whose weights sum up to M. If M is indeed a submass, then we want to find one or all occurrences of such substrings. We present efficient algorithms for both the decision and the search problem. Furthermore, our approach allows us to compute efficiently the number of different submasses of s.
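As a point of reference for the search problem, a single query mass M can be answered with a simple linear scan. The sketch below is a baseline, not the algorithm of the paper; it assumes strictly positive character weights and M ≥ 1, and the names `find_occurrences`, `s`, `mu`, and `M` are illustrative.

```python
def find_occurrences(s, mu, M):
    """Return all (i, j) such that the substring s[i:j] has mass exactly M.

    Assumes every weight in mu is a positive integer and M >= 1, so the
    mass of a window grows strictly when it is extended to the right.
    """
    occurrences = []
    j = 0        # exclusive right end of the current window
    window = 0   # mass of s[i:j]
    for i in range(len(s)):
        # Extend the window to the right while it is still too light.
        while j < len(s) and window < M:
            window += mu[s[j]]
            j += 1
        if window == M:
            occurrences.append((i, j))
        # Drop s[i] before moving the left end to i + 1.
        window -= mu[s[i]]
    return occurrences


if __name__ == "__main__":
    weights = {"a": 1, "b": 2, "c": 5}
    print(find_occurrences("abca", weights, 8))
```

Each query costs time linear in the length of s, which is the kind of per-query cost that a precomputed answer for all masses avoids.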

The main idea of our algorithms is to define appropriate polynomials such that we can determine the solution for the Submass Finding Problem from the coefficients of the product of these polynomials. We obtain very efficient running times by using Fast Fourier Transform to compute this product. Our main algorithm for the decision problem runs in time \({\mathcal O}({\mu_s} \log {\mu_s})\), where \(\mu_s\) is the total mass of string s. Employing standard methods for compressing sparse polynomials, this runtime can be viewed as \({\mathcal O}({\sigma}(s)\log^2 {\sigma}(s))\), where \(\sigma(s)\) denotes the number of different submasses of s. In this case, the runtime is independent of the size of the individual masses of characters.
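The abstract does not spell out the polynomials, so the following is only one plausible construction, assumed here for illustration. It encodes the prefix masses \(p_0 = 0 < p_1 < \dots < p_n = \mu_s\) as \(P(x) = \sum_j x^{p_j}\) and \(Q(x) = \sum_i x^{\mu_s - p_i}\). The coefficient of \(x^{\mu_s + M}\) in \(P \cdot Q\) then counts the pairs \(i < j\) with \(p_j - p_i = M\), so M is a submass exactly when that coefficient is nonzero. The function name `submasses_via_fft` is hypothetical, positive weights are assumed, and NumPy's real FFT stands in for whatever FFT routine the paper uses.

```python
import numpy as np


def submasses_via_fft(s, mu):
    """Return the set of all submasses of s from a single polynomial product.

    Assumes every weight in mu is a positive integer.
    """
    # Prefix masses p_0 = 0, p_1, ..., p_n.
    masses = np.cumsum([mu[c] for c in s], dtype=np.int64)
    prefix = np.concatenate(([0], masses))
    total = int(prefix[-1])

    # Coefficient vectors of P(x) = sum x^{p_j} and Q(x) = sum x^{total - p_i}.
    P = np.zeros(total + 1)
    Q = np.zeros(total + 1)
    P[prefix] = 1.0
    Q[total - prefix] = 1.0

    # Multiply via FFT; the product has degree 2 * total, so pad to a
    # power of two that is larger than 2 * total + 1.
    size = 1 << (2 * total + 1).bit_length()
    product = np.fft.irfft(np.fft.rfft(P, size) * np.fft.rfft(Q, size), size)

    # The coefficient of x^{total + M}, for M = 1..total, counts the
    # substrings of mass M; round away floating-point noise.
    counts = np.rint(product[total + 1 : 2 * total + 1]).astype(int)
    return {M for M, c in enumerate(counts, start=1) if c > 0}


if __name__ == "__main__":
    found = submasses_via_fft("abc", {"a": 1, "b": 2, "c": 5})
    print(sorted(found), len(found))
```

With the set in hand, the decision problem for a mass M is a membership test, and the size of the set is the number of different submasses \(\sigma(s)\). The dense vectors have length \(\mu_s + 1\), which matches the \({\mathcal O}({\mu_s} \log {\mu_s})\) bound; the sparse-polynomial variant mentioned above is not attempted here.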

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bansal, N., Cieliebak, M., Lipták, Z. (2004). Efficient Algorithms for Finding Submasses in Weighted Strings. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_14

  • DOI: https://doi.org/10.1007/978-3-540-27801-6_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-22341-2

  • Online ISBN: 978-3-540-27801-6

  • eBook Packages: Springer Book Archive
