Efficient Algorithms for Finding Submasses in Weighted Strings

Bansal, Nikhil; Cieliebak, Mark; Lipták, Zsuzsanna

doi:10.1007/978-3-540-27801-6_14

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3109)

Included in the following conference series:

Annual Symposium on Combinatorial Pattern Matching

Abstract

We study the Submass Finding Problem: Given a string s over a weighted alphabet, i.e., an alphabet Σ with a weight function \(\mu:\Sigma \to {\mathbb N}\), decide for an input mass M whether s has a substring whose weights sum up to M. If M is indeed a submass, then we want to find one or all occurrences of such substrings. We present efficient algorithms for both the decision and the search problem. Furthermore, our approach allows us to compute efficiently the number of different submasses of s.
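To make the problem statement concrete, here is a minimal baseline sketch for a single query. It is not the algorithm of the paper: it is a plain sliding-window scan, and it assumes that every character weight is a positive integer and that M > 0. The function name `find_submass` and the example alphabet are illustrative choices only.

```python
def find_submass(s, mu, M):
    """Return (start, end) such that s[start:end] has mass M, or None.

    Assumes all weights in mu are positive integers and M > 0, so the
    window mass grows when the window is extended and shrinks when it
    is cut from the left.
    """
    start = 0
    window = 0  # mass of s[start:end]
    for end, ch in enumerate(s, 1):
        window += mu[ch]
        # Shrink from the left while the window is too heavy.
        while window > M and start < end:
            window -= mu[s[start]]
            start += 1
        if window == M and start < end:
            return start, end
    return None


# Hypothetical weighted alphabet: a has mass 2, b has mass 5.
weights = {"a": 2, "b": 5}
print(find_submass("aab", weights, 7))  # substring "ab"
print(find_submass("aab", weights, 3))  # no such substring
```

A scan like this answers one query in time linear in the length of s, but it has to be repeated for every new mass M.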

The main idea of our algorithms is to define appropriate polynomials such that we can determine the solution for the Submass Finding Problem from the coefficients of the product of these polynomials. We obtain very efficient running times by using Fast Fourier Transform to compute this product. Our main algorithm for the decision problem runs in time \({\mathcal O}({\mu_s} \log {\mu_s})\), where \(\mu_s\) is the total mass of string s. Employing standard methods for compressing sparse polynomials, this runtime can be viewed as \({\mathcal O}({\sigma}(s)\log^2 {\sigma}(s))\), where σ(s) denotes the number of different submasses of s. In this case, the runtime is independent of the size of the individual masses of characters.
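The abstract does not spell out the polynomials, so the following is only one plausible instantiation of the idea, under stated assumptions. Let \(p_0 = 0, p_1, \ldots, p_n\) be the prefix masses of s, and take \(P(x) = \sum_i x^{p_i}\) and \(Q(x) = \sum_i x^{\mu_s - p_i}\). The coefficient of \(x^{\mu_s + M}\) in \(P \cdot Q\) then counts the pairs \((i, j)\) with \(p_j - p_i = M\), so for M > 0 it is nonzero exactly when M is a submass. The sketch assumes positive integer weights, uses a simple recursive FFT in floating point (adequate for small inputs only), and all function names are illustrative.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = -1 if invert else 1
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def multiply(p, q):
    """Multiply two integer coefficient lists via FFT."""
    length = len(p) + len(q) - 1
    size = 1
    while size < length:
        size *= 2
    fp = fft(p + [0] * (size - len(p)))
    fq = fft(q + [0] * (size - len(q)))
    prod = fft([x * y for x, y in zip(fp, fq)], invert=True)
    # The inverse transform above is unnormalised, so divide by size.
    return [round(c.real / size) for c in prod[:length]]


def submasses(s, mu):
    """Return the set of all positive submasses of s."""
    prefix = [0]
    for ch in s:
        prefix.append(prefix[-1] + mu[ch])
    total = prefix[-1]  # total mass of s
    p = [0] * (total + 1)
    q = [0] * (total + 1)
    for m in prefix:
        p[m] = 1          # P(x) = sum of x^(p_i)
        q[total - m] = 1  # Q(x) = sum of x^(total - p_i)
    c = multiply(p, q)
    # c[total + M] counts pairs (i, j) with p_j - p_i = M.
    return {M for M in range(1, total + 1) if c[total + M] > 0}


# Hypothetical weighted alphabet: a has mass 2, b has mass 5.
masses = submasses("aab", {"a": 2, "b": 5})
print(sorted(masses), len(masses))
```

After one product, a decision query for any M is a set lookup, and the size of the set is the number of different submasses. The polynomials have degree equal to the total mass of s, which is where the dependence on the total mass in the stated running time comes from; the sparse-polynomial compression mentioned in the abstract is not attempted here.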

Author information

Authors and Affiliations

Nikhil Bansal: IBM Research, T.J. Watson Research Center, P.O. Box 218, Yorktown Heights, NY, 10598, USA
Mark Cieliebak: Institute of Theoretical Computer Science, ETH Zurich, Clausiusstr. 49, CH-8092, Zurich; Center for Web Research, Department of Computer Science, University of Chile
Zsuzsanna Lipták: AG Genominformatik, Technische Fakultät, Universität Bielefeld, Postfach 10 01 31, D-33592, Bielefeld

Editor information

Editors and Affiliations

School of Computing Science, Simon Fraser University, 8888 University Drive, V5A 1S6, Burnaby, BC, Canada
Suleyman Cenk Sahinalp
Google Inc., 76 9th Av, 4th Fl., 10011, New York, NY
S. Muthukrishnan
Tom Sawyer Software, 94612, Oakland, CA, USA
Ugur Dogrusoz

About this paper

Cite this paper

Bansal, N., Cieliebak, M., Lipták, Z. (2004). Efficient Algorithms for Finding Submasses in Weighted Strings. In: Sahinalp, S.C., Muthukrishnan, S., Dogrusoz, U. (eds) Combinatorial Pattern Matching. CPM 2004. Lecture Notes in Computer Science, vol 3109. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-27801-6_14

DOI: https://doi.org/10.1007/978-3-540-27801-6_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22341-2
Online ISBN: 978-3-540-27801-6
eBook Packages: Springer Book Archive