Non-parametric and Semi-parametric Support Estimation Using SEquential RESampling Random Walks on Biomolecular Sequences

  • Conference paper
Comparative Genomics (RECOMB-CG 2018)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 11183)

Abstract

Non-parametric and semi-parametric resampling procedures are widely used to perform support estimation in computational biology and bioinformatics. Among the most widely used methods in this class is the standard bootstrap, which resamples sites at random with replacement. Although the bootstrap and related techniques do not require a particular parametric model for resampling, they do assume that sites are independent and identically distributed (i.i.d.). The i.i.d. assumption can be an over-simplification for many problems in computational biology and bioinformatics. In particular, sequential dependence within biomolecular sequences is often an essential biological feature, arising from biochemical function, evolutionary processes such as recombination, and other factors.
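
As a point of reference for the i.i.d. assumption described above, the following Python sketch shows a standard site-wise bootstrap on an alignment: column indices are drawn uniformly at random with replacement, and every sequence uses the same indices so that columns stay intact. The function name `bootstrap_columns` and the list-of-strings alignment format are illustrative assumptions, not taken from the paper.

```python
import random
from typing import List


def bootstrap_columns(alignment: List[str], rng: random.Random) -> List[str]:
    """Return one bootstrap replicate of a set of aligned sequences.

    Sites (alignment columns) are drawn uniformly at random with replacement,
    which implicitly treats them as independent and identically distributed.
    """
    n_sites = len(alignment[0])
    if any(len(row) != n_sites for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw n_sites column indices independently, with replacement.
    picks = [rng.randrange(n_sites) for _ in range(n_sites)]
    # Every row uses the same column indices, so each column is copied whole.
    return ["".join(row[i] for i in picks) for row in alignment]


if __name__ == "__main__":
    aln = ["ACGT-A", "AC-TTA", "GCGTTA"]
    for row in bootstrap_columns(aln, random.Random(1)):
        print(row)
```

Note that neighbouring columns in a replicate are generally unrelated in the original alignment, which is exactly the loss of sequential structure the abstract points to.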

To relax the simplifying i.i.d. assumption, we propose a new non-parametric/semi-parametric sequential resampling technique that generalizes “Heads-or-Tails” mirrored inputs, a simple but clever technique due to Landan and Graur. The generalized procedure takes the form of random walks along either aligned or unaligned biomolecular sequences. We refer to our new method as the SERES (or “SEquential RESampling”) method.
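
To make the random-walk idea concrete, here is a minimal Python sketch of one possible sequential resampling walk over the columns of an alignment. It is not the authors' SERES implementation: the reversal probability `p_reverse`, the reflection rule at sequence ends, the fixed walk length, and the helper names are all hypothetical choices made for illustration. Because consecutive walk positions are always adjacent sites, local sequential structure is kept, unlike i.i.d. column sampling. A deterministic walk that starts at the last site and steps backward reproduces the reversed (“Tails”) input of Heads-or-Tails.

```python
import random
from typing import List


def random_walk_indices(n_sites: int, p_reverse: float, rng: random.Random) -> List[int]:
    """Hypothetical sequential-resampling walk over site indices 0..n_sites-1.

    Starts at a uniformly random site in a random direction. Before each step
    the direction flips with probability p_reverse; the walk reflects at
    either end of the sequence. Returns n_sites positions (sites may repeat).
    """
    if n_sites < 1:
        raise ValueError("need at least one site")
    pos = rng.randrange(n_sites)
    step = rng.choice((-1, 1))
    walk = [pos]
    while len(walk) < n_sites:
        if rng.random() < p_reverse:
            step = -step  # random change of direction
        if not 0 <= pos + step < n_sites:
            step = -step  # reflect at the sequence boundary
        pos += step
        walk.append(pos)
    return walk


def resample_alignment(alignment: List[str], walk: List[int]) -> List[str]:
    """Apply the same walk to every row so that columns stay intact."""
    return ["".join(row[i] for i in walk) for row in alignment]


if __name__ == "__main__":
    aln = ["ACGTA", "AC-TA"]
    walk = random_walk_indices(len(aln[0]), p_reverse=0.2, rng=random.Random(0))
    print(walk, resample_alignment(aln, walk))
    # A deterministic backward walk gives the Heads-or-Tails "Tails" input.
    tails = resample_alignment(aln, list(range(len(aln[0]) - 1, -1, -1)))
    print(tails)  # ['ATGCA', 'AT-CA']
```

Only the aligned case is sketched here; how walks are applied to unaligned sequences of different lengths is not shown.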

To demonstrate the performance of the new technique, we apply SERES to estimate support for the multiple sequence alignment problem. Using simulated and empirical data, we show that SERES-based support estimation performs comparably to, and typically better than, state-of-the-art methods.
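
One common way to turn resampled alignments into support values, similar in spirit to GUIDANCE residue-pair scores, is to report, for each pair of residues aligned together in a reference alignment, the fraction of replicate alignments that also align them. The Python sketch below implements that scoring rule under the assumption that each replicate is an alignment of the same original sequences in their original residue order (any mapping back from resampled coordinates is assumed to have been done already). Whether SERES uses exactly this score is not stated in this abstract; the names `aligned_pairs` and `pair_support` are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

Pair = Tuple[int, int, int, int]  # (seq_a, residue_a, seq_b, residue_b)


def aligned_pairs(alignment: List[str], gap: str = "-") -> Set[Pair]:
    """Collect every pair of residues placed in the same column."""
    counters = [0] * len(alignment)  # next ungapped residue index per sequence
    pairs: Set[Pair] = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != gap:
                present.append((s, counters[s]))
                counters[s] += 1
        # present is ordered by sequence index, so seq_a < seq_b in every pair.
        for (sa, ra), (sb, rb) in combinations(present, 2):
            pairs.add((sa, ra, sb, rb))
    return pairs


def pair_support(reference: List[str], replicates: List[List[str]],
                 gap: str = "-") -> Dict[Pair, float]:
    """Fraction of replicate alignments that reproduce each reference pair."""
    if not replicates:
        raise ValueError("need at least one replicate alignment")
    rep_sets = [aligned_pairs(r, gap) for r in replicates]
    return {p: sum(p in s for s in rep_sets) / len(rep_sets)
            for p in aligned_pairs(reference, gap)}


if __name__ == "__main__":
    ref = ["AC-GT", "ACAGT"]
    reps = [["AC-GT", "ACAGT"], ["ACG-T", "ACAGT"]]
    print(pair_support(ref, reps))
```

In this toy case the G of the first sequence is aligned to different residues in the two replicates, so its reference pairing receives support 0.5 while the other pairs receive 1.0.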

References

  1. Benjamini, Y., Hochberg, Y.: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Royal Stat. Soc. Ser. B (Methodol) 57(1), 289–300 (1995)

  2. Cannone, J.J., et al.: The Comparative RNA Web (CRW) site: an online database of comparative sequence and structure information for Ribosomal, Intron and Other RNAs. BMC Bioinform. 3(15) (2002). http://www.rna.ccbb.utexas.edu

  3. Daskalakis, C., Roch, S.: Alignment-free phylogenetic reconstruction. In: Berger, B. (ed.) RECOMB 2010. LNCS, vol. 6044, pp. 123–137. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-12683-3_9

  4. DeLong, E.R., DeLong, D.M., Clarke-Pearson, D.L.: Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 44(3), 837–845 (1988)

  5. Efron, B.: Bootstrap methods: another look at the jackknife. Ann. Stat. 7(1), 1–26 (1979)

  6. Felsenstein, J.: Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39(4), 783–791 (1985)

  7. Fletcher, W., Yang, Z.: INDELible: a flexible simulator of biological sequence evolution. Mol. Biol. Evol. 26(8), 1879–1888 (2009)

  8. Katoh, K., Standley, D.M.: MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30(4), 772–780 (2013)

  9. Kim, J., Ma, J.: PSAR: measuring multiple sequence alignment reliability by probabilistic sampling. Nucleic Acids Res. 39(15), 6359–6368 (2011)

  10. Landan, G., Graur, D.: Heads or tails: a simple reliability check for multiple sequence alignments. Mol. Biol. Evol. 24(6), 1380–1383 (2007)

  11. Landan, G., Graur, D.: Local reliability measures from sets of co-optimal multiple sequence alignments. In: Biocomputing, pp. 15–24. World Scientific (2008)

  12. Liu, K., et al.: SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst. Biol. 61(1), 90–106 (2012)

  13. Notredame, C., Higgins, D.G., Heringa, J.: T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000)

  14. Pedregosa, F., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

  15. Penn, O., Privman, E., Landan, G., Graur, D., Pupko, T.: An alignment confidence score capturing robustness to guide tree uncertainty. Mol. Biol. Evol. 27(8), 1759–1767 (2010)

  16. Rodriguez, F., Oliver, J.L., Marin, A., Medina, J.R.: The general stochastic model of nucleotide substitution. J. Theor. Biol. 142, 485–501 (1990)

  17. Sela, I., Ashkenazy, H., Katoh, K., Pupko, T.: GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 43(W1), W7–W14 (2015)

  18. Yang, Z., Rannala, B.: Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method. Mol. Biol. Evol. 14(7), 717–724 (1997)

Acknowledgments

This work has been supported in part by the National Science Foundation (grant nos. CCF-1565719, CCF-1714417, and DEB-1737898 to KJL) and MSU faculty startup funds (to KJL). Computational experiments were performed using the High Performance Computing Center (HPCC) at MSU.

Author information

Correspondence to Kevin J. Liu.

Electronic supplementary material

Supplementary material 1 (PDF, 210 KB)

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, W., Smith, J., Hejase, H.A., Liu, K.J. (2018). Non-parametric and Semi-parametric Support Estimation Using SEquential RESampling Random Walks on Biomolecular Sequences. In: Blanchette, M., Ouangraoua, A. (eds) Comparative Genomics. RECOMB-CG 2018. Lecture Notes in Computer Science, vol 11183. Springer, Cham. https://doi.org/10.1007/978-3-030-00834-5_17

  • DOI: https://doi.org/10.1007/978-3-030-00834-5_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00833-8

  • Online ISBN: 978-3-030-00834-5

  • eBook Packages: Computer Science, Computer Science (R0)
