Core Column Prediction for Alignments

Parameter Advising for Multiple Sequence Alignment

Part of the book series: Computational Biology (COBO, volume 26)

Abstract

In a computed multiple sequence alignment, the coreness of a column is the fraction of its substitutions that are in so-called core columns of the unknown reference alignment of the sequences, where the core columns of the reference alignment are those that are reliably correct. Because the reference alignment is not known in practice, the coreness of a column can only be estimated. This chapter describes the first method for estimating column coreness for protein multiple sequence alignments.
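The definition of coreness can be made concrete with a small sketch. The code below is not the chapter's estimation method: it computes the true coreness of each column for a toy case where the reference alignment and its core columns are assumed to be known. The function names, the gapped-string representation, and the choice to score a column with fewer than two residues as 0.0 are all illustrative assumptions.

```python
from itertools import combinations


def column_residues(alignment):
    """For each column, list the (sequence index, residue index) pairs it holds.

    `alignment` is a list of equal-length strings that use '-' for gaps
    (an assumed representation, chosen for illustration).
    """
    counters = [0] * len(alignment)  # next residue index for each sequence
    columns = []
    for col in range(len(alignment[0])):
        residues = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((s, counters[s]))
                counters[s] += 1
        columns.append(residues)
    return columns


def column_coreness(computed, reference, core_columns):
    """Fraction of each computed column's substitutions found in reference core columns.

    A substitution is a pair of residues from different sequences aligned in
    the same column. Columns with no substitutions get 0.0 (an assumption).
    """
    ref_cols = column_residues(reference)
    # Map every residue in a core column to the index of that core column.
    core_of = {res: c for c in core_columns for res in ref_cols[c]}

    coreness = []
    for residues in column_residues(computed):
        pairs = list(combinations(residues, 2))
        if not pairs:
            coreness.append(0.0)
            continue
        # A pair counts only if both residues share one reference core column.
        hits = sum(
            1 for a, b in pairs
            if a in core_of and core_of[a] == core_of.get(b)
        )
        coreness.append(hits / len(pairs))
    return coreness


# Toy data: three sequences, with reference columns 0 and 3 taken as core.
reference = ["AC-G", "ACTG", "A-TG"]
computed = ["ACG-", "ACTG", "AT-G"]
result = column_coreness(computed, reference, core_columns={0, 3})
print(result)  # [1.0, 0.0, 0.0, 1.0]
```

In this toy case the first and last computed columns reproduce only core substitutions, while the middle two contain none. An estimator of the kind the chapter describes has to approximate these values without access to `reference`.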

Notes

  1. Adapted from publications [27, 29].

References

  1. Beygelzimer, A., Kakade, S., Langford, J.: Cover trees for nearest neighbor. In: Proceedings of the 23rd International Conference on Machine Learning (ICML), pp. 97–104 (2006)

  2. Capella-Gutierrez, S., Silla-Martinez, J.M., Gabaldón, T.: trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics 25(15), 1972–1973 (2009)

  3. Castresana, J.: Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17(4), 540–552 (2000)

  4. Chang, J.M., Tommaso, P.D., Notredame, C.: TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction. Mol. Biol. Evol. 31(6), 1625–1637 (2014)

  5. DeBlasio, D.F., Kececioglu, J.D.: Facet: software for accuracy estimation of protein multiple sequence alignments (version 1.1) (2014). http://facet.cs.arizona.edu

  6. DeBlasio, D., Kececioglu, J.D.: Predicting core columns of protein multiple sequence alignments for improved parameter advising. In: Proceedings of the 16th Workshop on Algorithms in Bioinformatics (WABI), pp. 77–89 (2016)

  7. DeBlasio, D., Kececioglu, J.D.: Core column prediction for protein multiple sequence alignments. Algorithms Mol. Biol. 12, 1–16 (2017)

  8. Dress, A.W., Flamm, C., Fritzsch, G., Grünewald, S., Kruspe, M., Prohaska, S.J., Stadler, P.F.: Noisy: identification of problematic columns in multiple sequence alignments. Algorithms Mol. Biol. 3(7), 1–10 (2008)

  9. Durbin, R., Eddy, S.R., Krogh, A., Mitchison, G.: Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge (1998)

  10. Henikoff, S., Henikoff, J.G.: Amino acid substitution matrices from protein blocks. Proc. Natl. Acad. Sci. U. S. A. 89(22), 10915–10919 (1992)

  11. Jones, D.T.: Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292(2), 195–202 (1999)

  12. Jones, E., Oliphant, T., Peterson, P., et al.: SciPy: open source scientific tools for Python (2001). http://www.scipy.org/

  13. Kück, P., Meusemann, K., Dambach, J., Thormann, B., von Reumont, B.M., Wägele, J.W., Misof, B.: Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees. Front. Zool. 7(10), 1–12 (2010)

  14. Moré, J.J., Sorensen, D.C., Hillstrom, K.E., Garbow, B.S.: The MINPACK project. In: Sources and Development of Mathematical Software, pp. 88–111. Prentice-Hall, Englewood Cliffs (1984)

  15. Sela, I., Ashkenazy, H., Katoh, K., Pupko, T.: GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters. Nucleic Acids Res. 43(W1), W7–W14 (2015)

  16. Weinberger, K.Q., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. J. Mach. Learn. Res. 10, 207–244 (2009)

  17. Wheeler, T.J., Kececioglu, J.D.: Multiple alignment by aligning alignments. In: Proceedings of the 15th ISCB Conference on Intelligent Systems for Molecular Biology (ISMB), Bioinformatics, vol. 23(13), pp. i559–i568 (2007)

  18. Wheeler, T.J., Kececioglu, J.D.: Opal: software for aligning multiple biological sequences (version 2.1.0) (2012). http://opal.cs.arizona.edu

  19. Woerner, A., Kececioglu, J.: Faster metric-space nearest-neighbor search using dispersion trees (2017). In preparation

  20. Wu, M., Chatterji, S., Eisen, J.A.: Accounting for alignment uncertainty in phylogenomics. PLoS One 7(1), 1–10 (2012)

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

DeBlasio, D., Kececioglu, J. (2017). Core Column Prediction for Alignments. In: Parameter Advising for Multiple Sequence Alignment. Computational Biology, vol 26. Springer, Cham. https://doi.org/10.1007/978-3-319-64918-4_9

  • DOI: https://doi.org/10.1007/978-3-319-64918-4_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64917-7

  • Online ISBN: 978-3-319-64918-4

  • eBook Packages: Computer Science, Computer Science (R0)
