Abstract
In a computed multiple sequence alignment, the coreness of a column is the fraction of its substitutions that also appear in core columns of the reference alignment of the sequences, where the core columns are those columns of the reference alignment that are reliably correct. Because the reference alignment is unknown in practice, the coreness of a column can only be estimated. This chapter describes the first method for estimating column coreness for protein multiple sequence alignments.
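To make the definition concrete, the following minimal sketch computes the true coreness of a column when a reference alignment and its core columns happen to be known. It illustrates the quantity being estimated, not the estimation method of the chapter. The function names, the representation of an alignment as a list of equal-length gapped strings, the reading of a "substitution" as a pair of residues from two different sequences aligned in the same column, and the convention that a column with no such pairs has coreness 0 are all assumptions made for this example.

```python
from itertools import combinations


def column_pairs(alignment, col):
    """Return the set of aligned residue pairs in one column.

    Each residue is identified by (row index, position in the ungapped
    sequence), so pairs can be compared across different alignments of
    the same sequences.
    """
    residues = []
    for row, seq in enumerate(alignment):
        if seq[col] != "-":
            # Position of this residue within the ungapped sequence.
            pos = col - seq[:col].count("-")
            residues.append((row, pos))
    # Rows are visited in increasing order, so each pair is stored with
    # the lower row index first in every alignment.
    return set(combinations(residues, 2))


def coreness(computed, col, reference, core_cols):
    """Fraction of the pairs in a computed column found in reference core columns."""
    pairs = column_pairs(computed, col)
    if not pairs:
        return 0.0  # assumed convention: no aligned pairs, coreness 0
    core_pairs = set()
    for c in core_cols:
        core_pairs |= column_pairs(reference, c)
    return len(pairs & core_pairs) / len(pairs)


# Hypothetical toy data: three sequences ACG, ACTG, ATG.
reference = ["AC-G", "ACTG", "A-TG"]
core_cols = {0, 3}  # assumed core columns of the reference
computed = ["AC-G-", "ACTG-", "A--TG"]

for col in range(len(computed[0])):
    print(col, coreness(computed, col, reference, core_cols))
```

In this toy example, column 3 of the computed alignment places the final G of the first two sequences together, as the reference does, but misaligns the T of the third sequence against them, so one of its three pairs lies in a core column.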
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
DeBlasio, D., Kececioglu, J. (2017). Core Column Prediction for Alignments. In: Parameter Advising for Multiple Sequence Alignment. Computational Biology, vol 26. Springer, Cham. https://doi.org/10.1007/978-3-319-64918-4_9
DOI: https://doi.org/10.1007/978-3-319-64918-4_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64917-7
Online ISBN: 978-3-319-64918-4
eBook Packages: Computer Science (R0)