Abstract
Phylogenetics, the study of evolutionary relationships among groups of organisms, plays an important role in modern biological research, with applications that include comparing genomes, detecting orthology and paralogy, estimating divergence times, reconstructing ancient proteins, identifying mutations likely to be associated with disease, determining the identity of new pathogens, and finding residues that are important to natural selection. Given an alignment of protein-coding DNA sequences, most methods for detecting natural selection rely on estimating codon-specific nonsynonymous/synonymous rate ratios (dN/dS). Here, we describe an approach to modeling variation in dN/dS using a conditional autoregressive (CAR) model. The CAR model relaxes an assumption made by most contemporary phylogenetic models, namely that sites in molecular sequences evolve independently. By incorporating the structural information stored in a Protein Data Bank (PDB) file, the CAR model estimates dN/dS in a way that accounts for the three-dimensional structure of the protein. We implement the model in a fully Bayesian framework, treating all model parameters as random variables, and use NVIDIA's parallel computing architecture (CUDA) to accelerate the computation. Our analysis of an empirical abalone sperm lysin data set agrees with previous findings.
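As a rough illustration of how three-dimensional structure can induce dependence among sites, the sketch below builds a binary proximity matrix from residue coordinates and evaluates the conditional mean and variance of one site effect under the standard proper CAR form. It is a minimal sketch under stated assumptions, not the chapter's implementation: the coordinates, the 5 Å cutoff, the parameter names `rho` and `tau2`, and the interpretation of `phi` as a transformed dN/dS are all hypothetical.

```python
import math

def proximity_matrix(coords, cutoff):
    """Binary neighbor matrix: w[i][j] = 1 if sites i and j are within `cutoff`."""
    n = len(coords)
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                w[i][j] = w[j][i] = 1
    return w

def car_conditional(i, phi, w, rho, tau2):
    """Mean and variance of phi[i] given all other sites under a proper CAR prior.

    Standard form: phi_i | rest ~ N(rho * sum_j w_ij * phi_j / w_i+, tau2 / w_i+),
    where w_i+ is the number of neighbors of site i.
    """
    n_neighbors = sum(w[i])
    if n_neighbors == 0:
        raise ValueError("site has no neighbors; conditional undefined in this sketch")
    mean = rho * sum(w[i][j] * phi[j] for j in range(len(phi))) / n_neighbors
    return mean, tau2 / n_neighbors

# Hypothetical C-alpha coordinates (angstroms) for four codon sites.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
w = proximity_matrix(coords, cutoff=5.0)  # cutoff is an assumed value

# Hypothetical site effects, e.g. a transformed dN/dS per codon (assumption).
phi = [0.2, -0.1, 0.4, 0.9]
mean, var = car_conditional(1, phi, w, rho=0.9, tau2=1.0)
```

In this toy example site 1 has three neighbors, so its conditional mean is pulled toward the average of their effects and its conditional variance shrinks with the neighbor count. In practice the coordinates would be read from a PDB file and the conditional would be one component of a Bayesian sampler.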
Keywords
- Protein Data Bank
- Deviance Information Criterion
- Dirichlet Process
- Proximity Matrix
- Conditional Autoregressive
© 2013 Springer Science+Business Media New York
About this paper
Cite this paper
Fan, Y., Wu, R., Chen, MH., Kuo, L., Lewis, P.O. (2013). A Conditional Autoregressive Model for Detecting Natural Selection in Protein-Coding DNA Sequences. In: Hu, M., Liu, Y., Lin, J. (eds) Topics in Applied Statistics. Springer Proceedings in Mathematics & Statistics, vol 55. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7846-1_17
DOI: https://doi.org/10.1007/978-1-4614-7846-1_17
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7845-4
Online ISBN: 978-1-4614-7846-1
eBook Packages: Mathematics and Statistics (R0)