The Relation between Indel Length and Functional Divergence: A Formal Study
Although insertions and deletions (indels) are a common type of evolutionary sequence variation, their origins and functional consequences are not yet comprehensively understood. There is evidence that, on the one hand, classical alignment procedures only roughly reflect the underlying evolutionary processes and, on the other hand, that indels cause structural changes in protein surfaces.
We first demonstrate how to identify alignment gaps that have been introduced by evolution to a statistically significant degree, using a novel, sound statistical framework based on pair hidden Markov models (HMMs). Second, we examine paralogous protein pairs in E. coli, obtained by computing classical global alignments. Distinguishing between indel and non-indel pairs according to our novel statistics reveals that, at the same sequence identity, indel pairs are significantly less functionally similar than non-indel pairs, as measured by recently suggested GO-based functional distances. This suggests that indels cause more severe functional changes than other types of sequence variation and that indel statistics should additionally be taken into account when assessing functional similarity between paralogous protein pairs.
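As a rough illustration of the quantities involved, the following sketch reads gap-run lengths and sequence identity from a gapped pairwise alignment. It is not the pair HMM statistic described above: the function names `gap_runs` and `sequence_identity` are hypothetical, and computing identity over gap-free columns only is an assumption made for this example.

```python
from itertools import groupby


def gap_runs(aligned_a, aligned_b):
    """Return the lengths of maximal gap runs in a pairwise alignment.

    Both inputs are rows of the same alignment, with '-' marking a gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Label each column by which row holds the gap (None if neither does).
    labels = [
        "a" if x == "-" else "b" if y == "-" else None
        for x, y in zip(aligned_a, aligned_b)
    ]
    # Consecutive columns with the same label form one gap run.
    return [len(list(g)) for label, g in groupby(labels) if label is not None]


def sequence_identity(aligned_a, aligned_b):
    """Fraction of identical residues over gap-free columns (an assumption)."""
    columns = [
        (x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    return sum(1 for x, y in columns if x == y) / len(columns)
```

For a toy alignment such as `MKT-AYIAK` against `MRTLAY--K`, `gap_runs` reports one gap of length 1 and one of length 2. Deciding which of these gaps are statistically significant is the part that requires the pair HMM framework and is not attempted here.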
Keywords: Alignment statistics, Deletions, Insertions, GO, Pair Hidden Markov Models