Identification of Protein Homologs and Domain Boundaries by Iterative Sequence Alignment

  • Dustin Schaeffer
  • Nick V. Grishin
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1851)

Abstract

Evolutionary domains are protein regions with observable sequence similarity to other known domains. Here we describe how to use common sequence and profile alignment algorithms (i.e., BLAST, HHsearch) to delineate putative domains in novel protein sequences, given a reference library of protein domains. We use our Evolutionary Classification of protein Domains (ECOD) database as the reference, but other domain sequence libraries could be used instead (e.g., SCOP, CATH). We describe our domain partition algorithm, along with specific notes on avoiding domain indexing errors when combining multiple data sources and software tools whose outputs differ.
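The general idea of partitioning a query from alignment hits against a domain library, and of keeping residue indices consistent between tools, can be illustrated with a small sketch. This is not the authors' ECOD partition algorithm. It is a generic greedy scheme under stated assumptions: hits come from BLAST+ tabular output (`-outfmt 6`, default 12 columns, 1-based inclusive query coordinates), hits are accepted best E-value first, and a hit is kept only if it covers enough still-unassigned residues. The names `Hit`, `parse_blast_tabular`, `partition`, `to_one_based` and the thresholds `min_new_residues=25` and `max_evalue=1e-3` are hypothetical choices for illustration. HHsearch writes a different output format, which this sketch does not parse.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    query_id: str
    domain_id: str  # identifier of the reference domain (hypothetical, e.g. an ECOD domain ID)
    qstart: int     # 0-based, inclusive
    qend: int       # 0-based, exclusive (half-open interval)
    evalue: float


def parse_blast_tabular(lines):
    """Parse BLAST+ -outfmt 6 rows and convert query coordinates once, at input.

    Default columns: qseqid sseqid pident length mismatch gapopen
                     qstart qend sstart send evalue bitscore
    BLAST reports 1-based inclusive positions; they are stored here as
    0-based half-open so that Python slicing and lengths (qend - qstart) agree.
    """
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        f = line.rstrip("\n").split("\t")
        qstart, qend = int(f[6]), int(f[7])
        hits.append(Hit(f[0], f[1], qstart - 1, qend, float(f[10])))
    return hits


def to_one_based(start, end):
    """Convert a 0-based half-open interval back to 1-based inclusive for reporting."""
    return start + 1, end


def contiguous_segments(positions):
    """Group sorted residue positions into 0-based half-open (start, end) segments."""
    segments = []
    for p in positions:
        if segments and p == segments[-1][1]:
            segments[-1][1] = p + 1  # extend the current segment
        else:
            segments.append([p, p + 1])  # start a new segment
    return [tuple(s) for s in segments]


def partition(hits, query_length, min_new_residues=25, max_evalue=1e-3):
    """Greedy domain partition (hypothetical thresholds).

    Hits are visited from best (lowest) to worst E-value. A hit claims only
    residues not yet assigned, and is accepted only if it claims at least
    min_new_residues of them. Returns the accepted domains with their
    (possibly discontinuous) segments, plus a per-residue assignment list.
    """
    assigned = [None] * query_length
    domains = []
    for hit in sorted(hits, key=lambda h: h.evalue):
        if hit.evalue > max_evalue:
            break  # remaining hits are even less significant
        new = [i for i in range(hit.qstart, hit.qend) if assigned[i] is None]
        if len(new) < min_new_residues:
            continue  # mostly overlaps domains already assigned
        for i in new:
            assigned[i] = hit.domain_id
        domains.append((hit.domain_id, contiguous_segments(new)))
    return domains, assigned
```

Converting coordinates exactly once, when each tool's output is read, and converting back only for display is one simple way to avoid mixing 0-based and 1-based positions across programs. Real pipelines also have to map between sequence positions and the author residue numbering used in structure files, which this sketch ignores.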

Key words

Protein domains · Homologs · Sequence alignment

Acknowledgments

This work was supported in part by the National Institutes of Health (GM094575 to NVG) and the Welch Foundation (I-1505 to NVG).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Dustin Schaeffer (1)
  • Nick V. Grishin (1, 2)
  1. Department of Biophysics, University of Texas Southwestern Medical Center, Dallas, USA
  2. Howard Hughes Medical Institute, University of Texas Southwestern Medical Center, Dallas, USA