Hidden Markov Models for Protein Domain Homology Identification and Analysis

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 1555)

Abstract

Protein domain identification and analysis are cornerstones of modern proteomics. Researchers studying protein domains have a variety of tools and approaches for understanding large protein domain families. Hidden Markov Models (HMMs) form the basis for identifying and categorizing evolutionarily linked protein domains. Here I describe the use of HMMs for predicting and identifying Src Homology 2 (SH2) domains within the proteome.

Acknowledgments

This chapter draws on work supported by the University of Chicago Cancer Research Foundation Women’s Board and Piers Nash’s laboratory at the University of Chicago Ben May Department for Cancer Research.

Author information

Corresponding author

Correspondence to Karl Jablonowski.

Copyright information

© 2017 Springer Science+Business Media LLC

About this protocol

Cite this protocol

Jablonowski, K. (2017). Hidden Markov Models for Protein Domain Homology Identification and Analysis. In: Machida, K., Liu, B. (eds) SH2 Domains. Methods in Molecular Biology, vol 1555. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-6762-9_3

  • DOI: https://doi.org/10.1007/978-1-4939-6762-9_3

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-6760-5

  • Online ISBN: 978-1-4939-6762-9

  • eBook Packages: Springer Protocols
