Systematic Exploration of an Efficient Amino Acid Substitution Matrix: MIQS

Part of the book series: Methods in Molecular Biology (MIMB, volume 1415)

Abstract

Comparing amino acid sequences to find similarities between proteins is a fundamental form of sequence analysis for inferring protein structure and function. In this study, we improved amino acid substitution matrices for identifying distantly related proteins. We systematically sampled and benchmarked substitution matrices generated from a principal component analysis (PCA) subspace built from a set of typical existing matrices. Based on the benchmark results, we used kernel density estimation (KDE) to identify a region of highly sensitive matrices in the PCA subspace. From this region we deduced a novel sensitive matrix, called MIQS, which detects distantly related proteins better than existing matrices do. This approach to deriving an efficient amino acid substitution matrix might influence many fields of protein sequence analysis. MIQS is available at http://csas.cbrc.jp/Ssearch/.
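
The workflow summarized in the abstract (PCA over existing matrices, sampling in the subspace, benchmarking, then KDE over the best samples) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the input matrices are random placeholders rather than real substitution matrices, `toy_benchmark` is a hypothetical stand-in for a real homology-search benchmark, and the number of components, grid size, top fraction, and bandwidth are arbitrary assumptions.

```python
import numpy as np

N_AA = 20
IU = np.triu_indices(N_AA)  # 210 independent entries of a symmetric 20x20 matrix


def flatten(matrix):
    """Keep only the upper triangle (with diagonal) of a symmetric matrix."""
    return matrix[IU]


def unflatten(vec):
    """Rebuild the full symmetric matrix from its upper-triangle vector."""
    m = np.zeros((N_AA, N_AA))
    m[IU] = vec
    return m + np.triu(m, 1).T


def fit_pca(vectors, n_components):
    """PCA via SVD of the centered data; returns the mean and principal axes."""
    mean = vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:n_components]


def gaussian_kde_at(points, samples, bandwidth):
    """Unnormalized Gaussian kernel density of `samples`, evaluated at `points`."""
    sq_dist = ((points[:, None, :] - samples[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2).mean(axis=1)


rng = np.random.default_rng(0)

# ASSUMPTION: random symmetric matrices stand in for the existing matrices.
existing = []
for _ in range(8):
    a = rng.normal(size=(N_AA, N_AA))
    existing.append((a + a.T) / 2)
vectors = np.array([flatten(m) for m in existing])

# 1. PCA subspace spanned by the existing matrices (2 components, arbitrary).
mean, axes = fit_pca(vectors, n_components=2)
coords = (vectors - mean) @ axes.T

# 2. Sample candidate points on a regular grid covering the existing matrices.
lo, hi = coords.min(axis=0), coords.max(axis=0)
axes_1d = [np.linspace(l, h, 15) for l, h in zip(lo, hi)]
grid = np.stack(np.meshgrid(*axes_1d, indexing="ij"), axis=-1).reshape(-1, 2)

# 3. Benchmark each candidate. HYPOTHETICAL score: a real benchmark would run
#    a similarity search with unflatten(mean + point @ axes) and score the hits.
target = lo + 0.3 * (hi - lo)


def toy_benchmark(point):
    return -np.sum((point - target) ** 2)


scores = np.array([toy_benchmark(p) for p in grid])

# 4. KDE over the best-scoring candidates; take the densest grid point.
n_top = len(grid) // 10
top = grid[np.argsort(scores)[-n_top:]]
bandwidth = 0.15 * np.linalg.norm(hi - lo)
density = gaussian_kde_at(grid, top, bandwidth)
best_idx = int(np.argmax(density))
best = grid[best_idx]

# 5. Map the chosen point back to a full 20x20 symmetric matrix.
candidate = unflatten(mean + best @ axes)
```

Estimating a density over the top-scoring samples, rather than simply taking the single best sample, favors a point surrounded by other well-performing matrices, which should be less sensitive to noise in any one benchmark run.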

Abbreviations

AUC: Area under the ROC curve
BLOSUM: Block substitution matrix
KDE: Kernel density estimation
MIQS: Matrix to improve quality in similarity search
PCA: Principal component analysis
ROC: Receiver operating characteristic
VTML: Variable time maximum likelihood

Acknowledgments

This work was partially supported by Platform Project for Supporting in Drug Discovery and Life Science Research (Platform for Drug Discovery, Informatics, and Structural Life Science) from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) and Japan Agency for Medical Research and Development (AMED). We thank Drs. Somlata Gupta, Kumiko Nakada-Tsukui, and Tomoyoshi Nozaki of NIID for discussions related to IMD/I-BAR domains in E. histolytica. We thank Toshiyuki Oda for conducting the HHblits search.

Author information

Corresponding author

Correspondence to Kentaro Tomii.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Tomii, K., Yamada, K. (2016). Systematic Exploration of an Efficient Amino Acid Substitution Matrix: MIQS. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_11

  • DOI: https://doi.org/10.1007/978-1-4939-3572-7_11

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3570-3

  • Online ISBN: 978-1-4939-3572-7

  • eBook Packages: Springer Protocols
