Systematic Exploration of an Efficient Amino Acid Substitution Matrix: MIQS

Tomii, Kentaro; Yamada, Kazunori

doi:10.1007/978-1-4939-3572-7_11

Protocol
First Online: 27 April 2016

Part of the book series: Methods in Molecular Biology (MIMB, volume 1415)

Abstract

Comparing amino acid sequences to find similarities between proteins is a fundamental sequence analysis for inferring protein structure and function. In this study, we improve amino acid substitution matrices for identifying distantly related proteins. We systematically sampled and benchmarked substitution matrices generated from a principal component analysis (PCA) subspace built from a set of typical existing matrices. Based on the benchmark results, we used kernel density estimation (KDE) to identify a region of highly sensitive matrices in the PCA subspace. Using the PCA subspace, we were able to deduce a novel sensitive matrix, called MIQS, which detects distantly related proteins better than existing matrices do. This approach to deriving an efficient amino acid substitution matrix might influence many fields of protein sequence analysis. MIQS is available at http://csas.cbrc.jp/Ssearch/.
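The pipeline summarized in the abstract (PCA over existing matrices, sampling in the subspace, benchmarking, then KDE) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the input matrices are synthetic random stand-ins, `toy_benchmark` is a hypothetical placeholder for a real homology-search benchmark, and the two-component subspace, the top-10% cutoff, and the bandwidth are arbitrary choices made only for the example.

```python
import numpy as np


def flatten_symmetric(matrix):
    """Return the upper-triangular entries (diagonal included) as a vector."""
    rows, cols = np.triu_indices(matrix.shape[0])
    return matrix[rows, cols]


def unflatten_symmetric(vector, size):
    """Rebuild a symmetric size x size matrix from its upper-triangular vector."""
    matrix = np.zeros((size, size))
    rows, cols = np.triu_indices(size)
    matrix[rows, cols] = vector
    matrix[cols, rows] = vector
    return matrix


def fit_pca(vectors, n_components):
    """PCA via SVD of the centered data; returns (mean, components)."""
    mean = vectors.mean(axis=0)
    _, _, vt = np.linalg.svd(vectors - mean, full_matrices=False)
    return mean, vt[:n_components]


def gaussian_kde(points, queries, bandwidth):
    """Isotropic Gaussian KDE in d dimensions, evaluated at each query point."""
    dim = points.shape[1]
    sq_dist = ((queries[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    norm = (2.0 * np.pi * bandwidth ** 2) ** (dim / 2.0)
    return np.exp(-0.5 * sq_dist / bandwidth ** 2).sum(axis=1) / (len(points) * norm)


def toy_benchmark(coord):
    """HYPOTHETICAL score peaking at (1.0, -0.5); stands in for a real benchmark."""
    return float(np.exp(-np.sum((coord - np.array([1.0, -0.5])) ** 2)))


def derive_candidate(seed=0, size=20, n_matrices=12, n_samples=200):
    rng = np.random.default_rng(seed)
    # Synthetic symmetric matrices standing in for existing ones (not real data).
    base = rng.normal(size=(size, size))
    matrices = []
    for _ in range(n_matrices):
        noisy = base + 0.3 * rng.normal(size=(size, size))
        matrices.append((noisy + noisy.T) / 2.0)
    vectors = np.array([flatten_symmetric(m) for m in matrices])
    mean, components = fit_pca(vectors, n_components=2)
    # Sample points in the 2-D PCA subspace and score each one. A real run
    # would map every point to a matrix and benchmark that matrix instead.
    coords = rng.uniform(-3.0, 3.0, size=(n_samples, 2))
    scores = np.array([toy_benchmark(c) for c in coords])
    # Keep the top 10% of samples and find where they are densest.
    top = coords[scores >= np.quantile(scores, 0.9)]
    density = gaussian_kde(top, top, bandwidth=0.5)
    best = top[np.argmax(density)]
    # Map the chosen subspace point back to a full symmetric matrix.
    candidate = unflatten_symmetric(mean + best @ components, size)
    return candidate, components, best


candidate, components, best = derive_candidate()
print(candidate.shape, best)
```

Working on the upper-triangular entries keeps every reconstructed candidate symmetric by construction, which is why the sketch flattens the matrices before PCA rather than using all 400 entries.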

Abbreviations

AUC :: Area under the ROC curve
BLOSUM :: Block substitution matrix
KDE :: Kernel density estimation
MIQS :: Matrix to improve quality in similarity search
PCA :: Principal component analysis
ROC :: Receiver operating characteristic
VTML :: Variable time maximum likelihood

Acknowledgments

This work was partially supported by Platform Project for Supporting in Drug Discovery and Life Science Research (Platform for Drug Discovery, Informatics, and Structural Life Science) from the Ministry of Education, Culture, Sports, Science, and Technology (MEXT) and Japan Agency for Medical Research and Development (AMED). We thank Drs. Somlata Gupta, Kumiko Nakada-Tsukui, and Tomoyoshi Nozaki of NIID for discussions related to IMD/I-BAR domains in E. histolytica. We thank Toshiyuki Oda for conducting the HHblits search.

Author information

Authors and Affiliations

Biotechnology Research Institute for Drug Discovery, National Institute of Advanced Industrial Science and Technology (AIST), 2-4-7 Aomi, Koto-Ku, Tokyo, 135-0064, Japan
Kentaro Tomii & Kazunori Yamada
Graduate School of Information Sciences, Tohoku University, 6-3-9 Aramaki-Aza-Aoba, Aoba-ku, Sendai, 980-8579, Japan
Kazunori Yamada

Corresponding author

Correspondence to Kentaro Tomii.

Editor information

Editors and Affiliations

Max F. Perutz Laboratories GmbH, Universität Wien, Wien, Austria
Oliviero Carugo
Agency for Science, Technology and Research (A*STAR), Singapore, Singapore
Frank Eisenhaber

About this protocol

Cite this protocol

Tomii, K., Yamada, K. (2016). Systematic Exploration of an Efficient Amino Acid Substitution Matrix: MIQS. In: Carugo, O., Eisenhaber, F. (eds) Data Mining Techniques for the Life Sciences. Methods in Molecular Biology, vol 1415. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3572-7_11

DOI: https://doi.org/10.1007/978-1-4939-3572-7_11
Published: 27 April 2016
Publisher Name: Humana Press, New York, NY
Print ISBN: 978-1-4939-3570-3
Online ISBN: 978-1-4939-3572-7
eBook Packages: Springer Protocols
