The Role of Independent Test Set in Modeling of Protein Folding Kinetics

Štambuk, Nikola; Konjevoda, Paško

doi:10.1007/978-1-4419-7046-6_28

Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 696)

Abstract

Testing a bioinformatics algorithm on its training set is a poor indicator of its future performance, because the results are misleadingly optimistic. The optimal method of testing is to calculate the error rate on an independent dataset (test set). We tested the validity of the FOLD-RATE method for predicting protein folding rate constants [ln(k_f)] using sequences, structural class information, and experimentally verified folding rate constants from the Protein Folding Database (PFD). PFD is a publicly accessible repository of thermodynamic and kinetic data of interest to researchers from different fields, standardized by the International Foldeomics Consortium. Our results show that when the standardized PFD dataset is used to test a protein folding rate prediction method, the estimated validity of the method may differ significantly.
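
The contrast between training-set and test-set error can be illustrated with a minimal sketch. It uses synthetic data and a deliberately memorizing 1-nearest-neighbour predictor. It is not the FOLD-RATE method and uses no PFD data. All names (`make_data`, `predict_1nn`, `mean_abs_error`) and the linear-plus-noise data model are assumptions made purely for illustration.

```python
import random


def make_data(n, rng):
    """Synthetic (x, y) pairs with y = 2x + Gaussian noise (not real folding data)."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 2.0 * x + rng.gauss(0.0, 1.0)
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Return the y value of the training point whose x is closest to the query."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def mean_abs_error(train, data):
    """Mean absolute prediction error of the 1-NN predictor on `data`."""
    return sum(abs(predict_1nn(train, x) - y) for x, y in data) / len(data)


rng = random.Random(0)           # fixed seed so the run is repeatable
train_set = make_data(50, rng)   # data the predictor is built from
test_set = make_data(50, rng)    # independent data drawn from the same model

# Error measured on the data the predictor has already seen.
train_error = mean_abs_error(train_set, train_set)
# Error measured on data the predictor has never seen.
test_error = mean_abs_error(train_set, test_set)

print(f"training-set error: {train_error:.3f}")
print(f"test-set error:     {test_error:.3f}")
```

Because the predictor simply looks up the nearest stored point, every training example is its own nearest neighbour and the training-set error is zero, while the independent test set exposes the noise the predictor memorized. A real evaluation would replace the synthetic pairs with sequence-derived predictions and experimentally measured ln(k_f) values.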

Acknowledgements

The support of the Croatian Ministry of Science, Education and Sports is gratefully acknowledged (grant no. 098–0982929–2524).

Author information

Authors and Affiliations

NMR Center, Ruđer Bošković Institute, Bijenička cesta 54, HR-10002, Zagreb, Croatia
Nikola Štambuk

Corresponding author

Correspondence to Nikola Štambuk.

Editor information

Editors and Affiliations

Dept. Computer Science, University of Georgia, Athens, 30602-7404, Georgia, USA
Hamid R. Arabnia
Department of Computer Science, Lamar University, Beaumont, 77710, Texas, USA
Quoc-Nam Tran

About this chapter

Cite this chapter

Štambuk, N., Konjevoda, P. (2011). The Role of Independent Test Set in Modeling of Protein Folding Kinetics. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_28

DOI: https://doi.org/10.1007/978-1-4419-7046-6_28
Published: 15 March 2011
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences (R0)
