Abstract
Testing a bioinformatics algorithm on its training set is a poor indicator of future performance because the results are misleadingly optimistic. The preferred approach is to calculate the error rate on an independent dataset (test set). We tested the validity of the FOLD-RATE method for predicting protein folding rate constants [ln(k_f)] using sequences, structural class information, and experimentally verified folding rate constants from the Protein Folding Database (PFD). PFD is a publicly accessible repository of thermodynamic and kinetic data, standardized by the International Foldeomics Consortium and of interest to researchers from different fields. Our results show that when the standardized PFD dataset is used to test a protein folding rate prediction method, the estimate of the method's validity may differ significantly from the one obtained on the training set.
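The gap between training-set and test-set error can be illustrated with a minimal sketch. It is not the FOLD-RATE method and uses no PFD data: the data are synthetic (y = 2x plus noise), the model is a deliberately overfitting 1-nearest-neighbour predictor, and all function names are hypothetical, chosen only for this illustration.

```python
import random

def mean_abs_error(predict, data):
    """Mean absolute error of `predict` over (x, y) pairs."""
    return sum(abs(predict(x) - y) for x, y in data) / len(data)

def make_nearest_neighbour(train):
    """Return a 1-nearest-neighbour predictor that memorises `train`."""
    def predict(x):
        nearest_x, nearest_y = min(train, key=lambda pair: abs(pair[0] - x))
        return nearest_y
    return predict

def synthetic_sample(rng, n):
    """Assumed toy data: y = 2x plus Gaussian noise (not real folding rates)."""
    return [(x, 2.0 * x + rng.gauss(0.0, 1.0))
            for x in (rng.uniform(0.0, 10.0) for _ in range(n))]

rng = random.Random(0)  # fixed seed so the run is repeatable
train_set = synthetic_sample(rng, 50)
test_set = synthetic_sample(rng, 50)   # independent draw, never seen by the model

model = make_nearest_neighbour(train_set)
train_error = mean_abs_error(model, train_set)  # each point is its own neighbour
test_error = mean_abs_error(model, test_set)

print(f"training-set MAE: {train_error:.3f}")
print(f"test-set MAE:     {test_error:.3f}")
```

Because the predictor memorises its training points, the training-set error is zero, while the error on the independent sample reflects the noise the model could not have learned. Real regression models overfit less drastically, but the direction of the bias is the same.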
Acknowledgements
The support of the Croatian Ministry of Science, Education and Sports is gratefully acknowledged (grant no. 098–0982929–2524).
Copyright information
© 2011 Springer Science+Business Media, LLC
About this chapter
Cite this chapter
Štambuk, N., Konjevoda, P. (2011). The Role of Independent Test Set in Modeling of Protein Folding Kinetics. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_28
DOI: https://doi.org/10.1007/978-1-4419-7046-6_28
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-7045-9
Online ISBN: 978-1-4419-7046-6
eBook Packages: Biomedical and Life Sciences (R0)