The Role of Independent Test Set in Modeling of Protein Folding Kinetics

Software Tools and Algorithms for Biological Systems

Part of the book series: Advances in Experimental Medicine and Biology (AEMB, volume 696)

Abstract

Testing a bioinformatics algorithm on its training set is a poor indicator of its future performance because the results are misleadingly optimistic. The preferred method of testing is to calculate the error rate on an independent dataset (test set). We tested the validity of the FOLD-RATE method for predicting protein folding rate constants, ln(k_f), using sequences, structural class information, and experimentally verified folding rate constants from the Protein Folding Database (PFD). PFD is a publicly accessible repository of thermodynamic and kinetic data, of interest to researchers from different fields and standardized by the International Foldeomics Consortium. Our results show that when the standardized PFD dataset is used to test a protein folding rate prediction method, the estimate of the method's validity may differ significantly.
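The gap between training-set and test-set error described above can be illustrated with a minimal sketch. It does not use FOLD-RATE or PFD data: the data are synthetic, the model is a simple 1-nearest-neighbour regressor chosen only because it memorizes its training set, and all names (`make_data`, `predict_1nn`, `rmse`) are hypothetical.

```python
import math
import random


def make_data(n, rng):
    """Synthetic (feature, target) pairs: a linear trend plus Gaussian noise.
    These are NOT folding-rate data; they only stand in for a real dataset."""
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        y = 0.5 * x + rng.gauss(0.0, 1.0)
        data.append((x, y))
    return data


def predict_1nn(train, x):
    """Predict the target of the training point whose feature is closest to x."""
    nearest = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest[1]


def rmse(train, data):
    """Root-mean-square error of the 1-NN model (fitted on `train`) over `data`."""
    squared = [(predict_1nn(train, x) - y) ** 2 for x, y in data]
    return math.sqrt(sum(squared) / len(squared))


rng = random.Random(0)           # fixed seed so the sketch is reproducible
train_set = make_data(40, rng)   # data the model is allowed to see
test_set = make_data(40, rng)    # independent data, never seen during fitting

train_rmse = rmse(train_set, train_set)  # error measured on the training set
test_rmse = rmse(train_set, test_set)    # error measured on the independent test set

print(f"training RMSE: {train_rmse:.3f}")
print(f"test RMSE:     {test_rmse:.3f}")
```

Because the model reproduces every training point exactly, its training error is zero, while the error on the independent set is not. The same comparison applies to any predictor: only the test-set figure estimates performance on new data.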

Acknowledgements

The support of the Croatian Ministry of Science, Education and Sports is gratefully acknowledged (grant no. 098–0982929–2524).

Author information

Corresponding author

Correspondence to Nikola Štambuk.

Copyright information

© 2011 Springer Science+Business Media, LLC

About this chapter

Cite this chapter

Štambuk, N., Konjevoda, P. (2011). The Role of Independent Test Set in Modeling of Protein Folding Kinetics. In: Arabnia, H., Tran, QN. (eds) Software Tools and Algorithms for Biological Systems. Advances in Experimental Medicine and Biology, vol 696. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-7046-6_28