Skip to main content

False Discovery Rate Estimation in Proteomics

  • Protocol
Statistical Analysis in Proteomics

Part of the book series: Methods in Molecular Biology ((MIMB,volume 1362))

Abstract

With the advancement in proteomics separation techniques and improvements in mass analyzers, the data generated in a mass-spectrometry based proteomics experiment is rising exponentially. Such voluminous datasets necessitate automated computational tools for high-throughput data analysis and appropriate statistical control. The data is searched using one or more of the several popular database search algorithms. The matches assigned by these tools can have false positives and statistical validation of these false matches is necessary before making any biological interpretations. Without such procedures, the biological inferences do not hold true and may be outright misleading. There is a considerable overlap between true and false positives. To control the false positives amongst a set of accepted matches, there is a need for some statistical estimate that can reflect the amount of false positives present in the data processed. False discovery rate (FDR) is the metric for global confidence assessment of a large-scale proteomics dataset. This chapter covers the basics of FDR, its application in proteomics, and methods to estimate FDR.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Protocol
USD 49.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 109.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. Storey JD, Tibshirani R (2003) Statistical significance for genomewide studies. Proc Natl Acad Sci U S A 100:9440–9445

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  2. Choi H, Nesvizhskii AI (2008) False discovery rates and related statistical concepts in mass spectrometry-based proteomics. J Proteome Res 7:47–50

    Article  CAS  PubMed  Google Scholar 

  3. Nesvizhskii AI (2010) A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. Proteomics 73:2092–2123

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  4. Kall L, Storey JD, MacCoss MJ et al (2008) Assigning significance to peptides identified by tandem mass spectrometry using decoy databases. J Proteome Res 7:29–34

    Article  PubMed  Google Scholar 

  5. Benjamini Y, Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc B 57:289–300

    Google Scholar 

  6. Elias JE, Gygi SP (2007) Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat Methods 4:207–214

    Article  CAS  PubMed  Google Scholar 

  7. Choi H, Ghosh D, Nesvizhskii AI (2008) Statistical validation of peptide identifications in large-scale proteomics using the target-decoy database search strategy and flexible mixture modeling. J Proteome Res 7:286–292

    Article  CAS  PubMed  Google Scholar 

  8. Keller A, Nesvizhskii AI, Kolker E et al (2002) Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem 74:5383–5392

    Article  CAS  PubMed  Google Scholar 

  9. Nesvizhskii AI, Keller A, Kolker E et al (2003) A statistical model for identifying proteins by tandem mass spectrometry. Anal Chem 75:4646–4658

    Article  CAS  PubMed  Google Scholar 

  10. Tabb DL (2008) What’s driving false discovery rates? J Proteome Res 7:45–46

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  11. Kall L, Storey JD, MacCoss MJ et al (2008) Posterior error probabilities and false discovery rates: two sides of the same coin. J Proteome Res 7:40–44

    Article  PubMed  Google Scholar 

  12. Yadav AK, Kadimi PK, Kumar D et al (2013) ProteoStats—a library for estimating false discovery rates in proteomics pipelines. Bioinformatics 29:2799–2800

    Article  CAS  PubMed  Google Scholar 

  13. Fitzgibbon M, Li Q, McIntosh M (2008) Modes of inference for evaluating the confidence of peptide identifications. J Proteome Res 7:35–39

    Article  CAS  PubMed  Google Scholar 

  14. Yadav AK, Perez-Riverol Y (2014) ProteoStats: computing false discovery rates in proteomics. BioCode’s notes, computational proteomics & bioinformatics. http://computationalproteomic.blogspot.com/2014/08/proteostats-computing-false-discovery.html

  15. Navarro P, Vazquez J (2009) A refined method to calculate false discovery rates for peptide identification using decoy databases. J Proteome Res 8:1792–1796

    Article  CAS  PubMed  Google Scholar 

  16. Cerqueira FR, Graber A, Schwikowski B et al (2010) MUDE: a new approach for optimizing sensitivity in the target-decoy search strategy for large-scale peptide/protein identification. J Proteome Res 9:2265–2277

    Article  CAS  PubMed  Google Scholar 

  17. Elias JE, Gygi SP (2010) Target-decoy search strategy for mass spectrometry-based proteomics. Methods Mol Biol 604:55–71

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  18. Reiter L, Claassen M, Schrimpf SP et al (2009) Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. Mol Cell Proteomics 8:2405–2417

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  19. Perkins DN, Pappin DJ, Creasy DM et al (1999) Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20:3551–3567

    Article  CAS  PubMed  Google Scholar 

  20. Yadav AK, Kumar D, Dash D (2011) MassWiz: a novel scoring algorithm with target-decoy based analysis pipeline for tandem mass spectrometry. J Proteome Res 10:2154–2160

    Article  CAS  PubMed  Google Scholar 

  21. Geer LY, Markey SP, Kowalak JA et al (2004) Open mass spectrometry search algorithm. J Proteome Res 3:958–964

    Article  CAS  PubMed  Google Scholar 

  22. Craig R, Beavis RC (2004) TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20:1466–1467

    Article  CAS  PubMed  Google Scholar 

  23. Tabb DL, Fernando CG, Chambers MC (2007) MyriMatch: highly accurate tandem mass spectral peptide identification by multivariate hypergeometric analysis. J Proteome Res 6:654–661

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  24. Eng JK, Jahan TA, Hoopmann MR (2013) Comet: an open-source MS/MS sequence database search tool. Proteomics 13:22–24

    Article  CAS  PubMed  Google Scholar 

  25. Yadav AK, Kumar D, Dash D (2012) Learning from decoys to improve the sensitivity and specificity of proteomics database search results. PLoS One 7, e50651

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  26. Brosch M, Yu L, Hubbard T et al (2009) Accurate and sensitive peptide identification with Mascot Percolator. J Proteome Res 8:3176–3181

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  27. Spivak M, Weston J, Bottou L et al (2009) Improvements to the percolator algorithm for peptide identification from shotgun proteomics data sets. J Proteome Res 8:3737–3745

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  28. Wright JC, Collins MO, Yu L et al (2012) Enhanced peptide identification by electron transfer dissociation using an improved mascot percolator. Mol Cell Proteomics 11:478–491

    Article  PubMed Central  PubMed  Google Scholar 

  29. Shao C, Sun W, Li F et al (2009) Oscore: a combined score to reduce false negative rates for peptide identification in tandem mass spectrometry analysis. J Mass Spectrom 44:25–31

    Article  CAS  PubMed  Google Scholar 

  30. Ma ZQ, Dasari S, Chambers MC et al (2009) IDPicker 2.0: improved protein assembly with high discrimination peptide identification filtering. J Proteome Res 8:3872–3881

    Article  PubMed Central  CAS  PubMed  Google Scholar 

Download references

Acknowledgement

S.A. is supported by SRF grant and A.K.Y. is supported by Innovative Young Biotechnologist Award (IYBA) grant and DDRC-SFC grant from Department of Biotechnology (DBT), India. Authors acknowledge Manu Kandpal for proofreading the manuscript.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Amit Kumar Yadav Ph.D. .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Aggarwal, S., Yadav, A.K. (2016). False Discovery Rate Estimation in Proteomics. In: Jung, K. (eds) Statistical Analysis in Proteomics. Methods in Molecular Biology, vol 1362. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3106-4_7

Download citation

  • DOI: https://doi.org/10.1007/978-1-4939-3106-4_7

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3105-7

  • Online ISBN: 978-1-4939-3106-4

  • eBook Packages: Springer Protocols

Publish with us

Policies and ethics