Proteogenomic Methods to Improve Genome Annotation

  • Protocol
Quantitative Proteomics by Mass Spectrometry

Part of the book series: Methods in Molecular Biology (MIMB, volume 1410)

Abstract

Protein-coding genes in sequenced genomes are routinely annotated using gene prediction programs guided by available transcript data. Mass spectrometry now enables high-throughput identification of proteins. In addition to searching protein sequences annotated in public databases, mass spectrometry data can be searched against a conceptually translated genome or transcriptome to identify novel protein-coding regions. This proteogenomic approach has identified novel protein-coding regions in both prokaryotic and eukaryotic genomes. These studies have also revealed that some annotated noncoding RNAs and pseudogenes code for proteins. The approach is likely to become part of most genome annotation workflows. Here we describe a general methodology that can be used for proteogenomics.
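
As an illustration of what searching against a conceptually translated genome involves, the sketch below translates a DNA sequence in all six reading frames and splits each frame into stop-to-stop segments that could serve as entries in a peptide search database. It is a minimal example under stated assumptions, not the pipeline used in this protocol: it assumes the standard genetic code, and the function names and the `min_length` cutoff are hypothetical choices made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: no
# organism-specific alternative code is needed).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons; ambiguous codons (e.g. with N) become X."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame labels (+1..+3, -1..-3) to translated sequences."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def stop_to_stop_segments(protein, min_length=7):
    """Split a translated frame at stop codons; keep segments of useful length.

    min_length is a hypothetical cutoff; very short segments rarely yield
    peptides that can be identified confidently.
    """
    return [seg for seg in protein.split("*") if len(seg) >= min_length]


if __name__ == "__main__":
    for label, protein in six_frame_translations("ATGGCCTAA").items():
        print(label, protein)
```

In practice, each retained segment would be written to a FASTA file with a header recording the contig, strand, frame, and coordinates, so that identified peptides can be mapped back to the genome.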

Acknowledgements

We thank the Department of Biotechnology (DBT), Government of India, for research support to the Institute of Bioinformatics. Keshava K. Datta is a recipient of a Research Fellowship from the University Grants Commission (UGC), Government of India. Anil K. Madugundu is a recipient of a BINC-Research Fellowship from DBT.

Author information

Corresponding author

Correspondence to Harsha Gowda.

Copyright information

© 2016 Springer Science+Business Media New York

About this protocol

Cite this protocol

Datta, K.K., Madugundu, A.K., Gowda, H. (2016). Proteogenomic Methods to Improve Genome Annotation. In: Sechi, S. (ed) Quantitative Proteomics by Mass Spectrometry. Methods in Molecular Biology, vol 1410. Humana Press, New York, NY. https://doi.org/10.1007/978-1-4939-3524-6_5

  • DOI: https://doi.org/10.1007/978-1-4939-3524-6_5

  • Publisher Name: Humana Press, New York, NY

  • Print ISBN: 978-1-4939-3522-2

  • Online ISBN: 978-1-4939-3524-6

  • eBook Packages: Springer Protocols
