Bolt: a New Age Peptide Search Engine for Comprehensive MS/MS Sequencing Through Vast Protein Databases in Minutes

  • Amol Prakash
  • Shadab Ahmad
  • Swetaketu Majumder
  • Conor Jenkins
  • Ben Orsburn
Research Article

Abstract

Recent increases in mass spectrometry speed, sensitivity, and resolution now permit comprehensive proteomics coverage. However, the results are often hindered by sub-optimal data processing pipelines. In almost all MS/MS peptide search engines, users must limit their search space to a canonical database because of time constraints and q-value considerations, but such a database typically does not reflect the individual genetic variations of the organism being studied. In addition, engines nearly always assume the presence of only fully tryptic peptides and limit post-translational modifications (PTMs) to a handful. Even on high-performance servers, these search engines are computationally expensive, and most users decide to dial back their search parameters. We present Bolt, a new cloud-based search engine that can search more than 900,000 protein sequences (canonical, isoform, mutations, and contaminants) with 41 PTMs and N-terminal and C-terminal partial tryptic search in minutes on a standard-configuration laptop. Along with the increase in speed, Bolt also improves high-confidence identifications. Sixty-one percent of peptides uniquely identified by Bolt may be validated by strong fragmentation patterns, compared with 13% of peptides uniquely identified by SEQUEST and 6% of peptides uniquely identified by Mascot. Furthermore, 30% of unique Bolt identifications were verified by all three software packages on the longer gradient analysis, compared with only 20% and 27% for SEQUEST and Mascot identifications, respectively. Bolt represents, to the best of our knowledge, the first fully scalable, cloud-based quantitative proteomic solution that can be operated within a user-friendly graphical interface. Data are available via ProteomeXchange with identifier PXD012700.
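To illustrate why relaxing the fully tryptic assumption enlarges the search space, the following is a minimal Python sketch of in-silico digestion. It is not Bolt's implementation: the function names, the common "cleave after K or R, but not before P" trypsin rule, the length limits, and the example sequence are all assumptions chosen for illustration.

```python
def cleavage_sites(protein):
    """Return cut positions under the assumed rule: after K/R, not before P."""
    sites = [0]
    for i in range(len(protein) - 1):
        if protein[i] in "KR" and protein[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(protein))
    return sites


def tryptic_spans(protein, max_missed):
    """All peptides bounded by cut sites, allowing up to max_missed missed cleavages."""
    sites = cleavage_sites(protein)
    last = len(sites) - 1
    spans = []
    for i in range(last):
        for j in range(i + 1, min(i + 1 + max_missed, last) + 1):
            spans.append(protein[sites[i]:sites[j]])
    return spans


def fully_tryptic(protein, max_missed=1, min_len=6, max_len=30):
    """Peptides with both termini at tryptic cut sites, filtered by length."""
    return {p for p in tryptic_spans(protein, max_missed)
            if min_len <= len(p) <= max_len}


def semi_tryptic(protein, max_missed=1, min_len=6, max_len=30):
    """Peptides with at least one tryptic terminus (includes fully tryptic ones)."""
    peptides = set()
    for pep in tryptic_spans(protein, max_missed):
        candidates = [pep]
        for k in range(1, len(pep)):
            candidates.append(pep[k:])   # ragged N-terminus, tryptic C-terminus
            candidates.append(pep[:-k])  # tryptic N-terminus, ragged C-terminus
        peptides.update(c for c in candidates if min_len <= len(c) <= max_len)
    return peptides


if __name__ == "__main__":
    # Hypothetical example sequence, used only to compare candidate counts.
    seq = "MKWVTFISLLLLFSSAYSRGVFRRDTHKSEIAHRFKDLGEEHFKGLVLIAFSQYLQQCPFDEHVK"
    print("fully tryptic:", len(fully_tryptic(seq)))
    print("semi-tryptic: ", len(semi_tryptic(seq)))
```

Even for a single short sequence, the semi-tryptic candidate set is several times larger than the fully tryptic one, and that growth multiplies again with every additional variable modification and database entry considered.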

Keywords

Mass spectrometry; Proteomics; Peptide; Mutations; Search engine; MS/MS; Sequencing; Variants; Cloud; Bolt

Notes

Acknowledgements

We would like to acknowledge Simion Kreimer, Ph.D. (Johns Hopkins University) and Dragana Lagundzin, Ph.D. (University of Nebraska) for their help with the Mascot analysis.

Supplementary material

13361_2019_2306_MOESM1_ESM.xlsx (14 kb)
Supplementary Table 1 (XLSX 13 kb)
13361_2019_2306_MOESM2_ESM.xlsx (9 kb)
Supplementary Table 2 (XLSX 8 kb)

Copyright information

© American Society for Mass Spectrometry 2019

Authors and Affiliations

  1. Optys Tech Corporation, Shrewsbury, USA
  2. Department of Biology, Hood College, Frederick, USA
  3. Proteomic und Genomic Sciences, Baltimore, USA