Abstract
The LC-MS/MS shotgun proteomics workflow is widely used to identify and quantify sample peptides and proteins. The technique, however, presents a number of challenges for large-scale use, including the diverse raw data file formats output by mass spectrometers, the high false positive rate among peptide assignments to MS/MS spectra, and the loss of connectivity between identified peptides and the sample proteins that gave rise to them. Here we describe the Trans-Proteomic Pipeline, a freely available open-source software suite that provides uniform analysis of LC-MS/MS data from raw data to quantified sample proteins. Users can extract MS/MS information from raw data in many instrument formats, submit it to search engines for peptide identification, validate the results to remove false hits, combine the results of multiple search engines, infer the sample proteins that gave rise to the identified peptides, and perform quantitation at the peptide and protein levels.
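The validation step mentioned above can be illustrated with a minimal sketch. It assumes each peptide assignment already carries a probability of being correct, so that the expected share of false hits among the accepted assignments is the mean of (1 - probability). The function name, the tuple layout, and the example peptides are hypothetical and are not part of the Trans-Proteomic Pipeline itself.

```python
from typing import List, Tuple

# A peptide-spectrum match, reduced here to (peptide sequence, probability correct).
# This layout is an assumption made for illustration only.
Psm = Tuple[str, float]


def filter_by_probability(psms: List[Psm], min_prob: float) -> Tuple[List[Psm], float]:
    """Keep matches with probability >= min_prob and estimate their error rate.

    The estimated false discovery rate is the expected number of incorrect
    matches among those kept, sum(1 - p), divided by the number kept.
    """
    accepted = [(peptide, p) for peptide, p in psms if p >= min_prob]
    if not accepted:
        return [], 0.0
    expected_false = sum(1.0 - p for _, p in accepted)
    return accepted, expected_false / len(accepted)


if __name__ == "__main__":
    # Made-up example data, not results from any real search.
    example = [("PEPTIDEK", 0.99), ("SAMPLER", 0.95), ("WRONGK", 0.20), ("ANOTHERK", 0.90)]
    kept, fdr = filter_by_probability(example, min_prob=0.90)
    print(f"kept {len(kept)} of {len(example)} matches, estimated FDR {fdr:.3f}")
```

Raising `min_prob` lowers the estimated error rate at the cost of discarding more correct assignments, which is the trade-off a user makes when choosing a threshold.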
Acknowledgments
We would like to thank Eric Deutsch and Luis Mendoza for valuable discussions.
© 2011 Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Keller, A., Shteynberg, D. (2011). Software Pipeline and Data Analysis for MS/MS Proteomics: The Trans-Proteomic Pipeline. In: Wu, C., Chen, C. (eds) Bioinformatics for Comparative Proteomics. Methods in Molecular Biology, vol 694. Humana Press. https://doi.org/10.1007/978-1-60761-977-2_12
DOI: https://doi.org/10.1007/978-1-60761-977-2_12
Publisher Name: Humana Press
Print ISBN: 978-1-60761-976-5
Online ISBN: 978-1-60761-977-2
eBook Packages: Springer Protocols