SpecTrees: An Efficient Without a Priori Data Structure for MS/MS Spectra Identification

David, Matthieu; Fertin, Guillaume; Tessier, Dominique

doi:10.1007/978-3-319-43681-4_6


Part of the book series: Lecture Notes in Computer Science (LNBI, volume 9838)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Tandem mass spectrometry (MS/MS) is the most common strategy used to identify unknown proteins present in a mixture. It generates thousands of MS/MS spectra per sample, each of which must be compared against a large reference database from which artificial spectra are produced. The goal is to map each experimental spectrum to an artificial one, so as to identify the proteins they come from. However, this comparison step is highly time-consuming. To reduce computation time, most methods therefore filter the reference database a priori. This tends to discard potential candidates, which leads to frequent errors and missed identifications. We have developed an original alternative method, efficient in terms of both memory and computation time, that allows pairwise comparison of spectra without any a priori filtering. The core of our method is SpecTrees, a data structure designed for this purpose that stores all the input spectra without any filtering. It is designed to be easy to implement, and is also highly scalable and incremental. Once SpecTrees is built, users can run their own identification process by extracting from SpecTrees any information of interest, including pairwise spectra comparisons. In this paper, we first present SpecTrees, its main features, and how to implement it. We then evaluate our method on two sets of experimental spectra from the ISB standard 18-protein mixture, thereby showing its speed and its ability to make identifications that other software tools do not reach.
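
To make the idea of unfiltered pairwise comparison concrete, here is a minimal sketch of one generic way to count the peaks an experimental spectrum shares with every stored artificial spectrum. It is not the SpecTrees structure itself, which the abstract does not describe in detail. It uses a plain inverted index from mass bins to spectrum identifiers. The function names, the binning scheme, and the 0.02 tolerance are all assumptions made for illustration.

```python
from collections import defaultdict


def bin_mass(mz, tol=0.02):
    """Map a peak mass to an integer bin of width `tol` (an assumed tolerance)."""
    return int(round(mz / tol))


def build_index(spectra, tol=0.02):
    """Build an inverted index: mass bin -> ids of spectra with a peak in that bin.

    `spectra` is a dict {spectrum_id: list of peak masses}. Every spectrum is
    stored; nothing is filtered out beforehand.
    """
    index = defaultdict(set)
    for sid, peaks in spectra.items():
        for mz in peaks:
            index[bin_mass(mz, tol)].add(sid)
    return index


def shared_peak_counts(query_peaks, index, tol=0.02):
    """Count, for each indexed spectrum, the mass bins it shares with the query.

    Only spectra sharing at least one bin appear in the result. Peaks that fall
    on opposite sides of a bin boundary are not matched in this simple sketch.
    """
    counts = defaultdict(int)
    for b in {bin_mass(mz, tol) for mz in query_peaks}:
        for sid in index.get(b, ()):
            counts[sid] += 1
    return dict(counts)


# Hypothetical toy data: two artificial spectra and one experimental spectrum.
artificial = {"pepA": [100.0, 200.0, 300.0], "pepB": [100.0, 250.0]}
index = build_index(artificial)
result = shared_peak_counts([100.0, 200.0, 400.0], index)
```

In this toy run, `result` maps `pepA` to 2 and `pepB` to 1. The cost of a query depends on how many stored peaks fall in the query's bins, not on the total size of the database, which is the general reason index-based comparison can avoid a priori filtering.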

Supported by GRIOTE project, funded by Région Pays de la Loire.

Author information

Authors and Affiliations

LINA UMR CNRS 6241, Université de Nantes, Nantes, France
Matthieu David & Guillaume Fertin
INRA UR1268 Biopolymères Interactions Assemblages, 44316, Nantes, France
Matthieu David & Dominique Tessier

Corresponding author

Correspondence to Matthieu David.

Editor information

Editors and Affiliations

AIST and University of Tokyo, Tokyo, Japan
Martin Frith
Aarhus University, Aarhus, Denmark
Christian Nørgaard Storm Pedersen

About this paper

Cite this paper

David, M., Fertin, G., Tessier, D. (2016). SpecTrees: An Efficient Without a Priori Data Structure for MS/MS Spectra Identification. In: Frith, M., Storm Pedersen, C. (eds) Algorithms in Bioinformatics. WABI 2016. Lecture Notes in Computer Science, vol 9838. Springer, Cham. https://doi.org/10.1007/978-3-319-43681-4_6

DOI: https://doi.org/10.1007/978-3-319-43681-4_6
Published: 06 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-43680-7
Online ISBN: 978-3-319-43681-4
eBook Packages: Computer Science, Computer Science (R0)
