On Comprehensive Mass Spectrometry Data Analysis for Proteome Profiling of Human Blood Samples

  • Research Article
Journal of Healthcare Informatics Research

Abstract

Meaningful interpretation of data in basic and translational medicine depends on the quality of the biological samples. Mass spectrometers are promising instruments for acquiring proteomic information that is known to be associated with sample quality. However, a universally applicable mass spectrometry data analysis platform for quality assessment is still greatly needed. We present a comprehensive pattern recognition study to facilitate the development of such a platform. The study involves feature extraction, binary classification, and feature ranking. We develop classifiers that distinguish human serum samples stored for different amounts of time with classification accuracy higher than 90%. We also derive fingerprint patterns of serum peptides that can be conveniently used for temporal classification.

Funding

This study received financial support from NSF grant DMS#1246818 and an industry grant from the Chinese Academy of Sciences Holding Co., Ltd.

Author information

Corresponding author

Correspondence to Nan Kong.

Appendix

Before storage, the samples were left at room temperature for 1 h to allow coagulation and then centrifuged at 4 °C for 15 min at 1400 × g. To avoid drawing fluid from the buffy-coat layer, serum was aspirated and collected in polypropylene tubes. After aliquoting, the samples were stored under one of two conditions: room temperature or 4 °C. For both cohorts, mass spectrometry data for each sample were collected on the day the sample was taken and again 1, 2, 5, and 10 days later.

For each measurement, a 1-μL sample was processed on a mesoporous silicon wafer that had been prepared by pre-baking in an oven at 120 °C. The sample was spotted on the MALDI target plate and allowed to air-dry. Afterwards, 1 μL of matrix in 50% acetonitrile containing 0.1% TFA was spotted on the dried sample spot, and the sample and matrix were allowed to co-crystallize.

The mass spectra were obtained using a SHIMADZU AXIMA Resonance MALDI-IT-TOF equipped with a nitrogen laser emitting light at 337 nm. The instrument had an adjustable mass range of 800 to 4000 Da. Positive ions were detected in reflectron mode. The spectra from 500 laser shots were usually averaged to obtain the final sample spectrum. The optimized accelerating voltage was 50 kV.
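The averaging step described above can be illustrated with a minimal sketch. It is not the authors' acquisition software: the function name `average_spectrum`, the assumption that every laser shot is recorded on one shared m/z axis, and the plain-list data layout are all hypothetical choices made for illustration. Only the 800 to 4000 Da window comes from the text.

```python
from statistics import fmean

# Mass window taken from the appendix; everything else here is an assumption.
MASS_MIN_DA = 800.0
MASS_MAX_DA = 4000.0


def average_spectrum(mz, shots):
    """Average per-shot intensities point by point and keep the 800-4000 Da window.

    mz    -- m/z values shared by every shot (assumed common axis)
    shots -- list of intensity lists, one per laser shot, each as long as mz
    Returns a list of (m/z, mean intensity) pairs inside the mass window.
    """
    if not shots:
        raise ValueError("at least one shot is required")
    if any(len(shot) != len(mz) for shot in shots):
        raise ValueError("every shot needs one intensity per m/z value")
    # zip(*shots) groups the intensities recorded at the same m/z position.
    averaged = [fmean(column) for column in zip(*shots)]
    return [
        (mass, intensity)
        for mass, intensity in zip(mz, averaged)
        if MASS_MIN_DA <= mass <= MASS_MAX_DA
    ]


# Toy example with two shots instead of 500.
toy_mz = [500.0, 1000.0, 2000.0, 5000.0]
toy_shots = [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0]]
toy_result = average_spectrum(toy_mz, toy_shots)
```

In practice, raw shots may need alignment or baseline correction before averaging. The sketch leaves those steps out because the appendix does not describe them.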

Cite this article

Manchanda, S., Meyer, M., Li, Q. et al. On Comprehensive Mass Spectrometry Data Analysis for Proteome Profiling of Human Blood Samples. J Healthc Inform Res 2, 305–318 (2018). https://doi.org/10.1007/s41666-018-0022-0

  • DOI: https://doi.org/10.1007/s41666-018-0022-0
