Introduction

The analysis of peptides, proteins, and metabolites by liquid chromatography-mass spectrometry (LC-MS) is susceptible to a wide variety of contaminants that can compromise downstream analysis. Tracing the source of these contaminants often requires intensive examination of workflows and reagents, costing both time and money. In addition to the source of the sample, the complexity of sample preparation and workflows can lead to the introduction of new reagents and materials which may present unknown interferences to downstream analysis (Fig. 1a). Among the most common contaminants observed in proteomic workflows are surfactants like polyethylene glycol (PEG), which are introduced during sample preparation [1, 2]. These contaminants are particularly disruptive as they lead to ion suppression and often interfere with the target ion(s) of interest [3, 4]. Other common contaminants include plasticizers such as phthalate esters [5] and slip agents such as erucamide [6]. In addition to contamination of the liquid phase, there is also the potential for gas phase contamination from the laboratory air environment. Polydimethylcyclosiloxanes are common additives to skin care and cosmetic products and are ubiquitous in the laboratory air environment, leading to high background signals in nanoflow LC-MS [7]. Fortunately, this type of contamination can be partially mitigated using active background ion reduction devices. For an extensive review of the sources and types of contamination in LC-MS, see [6].

Figure 1

(a) Schematic representation of potential sources of contamination during proteomic workflows. (b) Workflow for utilizing Skyline’s molecular extraction tools to generate a transition list of non-proteinaceous molecules that can be monitored at the MS1 scan level. SPE, solid phase extraction; LC-MS/MS, liquid chromatography-tandem mass spectrometry; PEG, polyethylene glycol

There are currently a variety of peptide standards and tools for assessing the performance of a mass spectrometer as well as quality control metrics [8,9,10]. However, despite the multitude of contamination entry points to proteomic workflows and their prevalence in samples, containers, and reagents, the modern protein chemist does not have the ability to rapidly assess MS data for the presence and levels of known contaminants beyond the manual interrogation of raw data. Here, I present an approach for rapidly assessing sample contamination using full-scan MS1 filtering in Skyline with a customizable transition list that provides a starting point for the rapid identification of common contaminants in proteomic workflows. Skyline is an open-source label-free quantitation application originally developed for multiple reaction monitoring experiments [11] and later expanded to full-scan MS1 data [12,13,14]. Skyline features tools for viewing graphical displays of extracted ion chromatograms and is capable of processing data from most major vendors [15], making the approach described here to monitor common contaminants widely accessible.

Experimental

Non-Proteinaceous Transition List in Skyline

The list of molecular contaminants used in the current version (Supplemental Table S1) was compiled from a collection of reviews and reports on interferences and contaminants in mass spectrometry [1, 6, 7, 16]. Inserting a non-protein transition list into Skyline requires several pieces of information: molecule list name, precursor name, molecular formula, adduct ion (e.g., H+, Na+, NH4+), precursor mass-to-charge, and charge state. All molecules were listed as singly charged based on previous reports [6, 17]. For polymers such as PEG, the molecular list name remains constant while the precursor name varies with polymer length. Total PEG contamination, as with other polymers, is then viewed by highlighting only the molecular list name in the Skyline transition tree.
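The fields described above can be generated programmatically for a polymer series. The following is a minimal sketch, not the procedure used to build Supplemental Table S1: the function names and column labels are hypothetical, the column labels simply mirror the fields listed above (the exact headers Skyline accepts should be checked against its documentation), and the m/z values are computed from standard monoisotopic masses for PEG, H(C2H4O)nOH.

```python
import csv
import io

# Monoisotopic masses in Da (standard values; electron mass is subtracted for cations).
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.9949146196, "Na": 22.9897692809}
ELECTRON = 0.00054857991

# Atoms added by each singly charged adduct (illustrative labels).
ADDUCTS = {
    "[M+H]+": {"H": 1},
    "[M+NH4]+": {"N": 1, "H": 4},
    "[M+Na]+": {"Na": 1},
}

def mono_mass(comp):
    """Sum monoisotopic masses for an element-count dictionary."""
    return sum(MONO[el] * n for el, n in comp.items())

def peg_composition(n):
    """Neutral PEG with n ethylene oxide units: (C2H4O)n + H2O."""
    return {"C": 2 * n, "H": 4 * n + 2, "O": n + 1}

def formula(comp):
    """Write a composition as a formula string, e.g. C16H34O9."""
    return "".join(f"{el}{comp[el]}" for el in ("C", "H", "N", "O") if comp.get(el))

def peg_rows(n_min, n_max, adducts=("[M+H]+", "[M+NH4]+")):
    """One row per PEG length and adduct; the list name stays constant."""
    rows = []
    for n in range(n_min, n_max + 1):
        comp = peg_composition(n)
        for adduct in adducts:
            mz = mono_mass(comp) + mono_mass(ADDUCTS[adduct]) - ELECTRON
            rows.append({
                "Molecule List Name": "PEG",
                "Precursor Name": f"PEG{n}",
                "Molecular Formula": formula(comp),
                "Adduct": adduct,
                "Precursor m/z": round(mz, 4),
                "Charge": 1,
            })
    return rows

def to_csv(rows):
    """Serialize rows to CSV text for pasting or saving."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

if __name__ == "__main__":
    print(to_csv(peg_rows(8, 20)))
```

Keeping the molecule list name fixed while the precursor name changes with polymer length is what allows the whole polymer family to be viewed together from a single node.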

MS1 Filtering in Skyline

Skyline is an open-source software application that is freely available for download [11, 15]. For additional details and tutorials, visit the Skyline website (http://proteome.gs.washington.edu/software/skyline). Full-scan (MS1) filtering was configured with Orbitrap as the precursor mass analyzer, a resolving power of 120,000 at 400 m/z, and one isotopic peak. Instrument scan range was set to 350–1500 m/z. Raw data files were imported directly into Skyline (v4.1.0.11714), and ion intensity chromatograms were displayed for a single isotopic peak. The Skyline contamination template file can be viewed and downloaded via the Panorama Public data repository: https://panoramaweb.org/labkey/contaminants.url.
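To give a sense of what these settings imply for individual contaminant ions, the sketch below estimates the expected peak width at a given m/z and checks whether a precursor falls inside the scan range. It assumes the usual inverse square-root dependence of Orbitrap resolving power on m/z; whether Skyline computes its extraction window in exactly this way is an assumption, and the function names are hypothetical.

```python
import math

def orbitrap_fwhm(mz, resolving_power=120_000, at_mz=400.0):
    """Approximate peak width (FWHM, in m/z units) at a given m/z.

    Assumes resolving power scales as sqrt(at_mz / mz), so it equals
    `resolving_power` at `at_mz` and falls off at higher m/z.
    """
    r_at_mz = resolving_power * math.sqrt(at_mz / mz)
    return mz / r_at_mz

def in_scan_range(mz, low=350.0, high=1500.0):
    """True if a precursor m/z lies within the instrument scan range."""
    return low <= mz <= high

if __name__ == "__main__":
    for name, mz in [("PEG7 [M+H]+", 327.2014), ("PEG8 [M+H]+", 371.2276),
                     ("PEG20 [M+H]+", 899.5421)]:
        print(name, in_scan_range(mz), f"{orbitrap_fwhm(mz):.4f}")
```

A transition whose m/z lies outside 350–1500 cannot produce a chromatogram from these data, so filtering the list this way helps separate "not detected" from "not measurable".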

Instrumentation

Data were acquired using a Waters nanoACQUITY M-class system (Waters, Milford, MA) in-line with an Orbitrap Fusion tribrid mass spectrometer (Thermo Fisher Scientific, San Jose, CA) equipped with a Digital PicoView nanospray source (DPV550, New Objective, Woburn, MA). Samples were separated on a 150 mm × 75 μm C18 charged surface hybrid column with 1.7-μm particle size (Waters, Milford, MA) at a flow rate of 300 nL/min. Data were acquired in positive ion mode using a top speed method at an MS1 resolution of 120,000.

Results and Discussion

Characterization of proteins and peptides by mass spectrometry utilizes a wide variety of sample preparation methods, from intact protein analysis to diverse procedures requiring isolation and homogenization of tissues for generating a protein matrix (Fig. 1a). Protein mixtures can then undergo a number of procedures such as enrichment or depletion followed by proteolytic digestion. The resulting peptide mixtures can then be further processed by fractionation or labeling prior to a desalting step before analysis by LC-MS/MS. Each stage or reagent in the workflow is a potential source of contamination, and mitigation of potential interfering compounds is a time-consuming and difficult process. To rapidly assess mass spectrometry data for known sources of contamination, a molecular library was developed using previously compiled databases [6, 16] and the open-source application Skyline [11] (Fig. 1b). The molecular transition list consists of 64 parent molecules and 800 molecular species (Supplemental Table S1). This transition list contains commonly observed contaminants in proteomic-based workflows including surfactants like PEG and Triton X-100, plasticizers such as diisooctyl phthalate, slip agents like erucamide, polysiloxanes commonly found in beauty products, and bittering agents like denatonium from low-purity solvents (Table 1). In addition to the protonated form of the molecule, ammoniated or sodiated forms are also included in some cases. Using this template in Skyline allows users to rapidly assess their samples for known contaminants that may interfere with downstream analysis.

Table 1 Abbreviated list of common types of contaminants routinely observed in proteomic workflows that can be monitored using Skyline

To demonstrate the utility of this approach, a raw data file with regularly spaced peaks in the chromatogram was examined (Fig. 2a). The extracted MS1 scan (350–1500 m/z) from this region of the gradient displays two ion series, each with peaks separated by repeating units of 44.026 Da (Fig. 2b). This spacing is a hallmark of polymer contamination, and the two ion series correspond to the protonated and ammoniated forms of PEG ([C2H4O]nH2O+H+ and [C2H4O]nH2O+NH4+, respectively). The raw data file was then imported into a Skyline document containing the molecular contaminant transition list (Supplemental Table S1), and the MS1 peak area was extracted for each molecular species corresponding to PEG1–20 (Fig. 2c). The graphical display in Skyline demonstrates that the sample is heavily contaminated with PEG polymers ranging from PEG8 ([C2H4O]8H2O+H+, m/z 371.2276) to PEG20 ([C2H4O]20H2O+H+, m/z 899.5421), with individual peaks spread across several minutes of the gradient. Another common contaminant observed in proteomic workflows is the detergent Triton X-100, often used for solubilization of biological samples. In contrast to PEG contamination, which tends to elute as regularly spaced peaks spread across the gradient (Fig. 2a), polymers of Triton X-100 elute as one broad peak (Fig. 2d). Similar to PEG, Triton X-100 also displays a molecular ion series separated by 44.026 Da, and in this case both the protonated and ammoniated forms are again present, C14H22O[C2H4O]n+H+ and C14H22O[C2H4O]n+NH4+, respectively (Fig. 2e). Extracting the MS1 chromatograms in Skyline reveals a series of overlapping peaks that co-elute within a few minutes of each other (Fig. 2f). These two examples demonstrate the feasibility of using Skyline to assess sample integrity for non-protein-based contaminants during proteomic-based workflows. In addition, once a species is added to the list of molecules to monitor, one no longer needs to undertake the tedious task of matching up ions manually from published databases.
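The 44.026 Da spacing described above can also be checked directly on a list of centroided m/z values. The following is a minimal sketch of that idea and is independent of Skyline: the function name, the 0.005 m/z tolerance, and the minimum series length are illustrative assumptions, and the input peaks in the usage example are synthetic rather than taken from the data shown in Fig. 2.

```python
from bisect import bisect_left

# Monoisotopic mass of one ethylene oxide unit, C2H4O (Da).
EO = 44.026215

def find_series(mzs, spacing=EO, tol=0.005, min_len=4):
    """Group peaks into ladders whose neighbors differ by `spacing` (within `tol`).

    Returns a list of series, each a list of m/z values in ascending order.
    Only singly charged ladders are considered, matching the transition list.
    """
    mzs = sorted(mzs)
    used = set()
    series = []
    for i in range(len(mzs)):
        if i in used:
            continue
        chain = [i]
        while True:
            target = mzs[chain[-1]] + spacing
            j = bisect_left(mzs, target - tol)  # first peak at or above the window
            if j < len(mzs) and abs(mzs[j] - target) <= tol:
                chain.append(j)
            else:
                break
        if len(chain) >= min_len:
            used.update(chain)
            series.append([mzs[k] for k in chain])
    return series

if __name__ == "__main__":
    # Synthetic ladders starting at protonated and ammoniated PEG8, plus two unrelated peaks.
    protonated = [371.2276 + k * EO for k in range(5)]
    ammoniated = [388.2541 + k * EO for k in range(5)]
    for ladder in find_series(protonated + ammoniated + [445.1200, 500.3000]):
        print([round(mz, 4) for mz in ladder])
```

A scan that yields two or more long ladders with this spacing is a reasonable prompt to look at the PEG and Triton X entries in the transition list, since both families share the same repeating unit.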

Figure 2

Monitoring contaminants in Skyline. (a) Base peak chromatogram showing evenly spaced peaks observed during an LC-MS/MS acquisition containing PEG. (b) Extracted MS1 scan of chromatogram shown in (a) above, with two ion series separated by repeating units of ethylene oxide at 44.026 Da. The protonated form of PEG ([C2H4O]nH2O+H+) is separated by black arrows and the blue arrows represent the ammoniated forms of PEG ([C2H4O]nH2O+NH4+). (c) Chromatograms, peak intensities, and retention times for PEG molecules extracted and visualized in Skyline. Peak labels represent the number of ethylene oxide units. (d) Triton X (may contain Triton X-100, X-114, X-405, or X-45) base peak observed during an LC-MS/MS acquisition. (e) Extracted MS1 scan of chromatogram shown in (d) above with two ion series separated by repeating units of ethylene oxide at 44.026 Da. The protonated form of Triton X-100 (C14H22O[C2H4O]n+H+) is separated by black arrows and the blue arrow represents the ammoniated form of Triton X-100 (C14H22O[C2H4O]n+NH4+). (f) Chromatograms, peak intensities, and retention times for Triton X molecules extracted and visualized in Skyline. Peak labels TX5–18 represent the number of ethylene oxide units

Conclusion

Although several tools and approaches have been developed to assess instrument performance metrics such as reproducibility and sensitivity, little effort has been made to help researchers rapidly interrogate the integrity of their samples for molecular interferences. The current work provides an approach for rapidly assessing contamination of mass spectrometry data by non-proteinaceous molecules, saving both time and valuable resources. The current molecular transition list is not meant to be comprehensive, but rather a starting point that one can easily modify and adapt to various analytical needs. Although this approach does not identify unknown species, I have found that mass-to-formula calculators [18] can readily serve this purpose. Finally, because it builds on an open-source, vendor-neutral software platform like Skyline, this approach is easily adaptable to most proteomic workflows and mass spectrometry platforms.