Introduction

Recently, there has been a growing interest in protein glycosylation analysis. The main driving force may be the widespread pharmaceutical use of recombinant proteins as therapeutics, most of which are glycosylated. Since changes in glycosylation may alter the physical properties of these biologicals, as well as their biological activity and immunogenicity, site-specific, in-depth glycosylation analysis of these proteins is essential [1]. In addition to assisting with proper protein folding, protein processing, and controlling protein survival in the circulation, extracellular glycosylation plays a crucial role in cell adhesion [2,3,4] and influences intracellular processes [3, 5,6,7,8]. Glycosylation may be altered by disease [9, 10], and glycosylation defects may in turn cause disease [9, 11]. Site-specific alteration of glycosylation has been implicated in receptor activation [12, 13], and an interplay between mucin-type O-glycosylation of fibroblast growth factor 23 on Thr-178 and its phosphorylation at Ser-180 seems to control phosphate balance [14].

Mass spectrometry, with or without MS/MS analysis, has been used for glycopeptide characterization for decades [15,16,17,18,19,20,21]. There are numerous studies where purified glycopeptides, both N-linked [15, 18, 22, 23] and O-linked [16, 19, 24], have been characterized using intact mass measurements, enzymatic digestion, and collision-induced dissociation (CID). Glycosidic bonds are weaker than peptide bonds. Thus, CID spectra are usually dominated by glycan fragments, generally displaying little information about the underlying amino acid sequence. This usually does not represent a significant problem in single-protein analysis when there are only a few potential glycosylation sites, but is a serious problem in mixture analysis, especially in global studies. Radical-based fragmentation techniques ECD [25] and ETD [26] offer an alternative approach where the almost exclusive observation of peptide backbone cleavages provides information for both the peptide sequence and modification site assignments. In these fragmentation modes, the side-chains usually remain intact, although the precursor ion may lose some glycan units, especially sialic acids [27]. Recently, a combination of energy- and radical-based fragmentation, EThcD [28], has been implemented in high-end Orbitrap mass spectrometers (Orbitrap Fusion and Lumos). In EThcD analyses, ETD activation is performed first; then, all fragments as well as the surviving precursor ions are subjected to mild HCD activation, before the resulting products are measured. Based on studies conducted on unmodified peptides [28] and phosphopeptides [29], fragments produced by the ETD process do not fragment further at the HCD energies used (except that radical z• ions may yield w ions [30]), but the activated precursor ions are fragmented. Although glycopeptides are more prone to fragmentation upon collisional activation than phosphopeptides, secondary fragmentation of c/z• ions (for peptide fragment ion nomenclature, see [31]) has not been reported for either N- [32,33,34] or O-glycopeptides [35]. Hence, this activation method seems to be ideal for glycopeptide characterization as c/z• ions generated by ETD enable peptide sequence identification and modification site assignment while supplemental HCD activation yields information on the glycan structure in the form of B/Y ions (for carbohydrate fragment ion nomenclature, see [36]) and may provide additional sequence coverage by generating additional b/y ions.

In practice, ion trap CID using resonance-activation yields information about the glycan structure and the peptide size [27, 37]; beam-type CID (HCD), by producing multiple collisions, may produce a “balanced” spectrum with informative glycan and peptide fragments or may lead to comprehensive peptide fragmentation and break the glycans to smithereens [38,39,40]; ETD may deliver data that are good enough for both peptide and site identification, provided the modifying oligosaccharide is listed in the queried glycan database [27, 41,42,43,44,45]. In general, each activation method provides some, but not all, of the clues needed to decipher the glycopeptide structure, though good quality EThcD is getting close to the goal (for a recent review of protein O-glycosylation including MS/MS characteristics of O-glycopeptides, see [46]). Mass spectrometry provides limited information about the glycan structure. Normally, LC-MS/MS data do not reveal the ring linkage positions and the linkage stereochemistry cannot be deciphered using mass spectrometry. Isomeric oligosaccharide building blocks may not be distinguishable, although beam-type CID (HCD) fragmentation has recently been shown to be able to distinguish GlcNAc and GalNAc residues based on their different fragment ion intensity profile [41]. Automated glycopeptide assignments identify the glycan composition and link it to a glycan listed in the database. Glycan assignment is often based on supporting information about the glycans present, such as glycan analysis, knowledge of the potential glycan structures in the sample, or by using glycan structure-specific purification prior to MS.

Data acquired with different activation techniques are currently searched independently by search engines. There have been some attempts to use ETD and CID/HCD data together, but only for N-glycopeptides, and the identification was based on either the HCD data [47,48,49] or ETD spectra [46], while the other dataset provided confirmation [43,44,45] or structural information about the glycan [50]. We hope that eventually software that handles all available data in an interactive fashion will be developed, but currently, researchers manually integrate information to produce optimal data interpretation [51,52,53].

Some large-scale N-glycopeptide studies have used beam-type CID (HCD) data [54,55,56,57,58,59], whereas others have employed ETD [42, 60,61,62]. For success in high-throughput O-glycosylation studies, the use of ETD is essential [42,43,44, 63]. The first large-scale studies using EThcD to study N-glycosylation [32,33,34] and O-glycopeptides [35] have recently been published.

In the present study, we applied EThcD for the large-scale analysis of a very complex N- and O-glycopeptide mixture, human urine, with primary focus on O-glycosylation. In order to obtain a comprehensive view of the modifying glycans, tryptic glycopeptides were enriched with lectin weak affinity chromatography using wheat germ agglutinin (WGA), a lectin that has been shown to bind a wide array of glycan structures [42, 64]. The glycopeptide mixtures were analyzed by LC-MS/MS using HCD product-ion-dependent EThcD data acquisition. Since reliable O-glycopeptide assignment, permitting multiple different oligosaccharide structures simultaneously, is still a very hard task to tackle, we used two search engines, Byonic [65] and Protein Prospector (http://prospector.ucsf.edu). Both programs search for b/y and c/z• peptide fragments (for nomenclature, see [31]). Prospector also considers the products of hydrogen migration (c − 1• and z + 1 ions) but ignores the glycan fragments. Byonic considers B and additional glycan oxonium ions, as well as Y fragments resulting from glycan fragmentation (for nomenclature, see [36]), and looks for two additional peptide fragments: a- and b-H2O. Obviously, the search engines also use different scoring systems. Using two different tools should lead to more glycopeptide identifications (as it indeed did) and add confidence to the shared assignments.

In addition, confidently assigned spectra were inspected manually in order to establish rules about the EThcD fragmentation of glycopeptides. It was obvious at first glance that glycan fragmentation is prominent but somewhat different from the corresponding HCD spectra of the same glycopeptide precursors. Thus, we investigated how general this phenomenon is and how glycopeptide characterization could benefit from it.

The focus of the present study was not the comprehensive characterization of urinary glycopeptides or comparing the performance of different search engines, but rather to evaluate whether a new analysis approach could help to identify more components, and to draw attention to existing problems that are usually ignored.

Experimental

Sample Preparation

One hundred milliliters of urine from three healthy volunteers (sample A, 46-year-old female; sample B, 46-year-old male; sample C, 26-year-old male) were used for the studies. Samples were collected with appropriate consents approved by the regulatory and ethical authorities (ethics approval number of the Hungarian Scientific and Research Ethics Committee: 1011/16). Protein concentration of the samples (100 ± 10 μg/ml) was determined by the Bradford assay.

Cell debris was removed by centrifugation (5000g, 10 min) and the resulting urine was concentrated on 10-kDa MWCO ultracentrifugation devices (Millipore). The concentrate was supplemented with guanidine hydrochloride (to a final concentration of 6 M), followed by reduction with 20 μl DTT (500 mM in 25 mM ammonium bicarbonate) and alkylation with 40 μl iodoacetamide (500 mM in 25 mM ammonium bicarbonate). The mixtures were washed with guanidine (6 M in 25 mM ammonium bicarbonate) then with ammonium bicarbonate (25 mM) followed by incubation with 100 μg trypsin (37 °C, 12 h). Glycopeptides were enriched by two rounds of affinity chromatography using a homemade column packed with wheat germ agglutinin immobilized on POROS [36] as described in [66]. Two fractions were collected per injection: a “shoulder” fraction (fraction 1) and a “GlcNAc” fraction (fraction 2, eluted with 200 μl GlcNAc (200 mM in 150 mM ammonium bicarbonate)), representing weakly and more strongly bound glycopeptides, respectively. After the first round of enrichment, the GlcNAc fraction was desalted on C18 SepPak cartridges (Waters); the fractions were combined and subjected to a second round of enrichment. After the second round of enrichment, the fraction 2 was desalted, and all samples were dried down.

Mass Spectrometry

The glycopeptide mixtures were analyzed by LC-MS/MS using an Acquity UPLC MClass System (Waters) on-line coupled to an Orbitrap Fusion Lumos Tribrid Mass Spectrometer (Thermo Scientific) operating in positive ion mode. Five percent of the isolated peptide mixtures were injected for each LC-MS/MS analysis. Fractions 1 and 2 were analyzed separately. Solvent A was 0.1% formic acid/water and solvent B was 0.1% formic acid/ACN. After trapping at 3% B (Waters Acquity UPLC MClass Symmetry C18 180 μm × 20 mm column, 5-μm particle size, 100-Å pore size; flow rate 10 μl/min), peptides were separated using a linear gradient of 10 to 30% B in 60 min (Waters Acquity UPLC MClass BEH C18 75 μm × 250 mm column, 1.7-μm particle size, 130-Å pore size; flow rate 300 nl/min).

Each MS survey scan (m/z 380–1580, R = 60,000, acquired in profile mode) was followed by a maximum 3-s cycle collecting MS/MS data of precursors in the order of decreasing charge state (z = 3–5) then by increasing m/z (minimum intensity 10⁶). Precursor ions were isolated with the quadrupole (isolation window 2 Da). HCD data (AGC target 50000, normalized collision energy (NCE) 28%) were acquired for each precursor, while EThcD data acquisition (AGC target 300000, supplemental activation energy 15% NCE) was triggered by the presence of diagnostic sugar oxonium ion m/z 204.0867 (for N-acetylhexosamine, HexNAc) among the 20 most abundant fragment ions of the HCD spectrum, with a mass tolerance of 10 ppm. All MS/MS spectra were acquired in the Orbitrap (R = 15,000, centroid mode). Dynamic exclusion was enabled (maximum 1 HCD and EThcD spectra/precursor in 30 s).
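The product-ion trigger described above can be illustrated with a short sketch. It is not the instrument control logic; the function name `triggers_ethcd` and the representation of a spectrum as (m/z, intensity) pairs are assumptions made only for this example.

```python
HEXNAC_OXONIUM = 204.0867  # m/z of the HexNAc oxonium ion used as the trigger


def triggers_ethcd(peaks, target_mz=HEXNAC_OXONIUM, top_n=20, tol_ppm=10.0):
    """Return True if target_mz is matched among the top_n most intense peaks.

    peaks: iterable of (mz, intensity) tuples from one HCD spectrum
    (hypothetical input format).
    """
    # Keep only the top_n most abundant fragment ions.
    most_abundant = sorted(peaks, key=lambda p: p[1], reverse=True)[:top_n]
    # Convert the ppm tolerance into an absolute m/z window at the target.
    tol = target_mz * tol_ppm * 1e-6
    return any(abs(mz - target_mz) <= tol for mz, _ in most_abundant)
```

With these defaults the matching window at m/z 204.0867 is about ±0.002, so a peak at m/z 204.0900 would not trigger EThcD even if it were the base peak.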

Data Interpretation

Proteome Discoverer (Thermo Scientific, v2.2.0.388) was used to generate separate HCD and EThcD peaklists from the raw data in mgf format. A minimum peak count of 10 was required to retain the MS/MS spectrum. EThcD peaklists were filtered using the MS-Filter program of Protein Prospector [67] for the presence of the sialic acid oxonium ion m/z 292.1027 (mass tolerance 10 ppm) within the 80 most abundant fragment ions (since most spectra feature fewer fragment ions), then searched using the Protein Prospector Batch Tag Web (v5.16.0) and Byonic (v2.13.17) search engines. Protein Prospector used the 40 most abundant ions from each half of the spectral mass range in the database searches. Byonic uses practically all the observed peaks. N- and O-glycopeptides were searched separately with the following parameters: database: human subset of the Swissprot database (2017.9.19 version, 20,219 sequences) concatenated with a randomized sequence for each protein entry; enzyme: semitrypsin with maximum 1 missed cleavage site; mass accuracy: 5 ppm for precursor ions and 10 ppm for fragment ions specified as monoisotopic values; fixed modification: carbamidomethylation (Cys); variable modifications: acetylation (protein N-terminus), cyclization (peptide N-terminal Gln), and oxidation (Met); maximum number of variable modifications per peptide: 2. For O-glycopeptide identifications, the HexNAcHex, HexNAcHexNeuAc, HexNAcHexNeuAc2, and HexNAc2Hex2NeuAc2 glycan structures were also considered as “common” variable modifications. For N-glycopeptide searches, the “57 human N-glycans” database was considered as additional “rare” variable modification (1 N-glycan per peptide). Acceptance criteria for identifications were as follows: Protein Prospector searches: maximum FDR values of 5 and 1% (fraction 1) and 10 and 5% (fraction 2) for protein and peptide identifications, respectively, and SLIP score ≥ 6 for estimation of site assignment reliability [68]; Byonic searches: maximum protein FDR value of 1% and Pep2D score < 0.1 [69].
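The peak selection that precedes the Protein Prospector search (the 40 most abundant ions from each half of the spectral mass range) can be sketched as below. This is only an illustration: how Protein Prospector defines the two halves is not stated here, so the split at the midpoint of the observed m/z range is an assumption, as are the function name and the (m/z, intensity) input format.

```python
def top_peaks_per_half(peaks, n_per_half=40):
    """Keep the n_per_half most intense peaks from each half of the m/z range.

    peaks: list of (mz, intensity) tuples (hypothetical input format).
    """
    if not peaks:
        return []
    mzs = [mz for mz, _ in peaks]
    # Assumption: the halves are split at the midpoint of the observed range.
    midpoint = (min(mzs) + max(mzs)) / 2.0
    lower = [p for p in peaks if p[0] <= midpoint]
    upper = [p for p in peaks if p[0] > midpoint]
    kept = []
    for half in (lower, upper):
        kept.extend(sorted(half, key=lambda p: p[1], reverse=True)[:n_per_half])
    return sorted(kept)  # ascending m/z order
```

Selecting peaks per half rather than globally keeps some high-mass fragments in the search even when low-mass oxonium ions dominate the spectrum, which is relevant for the glycopeptide data discussed here.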

MS-Filter was further used to screen for the presence of specific carbohydrate oxonium ions as described in the “Results and Discussion” section. All ions were searched with ± 10-ppm mass tolerance using the instrument type specification as ESI-EThcD-high-res except for m/z 1313.4625 that was searched with instrument type ESI-Q-high-res.

Results and Discussion

Affinity-based glycopeptide enrichment was performed from tryptic digests of three individual human urine samples (labeled A, B and C), using wheat germ agglutinin, which binds a wide array of glycopeptides [40]. Using human serum tryptic digests, we have previously observed that singly glycosylated O-glycopeptides tend to elute at the end of the flow-through fraction, and the background of non-glycosylated peptides is high. Multiply modified O-glycopeptides are predominantly present in the fraction eluted with GlcNAc (unpublished data). In order to maximize peptide spectrum matches (PSMs), we collected and analyzed these fractions separately (referred to as “fraction 1” and “fraction 2”). During LC-MS/MS analysis, the presence of the diagnostic HexNAc oxonium ion, m/z 204, in the HCD spectrum triggered EThcD data acquisition. Since glycopeptides are usually larger than unmodified peptides, not only singly but also doubly charged ions were excluded from precursor ion selection. In the EThcD experiments, the default value of supplemental activation energy (15%) was used. Approximately 45,000 EThcD spectra were acquired (Online Resource 1).

Since the enrichment method has been reported as non-discriminative [40, 59, 60, 63], the glycan structures present could not be predicted. Initial screening of the MS/MS data for the diagnostic monosaccharide oxonium ion m/z 292.103 of N-acetylneuraminic acid indicated a predominance of sialylated glycopeptides (Online Resource 2). Thus, we focused on such structures, and database searches were performed with the m/z 292-filtered peaklists. For O-glycopeptide identification, two search engines were used, Protein Prospector and Byonic. Glycan structures representing the di-, mono-, and nonsialylated core-1 and the disialylated core-2 O-glycans were considered as potential modifications, presuming that urine has a similar O-glycan distribution to plasma [70]. A previous study using sialic acid-based enrichment of urinary glycoproteins also indicated the dominance of core-1 and core-2 O-glycans [49]. For N-glycopeptide identification, only Byonic was used, and a larger N-glycan database representing the major plasma N-glycans was considered. This glycan database contains all the N-glycans identified on urinary glycoproteins in an earlier study [49]. All O-glycopeptide identifications meeting the acceptance criteria are listed in Online Resource 3, while N-glycopeptide identifications are presented in Online Resource 4. More O-glycopeptides were identified than N-glycopeptides. O-glycopeptide identifications are in good agreement with previous results: 67% of the O-glycosylated sequences reported by Halim [49] were also identified in the present study, and we additionally report the number of sialic acids present. The overlap of N-glycopeptides is much less impressive (20%). The overall spectral identification rate was rather low. Fraction 1 yielded more assignments. For the O-glycopeptides in this fraction, the search engines performed quite similarly: Byonic yielded 556 PSMs in comparison to 552 delivered by Protein Prospector, with 343 shared identifications. Combined with the 328 N-glycopeptide assignments, ~ 5% of the fragmentation spectra are accounted for. Fraction 2 yielded fewer: 326 and 304 PSMs by Byonic and Prospector, respectively, with 173 shared identifications. The O-glycopeptide identifications combined with the 109 assignments from the N-glycosylation searches cover ~ 2.5% of the data. Since the later eluting species most likely feature larger and/or more oligosaccharide structures, these results are not entirely unexpected. However, the overall success rate is disheartening. There are numerous analysis approaches one could try to increase identifications. For example, only two O-glycans/peptide were permitted, even though sequences containing up to 12 GalNAc modifications have been reported from Simple Cell experiments [41]. We have previously reported triply and quadruply O-glycosylated peptides from human serum, albeit after removing the sialic acids [71]. One could also search for additional glycoforms in glycoproteins already confidently assigned in the mixture. Searches could be performed with relaxed enzyme specificity, since the presence of other proteolytic activity was amply detected. In addition, deamidation of Asn and Gln residues most likely occurred during the sample preparation; thus, additional variable modifications could be introduced. However, opening up the search space introduces additional issues, and as we present below, the reliability of glycopeptide assignments, at least for O-glycopeptides, has not been solved even on this “conservative” analysis level.
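The number of spectra accounted for in each fraction follows from the PSM counts quoted above by counting shared identifications once. The short calculation below is a sketch under two assumptions that are not stated explicitly in the text: that "shared identifications" refers to PSMs assigned by both search engines, and that the N-glycopeptide assignments do not overlap with the O-glycopeptide PSMs.

```python
def union_count(n_a, n_b, n_shared):
    """Distinct PSMs reported by at least one of two search engines."""
    return n_a + n_b - n_shared


# O-glycopeptide PSMs: Byonic, Protein Prospector, shared (values from the text)
fraction1_o = union_count(556, 552, 343)
fraction2_o = union_count(326, 304, 173)

# Add the N-glycopeptide assignments (Byonic only)
fraction1_total = fraction1_o + 328
fraction2_total = fraction2_o + 109

print(fraction1_o, fraction1_total)  # 765 1093
print(fraction2_o, fraction2_total)  # 457 566
```

Dividing these totals by the number of spectra searched in each fraction gives the identification rates quoted in the paragraph.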

We scrutinized the data to find some explanations for our low success rate and in the hope of increasing the number of reliable assignments. Firstly, we found that the majority of the EThcD precursors displayed low charge density: over 90% of the precursor ions were m/z > 900. Identification rate was ~ 13% for precursors m/z < 900; higher m/z precursors yielded identification rates of ~ 2%. This suggests that ETD efficiency is a limiting factor and the collisional activation in this particular setting does not deliver sufficient information about the amino acid sequence. In many EThcD spectra, radical-based fragmentation was limited or non-existent; ions representing glycan fragmentation as a result of the supplemental HCD activation dominated. This was especially true for glycopeptides bearing larger glycan structures, or for multiply glycosylated peptides. Figure 1 depicts the EThcD spectrum of an unidentified O-glycopeptide. From the data, one can decipher that the peptide bears a hexasaccharide HexNAc2Hex2NeuAc2, probably representing a disialylated core-2 O-glycan. However, the low number of potential sequence ions prevents the identification of the underlying amino acid sequence. Another group, performing N-glycopeptide analysis, reported similar observations [34].

Figure 1

EThcD spectrum of an unidentified glycopeptide m/z 827.037(3+) bearing a disialo mucin-type core-2 structure featuring very informative glycan fragments, which help to assign the modifying sugar, but the spectrum does not contain sufficient data for peptide identification. Oxonium ions are in blue, while sugar losses from charge-reduced precursors are marked in red (2+) or green (1+). Y0 represents the unmodified peptide, while Y1 indicates the peptide modified with the core GalNAc only. The diamond symbol (♦) indicates the charge-reduced ion of a co-eluting doubly charged precursor, indicating multiple components are being co-fragmented. One glycan fragment, m/z 495, does not represent the normal sialylated core-2 structure. Its presence may indicate (i) precursor ion interference (as mentioned above); (ii) an unexpected structure; (iii) a co-eluting isomeric glycoform; or (iv) sialic acid migration, i.e., a rearrangement reaction. The peaklist of this spectrum is included as Online Resource 5

We also found that precursor ion interference is the rule rather than the exception when using high-sensitivity MS in such complex mixtures. Ion clusters corresponding to the charge-reduced forms of a co-eluting precursor ion of a different charge state were detected in the majority of the EThcD spectra (e.g., see Fig. 1). Automated data interpretation also indicated precursor ion interference: in spite of the m/z 204 HCD product ion trigger and the presence of the neuraminic acid-specific oxonium ion at m/z 292, unmodified peptides were confidently assigned from the 292-filtered EThcD peaklists (Online Resources 3 and 4; slides 30–32 of Online Resource 5). In addition, from the same peaklists, quite a large number of N-glycopeptides featuring neutral structures were identified in both fractions (Online Resource 4).

The abundance of glycan oxonium ions and the detection of even the intact hexasaccharide (Fig. 1) seemed to be unusual in comparison to HCD spectra. In order to confirm such differences and to assess the prevalence of ions representing larger structures (Table 1), we performed a statistical analysis of the dataset using the MS-Filter tool in Protein Prospector [61]. Since it is likely that common O-glycans may yield diagnostic fragment ions, filtering for such fragments is an obvious path to investigate. In addition, ascertaining the presence of certain structures on multiply modified O-glycopeptides would be a great improvement.

Table 1 Structure and monoisotopic m/z values of the sugar oxonium ions used in the filtering process

The oxonium ions of HexNAc or sialic acid are usually abundant upon collisional activation, while the m/z 657 fragment representing a HexNAcHexNeuAc structure is much weaker. At the same time, our personal observation suggested that both m/z 292 and m/z 657 are more intense in EThcD. Thus, we tested how many spectra featured these masses within the 20 most abundant ions. Filtering with the sialic acid oxonium ion delivered about the same number of matching spectra in both activation methods (Online Resource 2). However, the trisaccharide ion was present in about 50% fewer HCD spectra than EThcD spectra (Table 2).

Table 2 Heatmap of percentage of the MS/MS spectra featuring the specified fragment ions within 10 ppm among the specified number of most abundant ions. A, B, and C represent different samples, while 1 and 2 represent fraction 1 and 2 of affinity enrichment, respectively

The increased frequency of the trisaccharide (B3) ion in EThcD spectra confirmed our hunch about the improved survival rate of larger glycan fragments. The detection of intact O-glycans should improve the assignment of O-glycopeptides, since the differentiation between sequences modified by a hexasaccharide or by two smaller glycans (such as two trisaccharides or a di- and tetrasaccharide) often proves impossible. The disialylated core-1 O-glycan, GalNAc(NeuAc)GalNeuAc can produce a B ion at m/z 948, while the disialylated core-2 O-glycan GalNAc(GalNeuAc)GlcNAcGalNeuAc would yield a B ion at m/z 1313. These ions were detected in a subset of the spectra (Table 2), but their intensities were lower than that of the trisaccharide oxonium ion (albeit much higher compared to HCD). The frequency of m/z 948 seemed to plateau only when considering the top 80 most intense peaks. The B fragment of the intact hexasaccharide was the least common; its frequency did not reach 1% (obviously, this number reflects not only the fragility of the structure but also the lower occurrence of this glycoform). We searched for other core-2-specific candidate ions. A fragment at m/z 407 representing HexNAc2 was detected in a reasonable number of spectra (Table 2). The disialylated core-2 hexasaccharide could also yield HexNAc2Hex, HexNAc2Hex2, HexNAc2HexNeuAc, and HexNAc2Hex2NeuAc at m/z 569, 731, 860, and 1022, respectively. However, none of these were observed at a significant level.
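The m/z values of the oxonium (B) ions discussed above follow directly from the glycan compositions. The sketch below sums monoisotopic residue masses and adds one proton; the residue masses are standard values computed from the elemental compositions given in the comments, while the function name and the dictionary-based composition format are assumptions made for this example.

```python
# Monoisotopic residue masses in Da (monosaccharide minus H2O)
RESIDUE_MASS = {
    "HexNAc": 203.079373,  # C8H13NO5
    "Hex": 162.052823,     # C6H10O5
    "NeuAc": 291.095417,   # C11H17NO8
}
PROTON = 1.007276


def oxonium_mz(composition):
    """m/z of the singly protonated oxonium (B) ion for a glycan composition."""
    mass = sum(RESIDUE_MASS[unit] * count for unit, count in composition.items())
    return mass + PROTON


for name, comp in [
    ("HexNAc", {"HexNAc": 1}),
    ("NeuAc", {"NeuAc": 1}),
    ("HexNAc2", {"HexNAc": 2}),
    ("HexNAcHexNeuAc", {"HexNAc": 1, "Hex": 1, "NeuAc": 1}),
    ("HexNAcHexNeuAc2", {"HexNAc": 1, "Hex": 1, "NeuAc": 2}),
    ("HexNAc2Hex2NeuAc2", {"HexNAc": 2, "Hex": 2, "NeuAc": 2}),
]:
    print(f"{name}: {oxonium_mz(comp):.4f}")
```

The same function reproduces the nominal values quoted for the other core-2 candidate ions (m/z 569, 731, 860, and 1022).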

Hoping that the characteristic oxonium ions would help to identify most tetra- and hexasaccharide-bearing glycoforms, database searches were performed with shorter, pre-screened peaklists. For tetrasaccharides, the presence of m/z 292 and the intact B ion at m/z 948 were required. To attempt to find hexasaccharide-bearing glycopeptides, we also required the detection of m/z 657 along with the diagnostic internal fragment (m/z 407) or the B ion representing the intact glycan (m/z 1313). This additional confirmation was needed, as at high masses, there is an increased chance of interference from peptide fragments, while at lower masses, precursor ion interference seems to be higher. Since O-glycopeptides frequently feature multiple glycosylations [71], probably also with different glycan structures, the smaller, potentially underlying glycans were also listed as variable modifications in these searches. The search with the m/z 407 pre-filtered list yielded glycopeptides with many kinds of glycans (data not shown), several of which cannot produce an m/z 407 ion, confirming that low-mass oxonium ions are frequently non-specifically observed due to precursor ion interference. Using the intact glycan masses as specific markers for the presence of glycans of such sugar compositions proved to be of limited benefit. More than 90% of the confident assignments featured the targeted glycan (Online Resources 3 and 5). However, the assignment rate was no higher compared to the results from the peaklists that were only filtered for m/z 292 (see the Summary sheet of Online Resource 3). Moreover, depending on the dataset and the search engine used, 33–50% of the PSMs linked to tetrasaccharide-modified sequences were lost when the pre-screened peaklists were used. This means that these spectra did not feature the diagnostic B fragment. However, we do not know how reliable these assignments are, despite the probability-based measures used by the search engines (see some examples, from the hexasaccharide dataset, Online Resource 5). Protein Prospector benefited more from using a filtered dataset: 83 and 28 novel PSMs were assigned to tetrasaccharide-modified glycopeptides from the pre-screened peaklists, when analyzing fractions 1 and 2, respectively (Online Resource 3). Byonic gained practically nothing from using the 948-filtered peaklist: it added 3 and 11 novel PSMs to the assignments from fractions 1 and 2, respectively. This could be explained by the fact that Byonic also scores the glycan fragments. The inferior results for the stronger binding glycopeptide mixture (fraction 2) may be explained by the more extensive glycosylation.

Since the core-2 structure is probably the rarest O-glycan structure considered in the present study and thus should represent the smallest spectrum pool, we illustrate the difficulties encountered in data interpretation with these data. Querying the peaklists filtered for the presence of m/z 292 allowed assignment of 110 individual PSMs to hexasaccharide-bearing O-glycopeptides in fraction 1 (Online Resource 3). A significant portion of these data indicated multiple glycosylations, including modification by two hexasaccharides. In numerous cases, such as when there were adjacent potential modification sites, there was not sufficient information to decide between the different potential glycoforms (a single hexasaccharide as opposed to, for example, two trisaccharides). Additionally, since only two modifications per peptide were permitted, sequences reported modified with two hexasaccharides could actually be glycopeptides more extensively decorated with smaller glycans, an ambiguity produced by the analysis approach. From the peaklist filtered for m/z 1313 (95 spectra), Protein Prospector delivered 22 PSMs (23%) representing 16 unique sequences (see list in Online Resource 5). All but one of these were modified by the hexasaccharide, although for a few sequences the software could not make a confident site assignment because of adjacent potential modification sites (for the best spectrum, see Fig. 2). From the same data, Byonic delivered 17 PSMs representing 12 unique glycopeptides. The overlap with the Prospector identifications was only 12 PSMs related to 7 unique glycopeptides (Online Resource 5). Both search engines identified an unmodified peptide from a mixture spectrum. Interestingly, for these data, the glycan scoring did not work to Byonic’s advantage, as the search engine assigned doubly modified peptides and also listed a trisaccharide-modified sequence. The relevant EThcD spectra, their assignments, and also the corresponding HCD data are presented in Online Resource 5. These data illustrate that glycopeptide assignments may benefit from using the corresponding HCD data. In our data, some of the HCD spectra provide unambiguous peptide sequence assignment, some spectra do not offer any supporting information besides the accurate Y0 mass, some indicate that multiple components were present, and some indicate that the EThcD assignment is incorrect.

Figure 2

EThcD spectrum of m/z 746.660 (3+) identified as 342AVAVTLQSH350 from Protein YIPF3 (Q9GZM5) bearing a disialylated mucin-type core-2 glycan at Thr-5 (Thr-346 in the protein sequence). The oxonium ions (in blue) and reducing end, sugar-loss fragments (2+ in red; 1+ in green) are labeled with the appropriate CFG symbols, while the peptide fragments are assigned according to the nomenclature [31], except the radical z ions were originally called z+1 fragments. The m/z of the precursor ion is within 5 ppm of the calculated value, while all the fragments were measured within 10 ppm. “Pep” indicates the bare peptide backbone. Y0 refers to the unmodified peptide, while Y1 indicates the peptide modified with one N-acetyl-galactosamine. “–Ac” denotes the neutral loss of an acetyl group. The peaklist of this spectrum is included as Online Resource 7

Finally, we have also investigated the presence of the intact tetra- and hexasaccharide glycan oxonium ions in the EThcD spectra of confidently identified (score ≥ 200, Delta Mod score ≥ 20 for Byonic identifications; score ≥ 15, SLIP score ≥ 6 for Protein Prospector identifications) glycopeptides. Only singly glycosylated peptides were considered in order to minimize interference from ambiguous site assignments that frequently also translates into ambiguous glycan structure identification. Fifty-nine percent of the tetrasaccharide-related spectra displayed m/z 948, while 13% of the hexasaccharide-related glycopeptide data contained m/z 1313 within the top 80 most abundant fragment ions. These findings indicate that larger glycan structures are unstable even under mild collisional activation.

In summary, the non-reducing end glycan fragments may aid database searches and may strengthen the reliability of glycopeptide assignments. Primarily, they improve the characterization of O-glycosylation where the same mass addition to a peptide sequence may correspond to a single larger or multiple smaller glycans. Such a contribution is probably less significant in N-glycosylation analysis, although it may help to identify antenna fucosylation or less common structural features. Thus, scoring glycan fragments is beneficial in the evaluation of EThcD spectra for determining glycosylation state, albeit that complementary information on the peptide part is also necessary. Hence, glycopeptide data should be interpreted similarly to cross-linked peptides—only those identifications that contain sufficient evidence to identify both parts of the structure should be considered reliable. The Byonic search engine does predict and score B/Y ions. However, several EThcD spectra meeting the acceptance criteria displayed exclusively glycan fragmentation. On the other hand, Protein Prospector currently does not score glycan fragmentation, meaning both the score and the reliability of the glycopeptide identifications might be under-estimated.

Reducing end Y fragments might contribute tremendously to the proper assignment of glycopeptides. Unfortunately, there is no current software implementation for their de novo recognition in O-glycosylation, especially when multiple, even different, glycans may be present. Software exists in which N-glycopeptide assignment relies heavily on the identification of Y1 from ion trap CID [72] or HCD data [47]. Both tools also incorporate glycan fragmentation scoring in order to decipher structures. According to two recent publications, N-glycopeptide glycans can even be sequenced de novo from HCD data [48, 49]. Unfortunately, the X; X + 203; X + 365 pattern that helps to identify the Y1 fragment of N-glycopeptides may also represent Y0, Y1, and Y2 in mucin-type O-glycopeptides, or partially gas-phase deglycosylated peptide fragments in these molecules. Thus, sophisticated informatics tools are required to utilize reducing end fragments even in the structure confirmation of the most promising candidates.
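To make the X; X + 203; X + 365 pattern concrete, the following minimal sketch searches a peak list for such ladders. It is an illustration only and does not reproduce any of the cited tools: the function name and the 0.02 m/z tolerance are hypothetical, and the monoisotopic increments (HexNAc 203.0794, HexNAc + Hex 365.1322) are standard residue masses rather than values taken from the text.

```python
HEXNAC = 203.0794  # monoisotopic HexNAc residue mass
HEX = 162.0528     # monoisotopic Hex residue mass


def find_y_ladders(mzs, charge=1, tol=0.02):
    """Return (X, X + HexNAc, X + HexNAc + Hex) triplets found in `mzs`.

    Mass increments are divided by `charge`, so the function can be applied
    to fragments of any assumed charge state.
    """
    mzs = sorted(mzs)
    step1 = HEXNAC / charge
    step2 = (HEXNAC + HEX) / charge

    def match(target):
        # First observed m/z within tolerance of the target, or None.
        return next((m for m in mzs if abs(m - target) <= tol), None)

    ladders = []
    for x in mzs:
        first, second = match(x + step1), match(x + step2)
        if first is not None and second is not None:
            ladders.append((x, first, second))
    return ladders


# Toy peak list (invented values) containing one ladder starting at m/z 500.30.
peaks = [500.30, 703.3794, 865.4322, 900.0]
print(find_y_ladders(peaks))
```

A hit only flags a candidate ladder. The sketch cannot tell whether X is the Y1 ion of an N-glycopeptide, the Y0 ion of a mucin-type O-glycopeptide, or a partially deglycosylated fragment, which is exactly the ambiguity discussed above.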

In comparison to ETciD, EThcD data provided additional information on the modifying glycans. On the other hand, the efficiency of the ETD fragmentation is still lower than that of collision-based fragmentation methods. In a recent study, the EThcD settings were altered for the more efficient characterization of N-glycopeptides [34]. The ETD activation time was shortened and the supplemental activation energy was increased. This improved the peptide fragmentation data (more y and b ions were detected) at the expense of glycan characterization. This approach may be counterproductive for O-glycosylation analysis, as the larger oxonium ions that allow differentiation between single and multiple glycosylations may not survive, and gas-phase elimination of the glycan(s) may also occur. Fragment ions generated by ETD activation are of lower charge state than the precursor; therefore, the subsequent supplemental activation normally does not induce secondary fragmentation [28]. However, glycosidic bonds are highly labile upon collisional activation; therefore, we wanted to rule out the possibility that secondary fragmentation biases automated data interpretation. We selected high-scoring EThcD spectra assigned to O-glycopeptides with only one potential modification site (to exclude the possibility that the spectrum was acquired on a mixture of co-eluting glycoforms) and then checked whether there were any unassigned fragment ions that might be interpreted as loss of sialic acid(s) (− 291 and − 582 Da) or the complete O-glycan (− 947 Da for the tetra-, and − 1312 Da for the hexasaccharide) from peptide sequence ions. We did not observe secondary fragmentation with the low supplemental activation (NCE 15%) applied in the present study (data not shown).
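The secondary-fragmentation check can be sketched as a search among unassigned peaks for sequence ions shifted by the expected neutral losses. This is a hedged illustration rather than the inspection actually performed: the function name, the input layout, and the 0.02 m/z tolerance are assumptions, and the monoisotopic loss masses are computed from standard residue masses, whereas the text quotes nominal values.

```python
# Monoisotopic neutral losses (Da), from standard residue masses.
LOSSES = {
    "NeuAc": 291.0954,
    "2 NeuAc": 582.1908,
    "tetrasaccharide": 947.3230,   # HexNAc + Hex + 2 NeuAc
    "hexasaccharide": 1312.4552,   # 2 HexNAc + 2 Hex + 2 NeuAc
}


def secondary_loss_candidates(sequence_ions, unassigned_mzs, tol=0.02):
    """List unassigned peaks that match a sequence ion minus a glycan loss.

    `sequence_ions` holds (label, m/z, charge) tuples for assigned,
    glycan-carrying c/z ions; `unassigned_mzs` holds the m/z values of
    peaks without an assignment. Both layouts are assumed for this sketch.
    """
    hits = []
    for label, mz, charge in sequence_ions:
        for loss_name, loss_mass in LOSSES.items():
            expected = mz - loss_mass / charge
            for observed in unassigned_mzs:
                if abs(observed - expected) <= tol:
                    hits.append((label, loss_name, observed))
    return hits


# Toy data (invented values): one unassigned peak sits 291.0954 below the z5 ion.
ions = [("z5", 1500.70, 1)]
unassigned = [1209.6046, 800.0]
print(secondary_loss_candidates(ions, unassigned))
```

An empty result for a set of high-scoring spectra would be consistent with the absence of secondary fragmentation reported here, although a match by itself would still require manual confirmation.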

Conclusions

Glycopeptides represent two or more covalently linked biomolecules, and not only does each part have to be reliably identified, but also their linkage position has to be determined. This is especially problematic in O-glycosylation analysis, where multiple different oligosaccharides may be present on the peptide studied, and the mass addition can be translated into a series of different glycan combinations (although isomeric N-glycans also exist). This elevated complexity, compared to an unmodified peptide, demands the use of multiple activation methods to provide comprehensive analysis. Glycan fragmentation (observed in CID, HCD, and EThcD MS/MS spectra) should be scored, perhaps independently in the database searches, just as cross-linked peptides are in some search engines [73].

Moreover, if software were available that reliably assessed accuracy of precursor charge state and monoisotopic mass determination from combined MS1 scans, and then combined information from HCD and EThcD data, i.e., oxonium ions, peptide fragment ions, and B and Y ions—as we currently do manually—then the rate of confident glycopeptide identification could be significantly higher.

We have observed higher m/z glycan oxonium ions in EThcD spectra of glycopeptides acquired with mild supplemental activation (NCE 15%) compared to HCD spectra employing a higher collision energy. These ions are specific for some glycan structures, so they could be used for confirming correct glycan identification in automated data interpretation. However, as these ions tend to be of low intensity, their presence cannot be considered as a prerequisite for targeted LC-MS/MS data acquisition (such as the m/z 204 HCD product-ion-dependent ETD approach for general glycopeptide data acquisition [74]). On the other hand, the detection of the intact glycan oxonium ions such as m/z 948 for the disialylated core-1 O-glycan, or m/z 1313 for the disialylated core-2 O-glycan can be used as an orthogonal post-identification validator.
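The nominal m/z values quoted for the intact glycan oxonium ions follow from the glycan compositions. As a worked check, the sketch below assumes that the disialylated core-1 glycan corresponds to HexNAc1Hex1NeuAc2 and the disialylated core-2 glycan to HexNAc2Hex2NeuAc2, and that a singly charged oxonium ion equals the sum of the residue masses plus one proton. The function name is hypothetical, and the masses are standard monoisotopic values rather than values taken from the text.

```python
# Monoisotopic residue masses (Da) and the proton mass.
RESIDUE_MASS = {"HexNAc": 203.07937, "Hex": 162.05282, "NeuAc": 291.09542}
PROTON = 1.007276


def oxonium_mz(composition):
    """m/z of the singly charged oxonium ion for a {residue: count} composition."""
    return sum(RESIDUE_MASS[res] * n for res, n in composition.items()) + PROTON


core1_disialo = {"HexNAc": 1, "Hex": 1, "NeuAc": 2}  # tetrasaccharide
core2_disialo = {"HexNAc": 2, "Hex": 2, "NeuAc": 2}  # hexasaccharide

print(round(oxonium_mz(core1_disialo), 2))  # 948.33
print(round(oxonium_mz(core2_disialo), 2))  # 1313.46
```

Removing the proton gives the corresponding neutral losses of approximately 947.32 and 1312.46 Da, in agreement with the nominal 947 and 1312 Da losses considered in the secondary fragmentation check.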