Journal of The American Society for Mass Spectrometry, Volume 29, Issue 9, pp 1802–1811

An Automated, High-Throughput Method for Interpreting the Tandem Mass Spectra of Glycosaminoglycans

  • Jiana Duan
  • I. Jonathan Amster
Focus: Application of Photons and Radicals for MS: Research Article


The biological interactions between glycosaminoglycans (GAGs) and other biomolecules are heavily influenced by structural features of the glycan. The structure of GAGs can be assigned using tandem mass spectrometry (MS2), but analysis of these data, to date, requires manual interpretation, a slow process that presents a bottleneck to the broader deployment of this approach to solving biologically relevant problems. Automated interpretation remains a challenge, as GAG biosynthesis is not template-driven, and therefore, one cannot predict structures from genomic data, as is done with proteins. The lack of a structure database, a consequence of the non-template biosynthesis, requires a de novo approach to interpretation of the mass spectral data. We propose a model for rapid, high-throughput GAG analysis by using an approach in which candidate structures are scored for the likelihood that they would produce the features observed in the mass spectrum. To make this approach tractable, a genetic algorithm is used to greatly reduce the search-space of isomeric structures that are considered. The time required for analysis is significantly reduced compared to an approach in which every possible isomer is considered and scored. The model is coded in a software package using the MATLAB environment. This approach was tested on tandem mass spectrometry data for long-chain, moderately sulfated chondroitin sulfate oligomers that were derived from the proteoglycan bikunin. The bikunin data was previously interpreted manually. Our approach examines glycosidic fragments to localize SO3 modifications to specific residues and yields the same structures reported in literature, only much more quickly.



Keywords: Glycosaminoglycan; Fourier transform mass spectrometry; Tandem mass spectrometry; Automated interpretation


Glycosaminoglycans (GAGs) are linear, polydisperse carbohydrates consisting of a repeating uronic sugar and amino sugar copolymer. GAGs serve a multitude of roles in biology including cell-cell and cell-matrix interactions, generation of energy, changes in protein binding conformation, and molecular recognition [1, 2, 3]. Certain GAGs have also been observed as potential biomarkers for disease states [4]. The degree of GAG-protein binding has been shown to be highly dependent on GAG structure and, more specifically, the position of modifications within the generic repeating copolymer chain [5, 6].

Despite the simple polymeric backbone in GAGs, a single sugar residue can exhibit varying levels of three key modifications, namely O-sulfation, N-deacetylation/sulfation, and uronic sugar stereochemistry [2]. Moreover, the biosynthesis of GAGs is not template driven, resulting in non-uniform dispersion of these modifications across the chain [7, 8]. Database-derived approaches are widely used for protein mass spectra assignment (either top-down or bottom-up) due to the predictability of amino acid sequences from genome sequences but fail when applied to biomolecules whose production is not template-derived [9, 10]. In contrast to the approaches that are successful for protein/peptide analysis, a de novo approach is required for the computer-based analysis of the tandem mass spectra of GAGs.

Considerable progress has been made in GAG analysis using mass spectrometry [1, 11]. At the MS1 level, a parts per million accurate mass measurement, using high-resolution instruments such as Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS), allows assignment of composition, from which GAG chain length, number of modifications, and types of modification can be assigned [12]. Tandem MS (MS2) of GAGs using various ion activation methods, such as collision-induced dissociation (CID) [13, 14, 15], infrared multiphoton dissociation [16, 17, 18, 19], electron-detachment dissociation (EDD) [16, 18, 19, 20, 21, 22, 23, 24], and negative-electron transfer dissociation (NETD) [25, 26, 27], yields structurally informative fragment ions [28]. Glycosidic bond fragmentation provides monosaccharide composition, while cross-ring fragmentation is used to assign the location of modifications within each residue [29]. Because this is a de novo analytical approach, complete structure analysis requires an information-rich mass spectrum that contains sufficient fragment peaks to fully assign all the variable features. Recent developments in ion activation for GAGs have led to a variety of approaches to produce informative MS2 spectra [21, 23, 28, 30]. However, the interpretation of such complex mass spectra is generally a tedious manual process that relies upon the expertise of the data analyst. A better understanding of the structural features that promote GAG activity would benefit from an automated, accurate and high-throughput analytical process.

The complexity of the data sets and the time required for analysis increase dramatically as the chain length and the number of modifications increase. Two families of GAGs, heparin/heparan sulfate (Hp/HS) and chondroitin/dermatan sulfate (CS/DS), often contain large numbers of labile sulfate modifications. For these compounds, conventional MS2 methods are often inadequate for complete structural determination, either because they do not produce a comprehensive set of fragment ions required to assign all variable features or because they lead to decomposition products that confound the analysis [8, 31]. For example, fragmentation can be accompanied by decomposition of sulfomodifications, producing peaks that are reduced in mass by multiples of 80 mass units but match the mass of standard glycosidic fragments of their counterparts with fewer sulfate modifications [28, 32]. If one does not recognize the peaks that arise from such decomposition, incorrect structural assignments will result. Common de novo strategies that have been successful for protein sequencing [25, 33, 34, 35] will inevitably be exposed to substantially more false positives due to the high likelihood of SO3 loss fragments in GAG MS and MS2. Na+/H+ exchange has been shown to decrease SO3 loss and makes characterization of highly sulfated species possible [30]; however, SO3 loss is almost always observed in MS2 spectra.

An alternative to the above approach to interpretation is to generate a list of possible fragment peaks for a candidate structure and to score the match with the experimental data. This process can be repeated for all possible isomers having a given elemental composition. Comparison of the experimental MS2 against the theoretical fragment list allows us to rank each permutation based on closeness-of-fit to the experimental results. This method becomes impractical to perform manually when the number of possible permutations for a composition exceeds the capability to examine the data. For example, Arixtra, a heparin with five monosaccharides, is the largest highly sulfated GAG to have complete mass spectral characterization [30]. The total number of possible permutations for a GAG scales exponentially with respect to chain length; equivalently, its logarithm is proportional to chain length. For both chondroitin/dermatan sulfate and heparan sulfate/heparin, the number of permutations for a given chain length and number of modifications is calculated as n-choose-k combinations, where n is the number of possible modifiable sites and k is the number of modifications:
$$ \log {N}_{\mathrm{total}}\propto {N}_{\mathrm{chain}\ \mathrm{length}} $$
$$ \left(\genfrac{}{}{0pt}{}{n}{k}\right)=\frac{n!}{k!\left(n-k\right)!} $$
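As a quick worked check of the n-choose-k count, the following Python sketch evaluates the binomial coefficient for a few site counts. The function names are illustrative and not part of the authors' MATLAB software. The site counts 21 and 61 are inferred here as the values of n that reproduce the dp43 counts quoted in the Results (20,349 and 5,949,147 for k = 5); the source does not state them explicitly.

```python
from math import comb


def isomer_count(n_sites: int, k_mods: int) -> int:
    """Ways to place k_mods identical modifications on n_sites candidate sites."""
    return comb(n_sites, k_mods)


def total_isomers(n_sites: int) -> int:
    """Sum of C(n, k) over every k from 0 to n; this always equals 2**n."""
    return sum(comb(n_sites, k) for k in range(n_sites + 1))


if __name__ == "__main__":
    # Assumed site counts (inferred, not stated in the source).
    print(isomer_count(21, 5))  # one candidate site per disaccharide
    print(isomer_count(61, 5))  # several candidate sites per disaccharide
    print(total_isomers(10) == 2 ** 10)
```

Summing the binomial coefficient over every possible number of modifications gives 2^n, which is why the unconstrained search space grows so quickly with chain length, and why fixing k from the measured composition shrinks it so sharply.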

Tools for comparison of user-input structures with fragment peaks from tandem MS have been developed [12, 36, 37], but the requirement for a known starting structure limits applicability for high-throughput analysis.

To address this bottleneck for high-throughput sequencing of GAGs, efforts in computer-assisted methods look to improve upon the speed of analysis and to reduce the amount of user input and supervision. Several software packages have been developed to overcome modern challenges in GAG analysis, although a few require additional steps at the experimental level for optimal software performance. The heparin/HS oligosaccharide sequencing tool (HOST) [38] is a computational tool designed for sequencing heparin/HS oligosaccharides using enzymatic digestion combined with ESI-MSn. The method scores and returns the best matching sequences of GAGs based on disaccharide composition analysis, yielding predicted compositions and calculating expected fragmentation patterns in silico. The theoretical fragments are then compared to the heparin/HS oligosaccharide MSn data and scored to return the most likely sequence. However, disaccharide analysis requires complete enzymatic digestion of the GAG using heparin lyases I, II and III over multiple hours of incubation (16 h), limiting the method’s overall speed and applicability in a high-throughput GAG analysis platform.

Another piece of software known as GAG-ID [39] has been shown to discriminate and identify 21 synthetic tetrasaccharides eluted from LC-MS/MS using a scoring system based on peak intensities. It is the first of its kind to automate the interpretation of mixtures when coupled to LC-MS/MS but requires complete chemical derivatization of the GAG by replacing all labile sulfate modifications with more stable acetyl groups. Much like the enzymatic digestion required by HOST, derivatization may not be a viable option for universal GAG analysis.

HS-SEQ [40] is a de novo GAG sequencing computation framework that has been used to automate the structural identification of HS of dp4, 5, 6, 8, and 15. The method determines a precursor sequence (unmodified GAG backbone) and uses information from the tandem MS to best assign possible sulfate and acetate modifications. Assignments are made based on confidence values and are used to generate a list of top candidates. This is the first GAG software that requires only the tandem MS for sequence information. While certainly a high-throughput option, structural assignment conflicts can arise in the form of sulfate-loss fragments, internal fragments, or random matches. The authors of HS-SEQ note that the software resolves conflicting assignments by removing those with lower confidence but also believe that this may produce false hits when examining samples extracted from biological sources.

The software developed in our laboratory is designed to sequence GAGs of indefinite length by comparing fragments of theoretical structures (in silico) against experimental data without the need for construction of a database, instead using a genetic algorithm optimization technique to limit the number of permutations while keeping analysis time to a maximum of a few minutes. The method assigns structures based on greatest likelihood using fragment ion products as a critical parameter for the genetic algorithm fitness criterion. Fragments that are in direct conflict with the highest scoring structure(s) are not discarded but reviewed again for possible additional components. We have tested this approach on MS2 data from intact CS chains released from the proteoglycan, bikunin. These chains vary in length from 27 to 43 saccharide residues, and vary in the degree of O-sulfomodification from 4 to 7, and thus represent a challenging test of this automated procedure.

Experimental Methods

Mass Spectrometry Analysis

Bikunin GAG MS and MS2 data reported in [41] was used as a proof-of-principle data set for the purposes of testing genetic algorithm efficacy. The monoisotopic peaks were selected via the SNAP algorithm from Bruker DataAnalysis software. Analysis of the MS2 was performed with the software alone and with no user supervision or assistance.

Computational Methods

MS1 analysis of parent ion mass is performed using a composition assignment software module written in the MATLAB coding environment. Monoisotopic peaks and charge states are acquired from Bruker DataAnalysis and deconvoluted to a neutral mass. A composition is derived from one or more neutral mass(es) by searching a data matrix of possible chain lengths, degrees of sulfation, deacetylation, and sodium/hydrogen exchange. The user input also includes the possibility of reducing end modifications, and nonreducing ends that can terminate in unsaturated uronic acids, as is common in enzymatically produced GAG oligomers. Theoretical neutral masses in the spreadsheet are compared against user specified masses with a user-defined mass tolerance. The sequences that match are then used for performing the MS2 analysis.
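The composition search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' MATLAB module: it assumes a chondroitin-type backbone of HexA-HexNAc disaccharides, negative-mode [M - zH]z- ions, and ignores deacetylation and unsaturated non-reducing ends. The function names, the loop limits, and the `extra_mass` parameter (standing in for a linker or reducing-end modification) are all hypothetical.

```python
# Monoisotopic masses in Da.
HEXA = 176.03209        # uronic acid residue, C6H8O6
HEXNAC = 203.07937      # N-acetylhexosamine residue, C8H13NO5
WATER = 18.01056
SO3 = 79.95682
NA_MINUS_H = 21.98194   # replacing one H with one Na
PROTON = 1.00728


def neutral_mass(mz, charge):
    """Neutral mass from the m/z of a negative-mode [M - zH]z- ion."""
    return mz * charge + charge * PROTON


def ppm_error(observed, theoretical):
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def theoretical_mass(n_disacc, n_sulfate, n_sodium=0, extra_mass=0.0):
    """Neutral mass of a linear chain of n_disacc HexA-HexNAc units."""
    return (n_disacc * (HEXA + HEXNAC) + WATER
            + n_sulfate * SO3 + n_sodium * NA_MINUS_H + extra_mass)


def assign_composition(observed_mass, tol_ppm=5.0, max_disacc=25, extra_mass=0.0):
    """Return every (disaccharides, sulfates, sodiums, ppm) within tolerance."""
    hits = []
    for n in range(1, max_disacc + 1):
        for s in range(0, 3 * n + 1):            # assumed cap: 3 sulfates per unit
            for na in range(0, s + n + 1):       # assumed cap: one Na per acidic site
                theo = theoretical_mass(n, s, na, extra_mass)
                err = ppm_error(observed_mass, theo)
                if abs(err) <= tol_ppm:
                    hits.append((n, s, na, round(err, 3)))
    return sorted(hits, key=lambda h: abs(h[3]))
```

For example, `assign_composition(neutral_mass(mz, z))` returns the candidate compositions ordered by absolute mass error. More than one composition can fall within the tolerance, which is why the text passes all matching sequences on to the MS2 stage.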

For MS2 assignment, we implement a genetic algorithm based on fundamental aspects common to all genetic algorithms [42, 43, 44]. For MS2 analysis, the software uses a binary vector to represent glycan structures where on-bits denote an occupied site of SO3 modification. The first step generates two glycan structures at random that fit the expected composition (initialization step) and then proceeds to “breed” these structures into a new generation of candidates (crossover step). The new generation also is subject to potential mutations in their structure in the form of exchanges between their on- and off-bits (mutation step) in an effort to avoid converging upon a local maximum. Theoretical structures created in the crossover and mutation steps are then tested against the experimental MS2 data where the score of each structure is determined based on a closeness-of-fit paradigm (fitness). The scoring system is subject to various factors that will be discussed in detail in future papers. In the case of bikunin, the score of a structure is a naïve model that determines the top candidate based on the number of matching glycosidic fragments. The primary three steps (crossover, mutation and fitness) are iterated until the maximum fitness value does not change after numerous cycles. The number of iterations required before termination of the algorithm can be defined by the user and defaults to 3. The structure(s) containing the highest scores are then examined using additional data interpretation tools that assign fragment peak masses alongside their charge, intensity, and mass error (in ppm).
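The initialization, crossover, mutation, and fitness steps described above can be illustrated with a small Python sketch. It is not the authors' MATLAB code. The number of children per generation, the mutation rate, the practice of carrying the two best structures forward, and the caller-supplied `fitness` function are all assumptions made for illustration. Crossover draws a child's on-bits only from sites modified in either parent, and mutation swaps one on-bit with one off-bit so the composition is preserved.

```python
import random


def random_candidate(n_sites, k_mods, rng):
    """Initialization: a random binary vector with exactly k_mods on-bits."""
    on = set(rng.sample(range(n_sites), k_mods))
    return tuple(1 if i in on else 0 for i in range(n_sites))


def crossover(parent_a, parent_b, k_mods, rng):
    """Child on-bits are drawn only from sites modified in either parent."""
    pool = [i for i, (a, b) in enumerate(zip(parent_a, parent_b)) if a or b]
    on = set(rng.sample(pool, k_mods))
    return tuple(1 if i in on else 0 for i in range(len(parent_a)))


def mutate(candidate, rate, rng):
    """With probability `rate`, exchange one on-bit with one off-bit."""
    ons = [i for i, bit in enumerate(candidate) if bit]
    offs = [i for i, bit in enumerate(candidate) if not bit]
    if not ons or not offs or rng.random() >= rate:
        return candidate
    child = list(candidate)
    child[rng.choice(ons)] = 0
    child[rng.choice(offs)] = 1
    return tuple(child)


def run_ga(fitness, n_sites, k_mods, n_children=20, rate=0.3,
           patience=3, max_generations=500, seed=0):
    """Iterate crossover, mutation, and scoring until the best score stalls."""
    rng = random.Random(seed)
    parents = sorted((random_candidate(n_sites, k_mods, rng) for _ in range(2)),
                     key=fitness, reverse=True)
    best_score, stale = fitness(parents[0]), 0
    for _ in range(max_generations):
        if stale >= patience:
            break
        children = [mutate(crossover(parents[0], parents[1], k_mods, rng), rate, rng)
                    for _ in range(n_children)]
        # Keep the two best structures seen in this generation as the next parents.
        parents = sorted(parents + children, key=fitness, reverse=True)[:2]
        top_score = fitness(parents[0])
        if top_score > best_score:
            best_score, stale = top_score, 0
        else:
            stale += 1
    return parents[0], best_score
```

The `patience` argument plays the role of the user-defined number of unchanged cycles before termination. Because `fitness` only needs to return something orderable, a tuple of tiered scores can be used in place of a single number.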

Experimental MS2 data collected by FTICR is extracted from Bruker Apex user interface software using the SNAP peak-picking algorithm. Monoisotopic peak masses and intensities are extracted in the form of comma-separated value (.csv) files. MATLAB software prompts the user for a .csv file containing mass-to-charge in column 1 and intensity in column 2, with mass-to-charge sorted in ascending order. Parent ion mass and charge must be provided by the user as well as mass information pertaining to a linker region mass on the reducing end. Composition details (chain length and numbers of: sulfation, N-acetylation, Na-H exchange) are calculated from a composition calculation module and then given to the software in the preliminary step before initializing the genetic algorithm.

For bikunin proteoglycan, a linker mass of 641.1473 Da (Gal4S-Gal-Xyl-Serine) was used with the remainder of the bikunin chain length represented as a binary vector.

The software integrates separate functional modules that calculate the masses of theoretical fragment ions, perform the standard genetic algorithm operations, and score theoretical structures against experimental data.

Results and Discussion

As GAG chain length and modification increases, the number of possible structural permutations exceeds a value suitable for practical, computationally efficient search methods. For the chondroitin sulfate oligomers studied here, the number of structural possibilities is as large as 3.7E22 for an oligomer of length 50 (Eq. (2)). The number of possibilities is narrowed down when composition can be assigned and the number of known sulfate modifications is determined. While the paradigm for comparing theoretical structures against experimental data can differ, a minimum number of elements such as fragment type, fragment intensity, and sequence coverage must be considered for complete GAG characterization [45]. Thus, instead of trying to shortcut these facets of analysis, we chose an approach that reduces the total search space. Hundreds of millions of structures may exist for a specific GAG composition, but for a pure sample, only one of these structures is a valid assignment. The impracticality of searching through a massive number of incorrect structures is reduced dramatically when a genetic algorithm search heuristic is applied [44].

The genetic algorithm is an optimization tool that has been used for a wide variety of applications [46, 47, 48, 49, 50, 51]. It mimics the evolutionary process by using a survival-of-the-fittest mechanism that quickly eliminates large groups of candidates from a pool if they share a feature that does not meet a specific set of criteria [44]. Here we examine the application of this approach to GAG MS2 analysis. We have developed software in the MATLAB coding environment that utilizes the genetic algorithm. GAG sequences are expressed as a binary code where on-bits (1s) and off-bits (0s) represent the presence or absence of modifications, respectively, and can be applied to both CS/DS and HS/Hp GAG classes (Figure 1) [42, 43]. The binary sequence is shortened or lengthened to accommodate the appropriate composition calculated from the parent-ion mass. The number of on- and off-bits in the genome is also adjusted based on the number of modifications observed. The final structure is determined via a genetic algorithm, the workflow for which is shown in Figure 2.
Figure 1

Four-bit binary representation for both CS and HS/Hp glycan disaccharides. Each bit is turned on (assigned 1) if a modification is present and off (assigned 0) if the R-group is a hydrogen. Bit 2 represents R2 which has an acetyl modification instead of hydrogen for an off-bit assignment. In the case of HS where the free-amine is possible, a different numeral can be used to represent the absence of SO3 and acetylation. Additional bits can be introduced to serve as negative control bits as well as a representation for the uronic sugar stereochemistry

Figure 2

(a) Workflow for our MATLAB software. User is asked to input three pieces of information for the software: parent ion mass, mass list from MS2 (charge state deconvolution will be automated), and desired mass accuracy for composition assignment and fragment matching (in ppm). The software automates the remaining steps and calculates compositions from the parent ion mass and generates a list of optimized structures using a genetic algorithm. (User provided information is highlighted in the green box. Automated features are highlighted in blue. Software output is shown in purple.) (b) A demonstration of how genetic operators work on glycan structures. Child candidate modification positions are limited to the modification position of their parents. Mutations, however, are not dependent on parent candidate structure

Improvements in analysis time and search space reduction can be observed using CID MS2 data from several fractions of intact CS chains for the proteoglycan bikunin [41]. The advantage of using these data is threefold. First, the mass spectra are rich in structurally informative fragments. Structural assignment of bikunin from MS2 was done previously with manual de novo analysis of these fragments. Software suitable for analysis should make the same assignments using these fragments without any user supervision. A second advantage is that modifications are limited to a single sulfate group per disaccharide. Sulfate modifications have been shown to only occur on the 4-O position of the amino sugar using enzymatic disaccharide analysis. Reducing the total number of possible modifications diminishes the search space dramatically. For example, a CS dp43 with 5 sulfate groups has 20,349 possible structures when only examining the occupancy of the 4-O position but 5,949,147 possible structures when every sulfate position (2-O, 4-O, 6-O) is taken into consideration. A simplified search space allows us to demonstrate proof of principle while still maintaining computational efficiency. Finally, the structures of bikunin fractions have been manually verified and reported in the literature [41]. A common motif among bikunin fractions was observed after manual sequence analysis. We were particularly interested to see if the unsupervised approach with our software also yielded these same patterns. Candidate structures of bikunin GAGs produced in the genetic algorithm cycles are assigned scores based on the number of matched glycosidic fragments in the experimental data. The fitness of a candidate structure is determined using three separate tiers of scoring:
$$ {f}_1=\sum \limits_{i=1}^{dp}{N}_{\mathrm{RE}}-\sum \limits_{i=1}^{dp}{N}_{\mathrm{RE}+\mathrm{SO}3} $$
$$ {f}_2=\sum \limits_{i=1}^{dp}{N}_{\mathrm{NRE}}-\sum \limits_{i=1}^{dp}{N}_{\mathrm{NRE}+\mathrm{SO}3} $$
$$ {f}_3=\sum \limits_{i=1}^{dp}{I}_{\mathrm{glyc}} $$

Unambiguous mass tags such as the linker region dictate that greater emphasis should be placed on the reducing end (Y and Z fragments) and provide a more valid structural assignment. The primary fitness of a structure is therefore based on its calculated f1 value, which considers the number of glycosidic fragments from the reducing end (NRE) that are matched in the experimental data. The software then checks to see if any match is potentially a sulfate decomposition peak by adding the mass of an SO3-H exchange (79.9568 Da) and searches the experimental data again for a matching mass. The value of f1 is then reduced by the number of peaks determined to be a product of sulfate decomposition (NRE + SO3).

If the value of f1 is tied among multiple structures, a secondary ranking is then determined with f2, the value of which is based on the number of glycosidic matches from the non-reducing end (B and C fragments). As in the calculation of f1, potential sulfate decomposition peaks are taken into account. Non-reducing end fragments are a tier below reducing end fragments since they could potentially match internal fragments due to the lack of an unambiguous mass tag. Incorrect assignment of internal fragments as non-reducing end fragments limits the validity of assignment.

A tertiary score f3 is used after matching glycosidic fragments from both reducing and non-reducing ends. Typically, a small selection of candidate structures (2–4) may end up with equal f1 and f2 values, in which case the summation of the intensities of all matched glycosidic fragments is the tiebreaker. This simple algorithm can and should be continuously fine-tuned for other purposes as software development continues but is sufficient for proof-of-principle purposes.
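The three-tier ranking can be sketched in Python as a tuple (f1, f2, f3), since tuples compare element by element and therefore reproduce the tie-breaking order described above. This is an illustrative reading of the scoring equations, not the authors' implementation. It assumes that both the theoretical fragments and the experimental peaks are given as comparable mass values (for example, charge-deconvoluted), that peaks are (mass, intensity) pairs, and that the tolerance default is arbitrary. All function names are hypothetical.

```python
SO3 = 79.9568  # Da, the value quoted in the text for the sulfate-loss check


def find_peak(peaks, mass, tol_ppm):
    """Intensity of the first peak within tol_ppm of mass, or None if absent."""
    for peak_mass, intensity in peaks:
        if abs(peak_mass - mass) / mass * 1e6 <= tol_ppm:
            return intensity
    return None


def tier_score(fragment_masses, peaks, tol_ppm):
    """Return (matches minus possible sulfate-loss matches, matched intensity)."""
    matched, suspect, intensity_sum = 0, 0, 0.0
    for mass in fragment_masses:
        intensity = find_peak(peaks, mass, tol_ppm)
        if intensity is None:
            continue
        matched += 1
        intensity_sum += intensity
        # A peak also present at +SO3 suggests this match may be a decomposition product.
        if find_peak(peaks, mass + SO3, tol_ppm) is not None:
            suspect += 1
    return matched - suspect, intensity_sum


def fitness(re_fragments, nre_fragments, peaks, tol_ppm=10.0):
    """Tiered score: reducing end first, then non-reducing end, then intensity."""
    f1, re_intensity = tier_score(re_fragments, peaks, tol_ppm)
    f2, nre_intensity = tier_score(nre_fragments, peaks, tol_ppm)
    f3 = re_intensity + nre_intensity
    return (f1, f2, f3)
```

With this representation, `max(candidates, key=score_of)` selects the structure with the highest f1, falls back to f2 only on a tie, and uses the summed intensity f3 only when both counts are equal.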

Eleven bikunin samples of different compositions were tested using the genetic algorithm. Of these 11, the single highest scoring candidate of the genetic algorithm for 9 of these samples matched the structures reported in literature. Without user supervision, the genetic algorithm results also reaffirm the common bikunin motif reported in literature [41], Figure 3. For the remaining two samples, the genetic algorithm software reported multiple top-scoring candidates. MS2 data for these two samples could not unambiguously differentiate these structures; however, the structures reported in literature for these samples were present among the top-scoring candidates. This highlights the importance of data quality for optimal software performance. A lack of informative fragmentation peaks can result in structural ambiguities, but information-rich mass spectra can be interpreted with minimal trouble. However, a genetic algorithm approach has no theoretical minimum for data quality. Spectra not containing sufficient fragmentation for complete glycan characterization can still be interpreted based on available fragment ions and a partial sequence can be generated. Although the spectral quality of bikunin GAG tandem MS is high, more complex and longer chain intact GAGs of proteoglycans may yield less than the full suite of fragments necessary for complete sequencing. In this event, our approach can still be used to determine some portion of the overall glycan structure, as has been done recently for decorin glycans [52].
Figure 3

A list of the highest scoring structures for all MS2 collected on FTICR using the genetic algorithm. The structures provided by the genetic algorithm match the ones reported in literature. The conserved sulfation pattern of bikunin is also observed. For structures dp43-5S and dp43-6S, three structures are tied for highest scores. Alternate structures for these chain lengths are shown in the figure

In addition to matching previously reported structures, a closer examination of other high-scoring candidate structures among samples shows a consistent motif across compositions. Additional structural motifs shown in Figure 4 consistently score within the top five structures of the genetic algorithm. These alternate structures have f1 and f2 scores similar to the top candidate but low intensity values for some of their fragment matches (affecting the value of f3). The high degree of similarity between the primary component identified in literature and the alternate structures may be a result of (A) our scoring method being biased towards reducing end fragments, (B) assigning low-intensity noise peaks as glycosidic fragments, or (C) the possibility of a mixture containing some minor components.
Figure 4

The highest scoring structure assigned to all bikunin compositions (except dp35-7S), where the bracketed region is a variable stretch of unmodified disaccharides, is outlined in blue. Two alternative structures are also frequently observed and outlined in black. The structures appear in the top five highest-scoring candidates for all compositions. For chain length dp43 (both 5SO3 and 6SO3), the highest score is tied among all three structures. Diagnostic fragments to confidently differentiate between these structures are absent

A comparison of the speed of analysis using the genetic algorithm versus an exhaustive search of every possible permutation of a composition is shown in Figure 5. Here we see that the genetic algorithm has found the correct answer within a small fraction of the time (0.9–2.5% on average) required to examine every possible structure with the assumption that sulfation only occurs on the 4-O position of the N-acetylgalactosamine. The decrease in search time is primarily due to the speed with which unlikely features are eliminated from the genetic algorithm gene pool. As reported [41], bikunin’s sulfation occurs near the reducing end. Isomeric structures that contain sulfate groups in the non-reducing end ranked lowest in the scoring process, resulting in rapid elimination of a test structure and all structures of similar sulfation patterns within a single iteration. A greater number of iterations were spent refining high-scoring structures once poorly scored structures had been eliminated from consideration. The algorithm is designed to rerun the entire genetic process from scratch multiple times in order to avoid plateauing at local maxima. Convergence upon the same highest scoring structure 5 times was the baseline criterion for an acceptable structural assignment. The repetition number is a user-adjustable parameter, as well.
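The restart-and-agree criterion can be sketched as a small wrapper around any single-run search. The sketch below is a hypothetical Python helper, not the authors' code. The text does not say whether the five agreeing runs must be consecutive, so this version counts cumulative agreement, and the `max_runs` cap is an added safeguard. `run_once` stands for one complete, independently seeded run of the genetic algorithm and must return a hashable structure.

```python
from collections import Counter


def consensus_structure(run_once, required=5, max_runs=50):
    """Repeat independent runs until one answer has been returned `required` times.

    Returns (structure, runs_used), or (None, max_runs) if no answer
    reaches the required count within max_runs.
    """
    counts = Counter()
    for run_index in range(max_runs):
        best = run_once(run_index)   # run_index can serve as the random seed
        counts[best] += 1
        if counts[best] >= required:
            return best, run_index + 1
    return None, max_runs
```

Returning the number of runs used gives a rough indication of how reproducible the assignment was: a structure reached in exactly `required` runs was found every time, while a much larger count suggests competing local maxima.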
Figure 5

Speed comparison between the genetic algorithm and exhaustive search method. The bar graph shows the amount of time in hours (left y-axis) it requires for a standard desktop PC (2.4 GHz processor, 4 GB RAM) to exhaustively search through all possible combinations of a specific composition. The line plot shows the percentage of time (right y-axis) that is required for the genetic algorithm to arrive at the correct answer. Overall search space is reduced dramatically as the number of permutations per composition increases

Of particular significance, the efficiency of this approach is found to increase as the total number of permutations increases. For a pure sample, only a single structure can be assigned to the MS2 spectrum, but the number of structures with drastically different modification patterns increases with respect to chain length. An increase in chain length also increases the number of GAG structures that could potentially share a feature not observed in the MS2. Structures containing these features drop out of the algorithm as possible options once a single structure of that particular type is scored.

Calculations shown here are run on a 2.4-GHz dual-core processor with 4 GB of RAM, a standard laptop or desktop computer. Speed of calculations can increase with more powerful processors such as a GPU workstation or computer cluster. It is important to note that the genetic algorithm in MATLAB is operated with separate function calls at each step of the algorithm’s cycle. Parallelization of these function calls is particularly attractive for samples of higher chain length and, in theory, could make spectra interpretation no longer the bottleneck for structural elucidation of GAGs. Additional GAG structures determined using this genetic-algorithm based GAG analysis software have been reported [53].


The software performance is limited by two factors: (1) the quality of the MS2 data and (2) the specificity of the fitness function. The former limitation can be reduced by using a high-performance instrument such as an FTICR or Orbitrap mass spectrometer. Some fragment mass values differ by less than 1 Da, increasing the possibility of ambiguity in low-performance instruments. High-resolution mass spectra with single-digit or lower ppm mass error minimize margins for incorrect assignment. Acquisition conditions must also be optimized for glycan fragmentation and ideally limit production of confounding fragments such as SO3 loss or internal cleavages.

The latter factor, specificity of the fitness function in the genetic algorithm, is one that can be fine-tuned to GAG analysis by tandem mass spectrometry. The fitness function presented in this paper is simple, arbitrary, and based on the basics of glycan analysis. This approach works for the examples selected here because only glycosidic bond cleavage was assigned. Higher level structure analysis based on cross-ring cleavages requires a more sophisticated fitness function. A more complete and non-arbitrary scoring algorithm is being developed that assigns statistical weights and importance factors to various fragment peaks. Additionally, peak intensity, while not considered heavily in this iteration of the code, can also signify important characteristics in GAG structure. Details for creating an optimized scoring algorithm will be discussed in future work.

Peak picking for GAG fragmentation is not discussed in this paper but is an important consideration moving forward. Bikunin fragment peaks were selected by the SNAP algorithm using averagine and manually validated; this approach is practical for lowly sulfated samples, but averagine is insufficient for highly sulfated compounds due to contributions of sulfur to the A + 2 isotope peak. A fully automated and GAG-specific peak picking system is currently in development.

The software is applicable to GAGs that are lowly sulfated, such as bikunin, as well as to moderately and highly sulfated samples, for both CS/DS and HS/Hp. Short-chain HS with more than one SO3 modification per disaccharide and long-chain chondroitin sulfate such as decorin with approximately 1 SO3 per disaccharide have been determined using our software [52, 53].

The uronic acid stereochemistry is a variable modification in GAGs that is difficult to observe by mass spectrometry alone. EDD data of heparin and heparan sulfate GAGs have produced a small subset of diagnostic fragments capable of distinguishing between glucuronic and iduronic acid epimers [22]. Chemometric applications have yielded a diagnostic fragment ratio that can definitively determine the C5 stereochemistry [54]. Application of this ratio can be integrated into the software after basic structural features have been assigned using the approach presented here.

Funding Information

The authors gratefully acknowledge funding from the National Institutes of Health, grants P41GM103390 and R21HL136271.


References

  1. Xie, B., Costello, C.E.: Carbohydrate structure determination by mass spectrometry. Carbohydr. Chem. Biol. Med. Appl. 29–57 (2008)
  2. Gandhi, N.S., Mancera, R.L.: The structure of glycosaminoglycans and their interactions with proteins. Chem. Biol. Drug Des. 72, 455–482 (2008)
  3. Rabenstein, D.L.: Heparin and heparan sulfate: structure and function. Nat. Prod. Rep. 19, 312–331 (2002)
  4. Ohtsubo, K., Marth, J.D.: Glycosylation in cellular mechanisms of health and disease. Cell. 126, 855–867 (2006)
  5. Zhao, Y.J., Singh, A., Li, L.Y., Linhardt, R.J., Xu, Y.M., Liu, J., Woods, R.J., Amster, I.J.: Investigating changes in the gas-phase conformation of Antithrombin III upon binding of Arixtra using traveling wave ion mobility spectrometry (TWIMS). Analyst. 14, 6980–6989 (2015)
  6. Zhao, Y.J., Singh, A., Xu, Y.M., Zong, C.L., Zhang, F.M., Boons, G.J., Liu, J., Linhardt, R.J., Woods, R.J., Amster, I.J.: Gas-phase analysis of the complex of fibroblast growth factor 1 with heparan sulfate: a traveling wave ion mobility spectrometry (TWIMS) and molecular modeling study. J. Am. Soc. Mass Spectrom. 28, 96–109 (2017)
  7. Thanawiroon, C., Rice, K.G., Toida, T., Linhardt, R.J.: Liquid chromatography/mass spectrometry sequencing approach for highly sulfated heparin-derived oligosaccharides. J. Biol. Chem. 279, 2608–2615 (2004)
  8. Jones, C.J., Beni, S., Limtiaco, J.F.K., Langeslay, D.J., Larive, C.K.: Heparin characterization: challenges and solutions. Annu. Rev. Anal. Chem. 4(4), 439–465 (2011)
  9. Elias, J.E., Haas, W., Faherty, B.K., Gygi, S.P.: Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat. Methods. 2, 667–675 (2005)
  10. Cox, J., Neuhauser, N., Michalski, A., Scheltema, R.A., Olsen, J.V., Mann, M.: Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011)
  11. Chi, L.L., Amster, J., Linhardt, R.J.: Mass spectrometry for the analysis of highly charged sulfated carbohydrates. Curr. Anal. Chem. 1, 223–240 (2005)
  12. Cooper, C.A., Gasteiger, E., Packer, N.H.: GlycoMod—a software tool for determining glycosylation compositions from mass spectrometric data. Proteomics. 1, 340–349 (2001)
  13. Kailemia, M.J., Patel, A.B., Johnson, D.T., Li, L.Y., Linhardt, R.J., Amster, I.J.: Differentiating chondroitin sulfate glycosaminoglycans using collision-induced dissociation; uronic acid cross-ring diagnostic fragments in a single stage of tandem mass spectrometry. Eur. J. Mass Spectrom. 21, 275–285 (2015)
  14. Flangea, C., Serb, A.F., Schiopu, C., Tudor, S., Sisu, E., Seidler, D.G., Zamfir, A.D.: Discrimination of GalNAc (4S/6S) sulfation sites in chondroitin sulfate disaccharides by chip-based nanoelectrospray multistage mass spectrometry. Cent. Eur. J. Chem. 7, 752–759 (2009)
  15. Huang, R.R., Pomin, V.H., Sharp, J.S.: LC-MS (n) analysis of isomeric chondroitin sulfate oligosaccharides using a chemical derivatization strategy. J. Am. Soc. Mass Spectrom. 22, 1577–1587 (2011)
  16. Leach, F.E., Xiao, Z.P., Laremore, T.N., Linhardt, R.J., Amster, I.J.: Electron detachment dissociation and infrared multiphoton dissociation of heparin tetrasaccharides. Int. J. Mass Spectrom. 308, 253–259 (2011)
  17. Bin Oh, H., Leach, F.E., Arungundram, S., Al-Mafraji, K., Venot, A., Boons, G.J., Amster, I.J.: Multivariate analysis of electron detachment dissociation and infrared multiphoton dissociation mass spectra of heparan sulfate tetrasaccharides differing only in hexuronic acid stereochemistry. J. Am. Soc. Mass Spectrom. 22, 582–590 (2011)
  18. Wolff, J.J., Laremore, T.N., Leach, F.E., Linhardt, R.J., Amster, I.J.: Electron capture dissociation, electron detachment dissociation and infrared multiphoton dissociation of sucrose octasulfate. Eur. J. Mass Spectrom. 15, 275–281 (2009)
  19. Wolff, J.J., Laremore, T.N., Busch, A.M., Linhardt, R.J., Amster, I.J.: Influence of charge state and sodium cationization on the electron detachment dissociation and infrared multiphoton dissociation of glycosaminoglycan oligosaccharides. J. Am. Soc. Mass Spectrom. 19, 790–798 (2008)
  20. Leach, F.E., Ly, M., Laremore, T.N., Wolff, J.J., Perlow, J., Linhardt, R.J., Amster, I.J.: Hexuronic acid stereochemistry determination in chondroitin sulfate glycosaminoglycan oligosaccharides by electron detachment dissociation. J. Am. Soc. Mass Spectrom. 23, 1488–1497 (2012)
  21. Leach, F.E., Wolff, J.J., Laremore, T.N., Linhardt, R.J., Amster, I.J.: Evaluation of the experimental parameters which control electron detachment dissociation, and their effect on the fragmentation efficiency of glycosaminoglycan carbohydrates. Int. J. Mass Spectrom. 276, 110–115 (2008)
  22. Wolff, J.J., Chi, L.L., Linhardt, R.J., Amster, I.J.: Distinguishing glucuronic from iduronic acid in glycosaminoglycan tetrasaccharides by using electron detachment dissociation. Anal. Chem. 79, 2015–2022 (2007)
  23. Wolff, J.J., Laremore, T.N., Aslam, H., Linhardt, R.J., Amster, I.J.: Electron-induced dissociation of glycosaminoglycan tetrasaccharides. J. Am. Soc. Mass Spectrom. 19, 1449–1458 (2008)
  24. Wolff, J.J., Laremore, T.N., Busch, A.M., Linhardt, R.J., Amster, I.J.: Electron detachment dissociation of dermatan sulfate oligosaccharides. J. Am. Soc. Mass Spectrom. 19, 294–304 (2008)
  25. Huang, Y., Yu, X., Mao, Y., Costello, C.E., Zaia, J., Lin, C.: De novo sequencing of heparan sulfate oligosaccharides by electron-activated dissociation. Anal. Chem. 85, 11979–11986 (2013)
  26. Leach, F.E., Riley, N.M., Westphall, M.S., Coon, J.J., Amster, I.J.: Negative electron transfer dissociation sequencing of increasingly sulfated glycosaminoglycan oligosaccharides on an orbitrap mass spectrometer. J. Am. Soc. Mass Spectrom. 28, 1844–1854 (2017)
  27. Wolff, J.J., Leach, F.E., Laremore, T.N., Kaplan, D.A., Easterling, M.L., Linhardt, R.J., Amster, I.J.: Negative electron transfer dissociation of glycosaminoglycans. Anal. Chem. 82, 3460–3466 (2010)
  28. Wolff, J.J., Amster, I.J., Chi, L., Linhardt, R.J.: Electron detachment dissociation of glycosaminoglycan tetrasaccharides. J. Am. Soc. Mass Spectrom. 18, 234–244 (2007)
  29. Domon, B., Costello, C.E.: A systematic nomenclature for carbohydrate fragmentations in fab-ms ms spectra of glycoconjugates. Glycoconjugate J. 5, 397–409 (1988)
  30. Kailemia, M.J., Li, L.Y., Ly, M., Linhardt, R.J., Amster, I.J.: Complete mass spectral characterization of a synthetic ultralow-molecular-weight heparin using collision-induced dissociation. Anal. Chem. 84, 5475–5478 (2012)
  31. Kailemia, M.J., Ruhaak, L.R., Lebrilla, C.B., Amster, I.J.: Oligosaccharide analysis by mass spectrometry: a review of recent developments. Anal. Chem. 86, 196–212 (2014)
  32. Zaia, J., Costello, C.E.: Tandem mass spectrometry of sulfated heparin-like glycosaminoglycan oligosaccharides. Anal. Chem. 75, 2445–2455 (2003)
  33. Dancik, V., Addona, T.A., Clauser, K.R., Vath, J.E., Pevzner, P.A.: De novo peptide sequencing via tandem mass spectrometry. J. Comput. Biol. 6, 327–342 (1999)
  34. Ma, B., Zhang, K.Z., Hendrie, C., Liang, C.Z., Li, M., Doherty-Kirby, A., Lajoie, G.: PEAKS: powerful software for peptide de novo sequencing by tandem mass spectrometry. Rapid Commun. Mass Spectrom. 17, 2337–2342 (2003)
  35. Taylor, J.A., Johnson, R.S.: Implementation and uses of automated de novo peptide sequencing by tandem mass spectrometry. Anal. Chem. 73, 2594–2604 (2001)
  36. Campbell, M.P., Hayes, C.A., Struwe, W.B., Wilkins, M.R., Aoki-Kinoshita, K.F., Harvey, D.J., Rudd, P.M., Kolarich, D., Lisacek, F., Karlsson, N.G., Packer, N.H.: UniCarbKB: putting the pieces together for glycomics research. Proteomics. 11, 4117–4121 (2011)
  37. Maxwell, E., Tan, Y., Tan, Y., Hu, H., Benson, G., Aizikov, K., Conley, S., Staples, G.O., Slysz, G.W., Smith, R.D., Zaia, J.: GlycReSoft: a software package for automated recognition of glycans from LC/MS data. PLoS One. 7, (2012)
  38. Saad, O.M., Leary, J.A.: Heparin sequencing using enzymatic digestion and ESI-MSn with HOST: a heparin/HS oligosaccharide sequencing tool. Anal. Chem. 77, 5902–5911 (2005)
  39. Chiu, Y.L., Huang, R.R., Orlando, R., Sharp, J.S.: GAG-ID: heparan sulfate (HS) and heparin glycosaminoglycan high-throughput identification software. Mol. Cell. Proteomics. 14, 1720–1730 (2015)
  40. Hu, H., Huang, Y., Mao, Y., Yu, X., Xu, Y.M., Liu, J., Zong, C.L., Boons, G.J., Lin, C., Xia, Y., Zaia, J.: A computational framework for heparan sulfate sequencing using high-resolution tandem mass spectra. Mol. Cell. Proteomics. 13, 2490–2502 (2014)
  41. Ly, M., Leach III, F.E., Laremore, T.N., Toida, T., Amster, I.J., Linhardt, R.J.: The proteoglycan bikunin has a defined sequence. Nat. Chem. Biol. 7, 827–833 (2011)
  42. Baeck, T., Schwefel, H.-P.: An overview of evolutionary algorithms for parameter optimization. Evol. Comput. 1, 1–23 (1993)
  43. Fogel, L.J., Owens, A.J., Walsh, M.J.: Artificial intelligence through a simulation of evolution. Proceedings of the Second Cybernetic Sciences Symposium: Biophysics and cybernetic systems. 131–155 (1965)
  44. Forrest, S.: Genetic algorithms—principles of natural-selection applied to computation. Science. 261, 872–878 (1993)
  45. Han, L., Costello, C.E.: Mass spectrometry of glycans. Biochem. Mosc. 78, 710–720 (2013)
  46. Kilgour, D.P.A., Neal, M.J., Soulby, A.J., O’Connor, P.B.: Improved optimization of the Fourier transform ion cyclotron resonance mass spectrometry phase correction function using a genetic algorithm. Rapid Commun. Mass Spectrom. 27, 1977–1982 (2013)
  47. Das, S., Suganthan, P.N.: Differential evolution: a survey of the state-of-the-art. IEEE Trans. Evol. Comput. 15, 4–31 (2011)
  48. Knowles, J.D., Corne, D.W.: Approximating the nondominated front using the Pareto archived evolution strategy. Evol. Comput. 8, 149–172 (2000)
  49. Phillips, S.J., Anderson, R.P., Schapire, R.E.: Maximum entropy modeling of species geographic distributions. Ecol. Model. 190, 231–259 (2006)
  50. Tavazoie, S., Hughes, J.D., Campbell, M.J., Cho, R.J., Church, G.M.: Systematic determination of genetic network architecture. Nat. Genet. 22, 281–285 (1999)
  51. Verdonk, M.L., Cole, J.C., Hartshorn, M.J., Murray, C.W., Taylor, R.D.: Improved protein-ligand docking using GOLD. Proteins Struct. Funct. Genet. 52, 609–623 (2003)
  52. Yu, Y.L., Duan, J.N., Leach, F.E., Toida, T., Higashi, K., Zhang, H., Zhang, F.M., Amster, I.J., Linhardt, R.J.: Sequencing the dermatan sulfate chain of decorin. J. Am. Chem. Soc. 139, 16986–16995 (2017)
  53. Singh, A., Kett, W.C., Severin, I.C., Agyekum, I., Duan, J.N., Amster, I.J., Proudfoot, A.E.I., Coombe, D.R., Woods, R.J.: The interaction of heparin tetrasaccharides with chemokine CCL5 is modulated by sulfation pattern and pH. J. Biol. Chem. 290, 15421–15436 (2015)
  54. Agyekum, I., Patel, A.B., Zong, C.L., Boons, G.J., Amster, I.J.: Assignment of hexuronic acid stereochemistry in synthetic heparan sulfate tetrasaccharides with 2-O-sulfo uronic acids using electron detachment dissociation. Int. J. Mass Spectrom. 390, 163–169 (2015)

Copyright information

© American Society for Mass Spectrometry 2018

Authors and Affiliations

  1. Department of Chemistry, University of Georgia, Athens, USA
