Elucidation of chromatographic peak shifts in complex samples using a chemometrical approach
Chromatographic retention time shifts between consecutive analyses are a well-known yet not fully understood phenomenon. Algorithms have been developed to align peaks between runs, but no studies have specifically considered the causes of these peak shifts. Here, designed experiments reveal chromatographic shift patterns for a complex peptide mixture that are attributable to the column temperature and the pH of the mobile phase. These results demonstrate that peak shifts are highly structured and are largely explained by underlying differences in the physico-chemical parameters of the chromatographic system. They also provide experimental support for the alignment algorithm known as the generalized fuzzy Hough transform, which exploits this structure. The development of alignment algorithms can be expected to enter a new phase, yielding increasingly accurate alignment by taking the latent structure of the peak shifts into account.
Keywords: Chromatography; Alignment; PCA
Chromatographic retention mechanisms have been studied and modeled since the 1970s. In the field of quantitative structure–retention relationships (QSRR), the focus has been on predicting parameters such as log k, log P, and log D values and retention factor ratios from molecular descriptors [1, 2, 3, 4]. Other studies have examined the effect of pressure on chromatographic selectivity in order to compare the relatively new ultra-high-pressure liquid chromatography (UHPLC) with traditional high-performance liquid chromatography (HPLC) operated at lower pressures [5, 6]. However, these approaches cannot be used to investigate the fine structure of peak shifts that arise under nominally identical chromatographic conditions.
Peak shifting is a common phenomenon in chromatographic separations and can make it difficult to identify which peak is which, especially in datasets with many samples and many peaks per sample. Failure to match the peaks that correspond to the same analyte across samples often leads to poor statistical analyses. This problem has been termed the correspondence problem [7, 8, 9, 10, 11, 12]. The causes of peak shifting in chromatography have been addressed and described in the literature. So far, however, research on the correspondence problem has aimed only at ensuring that the data are properly aligned for statistical analysis, and several retention time alignment algorithms have been reported, such as nearest-neighbor clustering [13, 14], binning [15, 16], and warping [17, 18, 19, 20]. Some of these algorithms have been implemented in chromatography data analysis software such as msInspect, MZmine, OpenMS, XCMS, and TracMass 2 [10, 21].
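To illustrate why simple matching strategies can become ambiguous, the following minimal Python sketch matches each peak of a reference run to the nearest peak of a shifted run within a fixed retention time tolerance. It is a generic nearest-neighbor matcher written for illustration, not the algorithm implemented in any of the programs listed above; the function name `match_peaks`, the 0.2 min tolerance, and the retention times are hypothetical.

```python
from typing import List, Optional, Tuple

def match_peaks(reference: List[float], sample: List[float],
                tol: float = 0.2) -> List[Tuple[float, Optional[float], bool]]:
    """Match each reference peak to the nearest sample peak within `tol` (min).

    Returns (reference_rt, matched_rt or None, ambiguous) tuples, where
    `ambiguous` is True if more than one sample peak lies within the tolerance.
    """
    matches = []
    for rt_ref in reference:
        candidates = [rt for rt in sample if abs(rt - rt_ref) <= tol]
        if not candidates:
            matches.append((rt_ref, None, False))  # peak shifted out of the window
            continue
        nearest = min(candidates, key=lambda rt: abs(rt - rt_ref))
        matches.append((rt_ref, nearest, len(candidates) > 1))
    return matches

reference = [5.10, 12.40, 12.55, 20.00]   # hypothetical reference run (min)
sample = [5.25, 12.50, 12.62, 20.90]      # hypothetical shifted run (min)
for match in match_peaks(reference, sample):
    print(match)
```

With these lists, the reference peaks at 12.40 and 12.55 min are both assigned to the sample peak at 12.50 min, and the second assignment is flagged because two candidates fall within the tolerance; the peak at 20.00 min remains unmatched because its counterpart shifted by more than the tolerance. These are the situations in which a purely local criterion cannot decide which peak is which.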
This work combines alignment methodology with retention time modeling to improve the understanding of chromatographic retention in complex systems. For this purpose, a liquid chromatography–mass spectrometry (LC/MS) method for analyzing a complex mixture was set up using an experimental design, with column temperature and mobile-phase pH as the variables provoking retention time shifts. The model sample was a tryptic digest of human serum albumin (HSA), i.e., a mixture of peptides resulting from selective digestion by the enzyme trypsin. This enzyme cleaves proteins exclusively at specific amino acids along the sequence (C-terminal to arginine or lysine). Besides the theoretical peptides, other species may occur in the mixture, such as peptides with missed cleavages and post-translationally modified peptides. This moderate sample complexity makes the digest a good model for this study, because peptide elution is well distributed along the chromatogram and the factors tested in the experimental design, i.e., mobile-phase pH and column temperature, influence retention times with different patterns depending on the peptide. In addition, we aim to provide experimental evidence supporting the generalized fuzzy Hough transform (GFHT) alignment algorithm.
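The cleavage rule stated above can be expressed as a short in-silico digestion. The following Python sketch cleaves after every K or R and optionally joins neighboring fragments to mimic missed cleavages. It is a simplification under stated assumptions: the commonly applied exception that trypsin does not cleave before proline is not modeled, post-translational modifications are ignored, and the function name `tryptic_digest` and the toy sequence are hypothetical (the sequence is not HSA).

```python
import re
from typing import List

def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> List[str]:
    """Simplified in-silico trypsin digestion.

    Cleaves C-terminal to every K or R (the proline exception is NOT applied)
    and adds peptides spanning up to `missed_cleavages` uncleaved sites.
    """
    # Zero-width split after K or R keeps the residue on the preceding fragment.
    fragments = [f for f in re.split(r"(?<=[KR])", sequence) if f]
    peptides = []
    for start in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            end = start + extra + 1
            if end > len(fragments):
                break
            peptides.append("".join(fragments[start:end]))
    return peptides

toy = "AGKLLRDEKVVR"  # hypothetical toy sequence, not HSA
print(tryptic_digest(toy))
print(tryptic_digest(toy, missed_cleavages=1))
```

Allowing one missed cleavage adds the joined peptides (e.g. `AGKLLR`), which is one way the extra species mentioned above enlarge the set of peaks to be aligned.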
Generalized fuzzy Hough transform
Peak alignment methods such as binning, nearest-neighbor matching, clustering, warping, or combinations of these may produce ambiguous peak matches, especially when aligning peaks from very complex mixtures. Alignment algorithms employing the generalized fuzzy Hough transform (GFHT) have recently been reported. GFHT is derived from the Hough transform, which was originally applied in image analysis to detect patterns such as lines or other features in images. It was later adapted and extended into the GFHT for aligning 1D NMR peak data. More recently, GFHT has also been applied to the alignment of chromatographic peaks as an additional alignment step in TracMass 2, an open-source program designed to align LC-MS data. An important feature of GFHT is that it can resolve the ambiguities in peak matching that frequently occur with other alignment methods. This is accomplished by first calibrating a principal component regression (PCR) model on the shifts of peaks with known correspondence; the retention times of peaks with unknown correspondence can then be predicted [8, 9, 10]. The method assumes that peaks shift according to patterns attributable to causes related to chromatographic instability. However, these patterns have not yet been attributed to specific causes in the literature. In this work, column temperature and mobile-phase pH were taken as two of several possible factors that may influence retention time shifts. Retention time shifts were then modeled by varying these parameters in a controlled fashion, i.e., at predetermined levels set by an experimental design and with magnitudes that influence retention times more than the expected random variations of these and other uncontrollable factors.
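The calibrate-then-predict idea can be sketched with a generic principal component regression in NumPy. This is an illustrative PCR, not the GFHT or TracMass 2 code: rows are runs, the columns of `X` are retention times of peaks with known correspondence, and `y` is the retention time of one further peak. The data are synthetic and assume, hypothetically, that all shifts are driven by two latent factors (standing in for, e.g., column temperature and mobile-phase pH) plus small noise; the function names `pcr_fit` and `pcr_predict` are hypothetical.

```python
import numpy as np

def pcr_fit(X: np.ndarray, y: np.ndarray, n_components: int):
    """Fit principal component regression; returns parameters for pcr_predict."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    # Principal axes of the centred calibration data are the rows of Vt.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T               # loadings, shape (n_peaks, k)
    T = Xc @ P                            # scores, shape (n_runs, k)
    # Least-squares regression of the centred target on the scores.
    b, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, P, b

def pcr_predict(params, X_new: np.ndarray) -> np.ndarray:
    """Predict the target retention time for new runs."""
    x_mean, y_mean, P, b = params
    return (X_new - x_mean) @ P @ b + y_mean

rng = np.random.default_rng(0)
n_runs, n_peaks = 25, 30
latent = rng.normal(size=(n_runs, 2))                      # two hypothetical shift factors per run
nominal = rng.uniform(2, 40, size=n_peaks + 1)             # nominal retention times (min)
sensitivity = rng.normal(scale=0.1, size=(2, n_peaks + 1)) # per-peak response to each factor
rt = nominal + latent @ sensitivity + rng.normal(scale=0.005, size=(n_runs, n_peaks + 1))

X, y = rt[:, :-1], rt[:, -1]          # the last peak plays the "unknown" peak
params = pcr_fit(X[:20], y[:20], n_components=2)
pred = pcr_predict(params, X[20:])
print(np.round(pred - y[20:], 3))     # prediction residuals (min) for five new runs
```

Because every peak responds to the same two latent factors, two components of the known peaks suffice to predict where the remaining peak should appear, which is the structural assumption the text attributes to GFHT.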
Experimental design by MLR and PCA
Reagents, materials, and instrumentation
Human serum albumin (HSA), trypsin from bovine pancreas, dithiothreitol, iodoacetamide, ammonium bicarbonate, formic acid, and ammonium formate were purchased from Sigma-Aldrich (Steinheim, Germany). Gradient-grade acetonitrile and analytical-grade water were purchased from Honeywell Riedel-de Haën (Seelze, Germany). An HPLC-MS system from Thermo Fisher Scientific (Waltham, MA), consisting of an UltiMate BioRS HPLC coupled with a Q Exactive HF Orbitrap mass analyzer and equipped with a HyPURITY C18 column (2.1 mm i.d., 100 mm length, 3 μm particle size), was used in this study. The mass spectrometer was operated in full-scan positive ionization mode over an m/z range of 100 to 2000. The HPLC flow rate was 0.25 mL min⁻¹. The organic mobile-phase gradient started at 5%, increased to 45% over 30 min, then to 100% over 10 min, and returned to 5% for 5 min. The column temperature was set for each run according to the experimental design described below.
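Because the gradient description leaves the shape of each segment implicit, the following Python sketch encodes one possible reading of it as a piecewise-linear program: linear ramps from 5% to 45% organic between 0 and 30 min and from 45% to 100% between 30 and 40 min, followed by an immediate return to 5% held until 45 min. The breakpoints, the step at 40 min, and the names `GRADIENT` and `percent_b` are assumptions made for illustration.

```python
from bisect import bisect_right

# ASSUMED reading of the gradient: (time in min, %B) breakpoints of the linear ramps,
# then an immediate return to the initial composition held until the end of the run.
GRADIENT = [(0.0, 5.0), (30.0, 45.0), (40.0, 100.0)]
RETURN_TIME_MIN, RUN_END_MIN, INITIAL_B = 40.0, 45.0, 5.0
FLOW_ML_PER_MIN = 0.25  # flow rate stated in the text

def percent_b(t: float) -> float:
    """Organic mobile-phase fraction (%) at time t (min) under the assumed program."""
    if not 0.0 <= t <= RUN_END_MIN:
        raise ValueError("time outside the programmed run")
    if t > RETURN_TIME_MIN:
        return INITIAL_B                      # re-equilibration at 5 %
    times = [point[0] for point in GRADIENT]
    i = bisect_right(times, t) - 1            # index of the segment start
    if i == len(GRADIENT) - 1:
        return GRADIENT[-1][1]                # exactly at the last breakpoint
    (t0, b0), (t1, b1) = GRADIENT[i], GRADIENT[i + 1]
    return b0 + (b1 - b0) * (t - t0) / (t1 - t0)

for t in (0, 15, 30, 35, 40, 42):
    print(f"{t:>4} min: {percent_b(t):6.1f} %B")
print("mobile phase per run:", FLOW_ML_PER_MIN * RUN_END_MIN, "mL")
```

Under this reading, one run lasts 45 min and consumes 11.25 mL of mobile phase at 0.25 mL min⁻¹.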
Sample preparation (HSA tryptic digest)
To 5 mL of human serum albumin (150 μM in 100 mM ammonium bicarbonate), 5 mL of dithiothreitol (100 mM) was added and allowed to react for 30 min at 60 °C. After cooling, 5 mL of iodoacetamide (100 mM) was added, and the mixture was kept in the dark for 30 min at room temperature. Five milliliters of trypsin (150 μM in water) was then added, and the mixture was left to react overnight at 37.5 °C. The sample was divided into small aliquots and stored at −20 °C. Before analysis, 100 μL of sample was diluted in 20 mM formic acid/ammonium formate buffer solution at pH 3.75.
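The nominal concentrations in the digestion mixture follow from the stated stock concentrations and volumes. The Python sketch below tracks them after each addition, assuming additive volumes and ignoring consumption of the reagents by reaction; the helper name `concentrations_after_each_step` is hypothetical.

```python
# Stock additions in the order described: (species, stock concentration in μM, volume in mL).
ADDITIONS = [
    ("HSA", 150.0, 5.0),
    ("dithiothreitol", 100_000.0, 5.0),   # 100 mM
    ("iodoacetamide", 100_000.0, 5.0),    # 100 mM
    ("trypsin", 150.0, 5.0),
]

def concentrations_after_each_step(additions):
    """Nominal concentrations (μM) of all species after each addition.

    Assumes additive volumes and ignores consumption of reagents by reaction.
    """
    amounts_nmol = {}    # μM x mL = nmol
    volume_ml = 0.0
    history = []
    for species, conc_um, vol_ml in additions:
        amounts_nmol[species] = amounts_nmol.get(species, 0.0) + conc_um * vol_ml
        volume_ml += vol_ml
        # nmol / mL = μM
        history.append({name: n / volume_ml for name, n in amounts_nmol.items()})
    return history

for step, concs in enumerate(concentrations_after_each_step(ADDITIONS), start=1):
    print(step, {name: round(c, 1) for name, c in concs.items()})
```

Under these assumptions the final 20 mL mixture contains 37.5 μM HSA and 37.5 μM trypsin, while dithiothreitol is present at 50 mM during the reduction step and at 25 mM at the end.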
Five pairs of aqueous (water) and organic (1:9 water/acetonitrile) mobile phases were prepared, each containing 20 mM formic acid/ammonium formate buffer, at pH values of 3.25, 3.50, 3.75, 4.00, and 4.25, respectively.
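The relative amounts of formic acid and ammonium formate needed for each pH level can be estimated with the Henderson–Hasselbalch equation, pH = pKa + log10([HCOO⁻]/[HCOOH]). The Python sketch below assumes that 20 mM refers to the total formate concentration (acid plus conjugate base), uses an approximate aqueous pKa of 3.75 for formic acid at 25 °C, and ignores activity corrections; it therefore applies to the aqueous phase only, since apparent pH and pKa values shift in 90% acetonitrile. The function name `buffer_composition` is hypothetical.

```python
PKA_FORMIC = 3.75     # approximate aqueous pKa of formic acid at 25 °C (assumption)
TOTAL_MM = 20.0       # assumed total formate concentration (formic acid + formate), mM

def buffer_composition(ph: float, pka: float = PKA_FORMIC, total: float = TOTAL_MM):
    """Split a total buffer concentration into (acid, conjugate base) in mM
    using the Henderson-Hasselbalch equation, ignoring activity coefficients."""
    ratio = 10 ** (ph - pka)           # [HCOO-] / [HCOOH]
    base = total * ratio / (1 + ratio)
    return total - base, base

for ph in (3.25, 3.50, 3.75, 4.00, 4.25):
    acid, base = buffer_composition(ph)
    print(f"pH {ph:.2f}: formic acid {acid:5.2f} mM, ammonium formate {base:5.2f} mM")
```

Because the central level, pH 3.75, coincides with the pKa, it corresponds to equimolar acid and base, and the extreme levels (pH 3.25 and 4.25) are mirror images of each other at roughly 15.2 mM and 4.8 mM.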
Experimental values of the factors for the five-level full factorial design
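The structure of a two-factor, five-level full factorial design can be generated programmatically. The Python sketch below assumes 25 runs without replicates; the pH levels are those given for the mobile phases, whereas the column temperatures are HYPOTHETICAL placeholders, because the actual values are not reproduced in this text. The coded levels (−2 to +2) assume equally spaced factor levels, and the function name `full_factorial` is hypothetical.

```python
from itertools import product
import random

PH_LEVELS = [3.25, 3.50, 3.75, 4.00, 4.25]      # mobile-phase pH levels stated in the text
TEMP_LEVELS_C = [25.0, 30.0, 35.0, 40.0, 45.0]  # HYPOTHETICAL column temperatures (°C)

def full_factorial(temps, phs, seed=None):
    """Return every temperature x pH combination with actual and coded levels.

    Coded levels run from -2 to +2 for five equally spaced levels; if `seed`
    is given, the run order is randomized reproducibly.
    """
    t_center, p_center = (len(temps) - 1) // 2, (len(phs) - 1) // 2
    runs = [
        {"temp_C": t, "pH": ph, "temp_code": i - t_center, "pH_code": j - p_center}
        for (i, t), (j, ph) in product(enumerate(temps), enumerate(phs))
    ]
    if seed is not None:
        random.Random(seed).shuffle(runs)
    return runs

design = full_factorial(TEMP_LEVELS_C, PH_LEVELS, seed=42)
print(len(design), "runs")
for run in design[:3]:
    print(run)
```

Randomizing the run order is a common precaution so that slow drifts of the system are not confounded with the factor levels; whether it was applied in this study is not stated here.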
TracMass 2 parameters
ZAF Sigma 1
ZAF Sigma 2
Results and discussion
Experimental design regression coefficients
PCA of the retention time data
PCR of the retention time data
Experimental design coefficients by MLR and PCA
Applying an experimental design to study the influence of column temperature and mobile-phase pH on the chromatographic separation of peptides revealed well-defined retention time shift patterns. Temperature affects retention times linearly, which means that peaks shift linearly with changes in column temperature. In contrast, mobile-phase pH affects retention times quadratically. However, different peaks were affected to different extents, and six patterns of peak shifts were found when varying these two factors. Moreover, it was demonstrated that the retention time of a compound can be modeled better from the retention times of other compounds by means of PCR than from the experimental design factors. This provides experimental evidence supporting the previously reported generalized fuzzy Hough transform alignment algorithm, which aligns shifted peaks based on patterns derived from the data. It shows that the GFHT approach to alignment is trustworthy, in the sense that the shifts of peaks from other analytes can reliably predict the position of missing or ambiguous peaks in groups of peaks belonging to the same analyte. Overall, a better understanding of shift patterns may contribute to the development of better alignment algorithms.
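The distinction between a linear temperature effect and a quadratic pH effect corresponds to a second-order regression model fitted to the design by multiple linear regression. The NumPy sketch below assumes coded factor levels and a model with intercept, T, pH, pH², and a T × pH interaction term; the exact model terms used in the study are not stated here, and the coefficients and retention times are HYPOTHETICAL, synthetic values for a single peptide. The function name `design_matrix` is hypothetical.

```python
import numpy as np

def design_matrix(temp_code: np.ndarray, ph_code: np.ndarray) -> np.ndarray:
    """Model columns: intercept, T, pH, pH^2 and the T x pH interaction (coded units)."""
    return np.column_stack([np.ones_like(temp_code), temp_code, ph_code,
                            ph_code ** 2, temp_code * ph_code])

# Coded levels (-2..+2) of a 5 x 5 full factorial design: 25 runs.
t_grid, ph_grid = np.meshgrid(np.arange(-2, 3), np.arange(-2, 3), indexing="ij")
t_code = t_grid.ravel().astype(float)
ph_code = ph_grid.ravel().astype(float)
X = design_matrix(t_code, ph_code)

# HYPOTHETICAL coefficients for one peptide: linear in T, quadratic in pH, no interaction.
true_coef = np.array([18.0, -0.30, 0.10, 0.08, 0.0])
rng = np.random.default_rng(1)
rt = X @ true_coef + rng.normal(scale=0.01, size=len(t_code))  # synthetic retention times (min)

coef, *_ = np.linalg.lstsq(X, rt, rcond=None)
print(dict(zip(["b0", "b_T", "b_pH", "b_pH2", "b_TxpH"], np.round(coef, 3))))
```

Repeating the fit for each peptide yields one coefficient vector per peak, so peaks whose retention responds differently to temperature and pH can be compared directly on the fitted coefficients.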
Compliance with ethical standards
Conflict of interest
The authors declare that they have no conflicts of interest.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.