Introduction

The taeniasis/cysticercosis complex is a zoonosis caused by the presence of the parasite Taenia solium in humans and is considered to be a neglected disease that results in a serious public health and economic burden to developing countries in Africa, Asia, and Latin America (Flisser and Correa 2010; Esquivel-Velazquez et al. 2011b; Del Brutto and García 2015). The adult stage of T. solium develops in the small intestine of humans (taeniasis), producing eggs that are released in the feces. Human cysticercosis is caused by the presence of the larvae of T. solium, which can be acquired through Taenia egg-contaminated water, food, and surfaces (including soil and hands). The most common locations for the larval form are the skeletal muscles, ocular system, and central nervous system (CNS). Neurocysticercosis (NCC) is a public health problem in Brazil, with a total of 1829 NCC deaths in 12,491,280 reported deaths between 2000 and 2011 (Martins-Melo et al. 2016). It is very important that the disease is diagnosed before it progresses to the calcification stage. Computed tomography and magnetic resonance imaging are currently the techniques that assist in the diagnosis of this disease (Del Brutto et al. 2017). Antibody detection in serum by immunological methods and its correlation with cerebrospinal fluid (CSF) are key factors in tracking the progression of the disease.

Some glycoproteins of the larval form of T. solium and Taenia crassiceps have been characterized and studied for their use in the immunodiagnosis of NCC and/or the development of synthetic or recombinant vaccines (Tsang et al. 1989; Vaz et al. 1997; Peralta et al. 2002; Peralta et al. 2010; Lee et al. 2011; Ferrer et al. 2012; Salazar-Anton and Lindh 2011; Salazar-Anton et al. 2012). The vesicular fluid (VF) of cysticerci is mainly composed of water but also contains calcium, glycoproteins, cholinesterase, and coproporphyrin. This composition confers antigenicity, among other properties, to this fluid (Martinez-Zedillo and Rebolledo-Camacho 1987; Sciutto et al. 2007). Consequently, VF antigens of T. crassiceps are used with high sensitivity for immunodiagnosis in both active and inactive forms of NCC (Bueno et al. 2000; Pardini et al. 2002; Peralta et al. 2002; Espindola et al. 2005). In addition, the cysticerci molecules have been used in vaccines and in immunological studies of the host–parasite interaction (Almeida et al. 2009; Manhani et al. 2011; Parra-Unda et al. 2012; Marzano et al. 2017).

In recent years, proteomic tools have shown great importance and applicability in different biological areas, including parasitology (Verissimo da Costa et al. 2013; Ray et al. 2014). Although a transcriptomic study was performed comparing T. solium and T. crassiceps (García-Montoya et al. 2016), an extensive shotgun proteomics study with comparative analysis has not yet been performed, except for some preliminary work using two-dimensional gel electrophoresis analyses of different stages of the parasite, such as the oncosphere, excretion–secretion proteins, and VF of the cysticerci (Santivanez et al. 2010; Esquivel-Velazquez et al. 2011a; Victor et al. 2012; Navarrete-Perea et al. 2014; Nativel et al. 2016).

The aim of this study was to develop a gel-free strategy to establish a comparative protein and peptide profile of the saline vesicular extract (SVE) from both T. solium and T. crassiceps by liquid chromatography coupled to mass spectrometry. We also performed a BLAST search and an antigenic prediction analysis of the peptides identified by mass spectrometry to find candidates for further use as targets in laboratory diagnosis assays and vaccine development.

Materials and methods

T. solium and T. crassiceps saline vesicular protein extraction

SVE was obtained from the larval form of T. crassiceps ORF strain (Freeman 1962) (SVE-Tcra) and of T. solium (SVE-Tso), as described previously by Vaz et al. (1997) and Peralta et al. (2002). Briefly, intact parasites were ruptured (PYREX® Ten Broeck Homogenizer, Thomas Scientific) in five volumes of phosphate buffered saline (PBS, 0.075-M Na2HPO4, 0.0025-M NaH2PO4, 0.14-M NaCl, pH 7.2) and centrifuged at 15,000×g for 60 min at 4 °C. The supernatant was sonicated at 20 kHz, 1 mA for four periods of 60 s each in an ice bath. After an additional centrifugation step, a pool of protease inhibitors (phenylmethylsulfonylfluoride and iodoacetamide; Sigma Chemical Company, St. Louis, MO, USA) was added to the supernatant to reach a final concentration of 0.25 mM. The protein concentration of the extracts was determined using a commercial BCA protein assay reagent (Pierce, Rockford, Illinois, USA), according to the manufacturer’s instructions.

Saline vesicular extract: protein solubilization and digestion protocols

Initially, 200 μg of each crude SVE protein was precipitated with trichloroacetic acid (TCA), as described by Elliott et al. (1993). To evaluate the best solubilization conditions for gel-free proteomic analysis, 100 μg of precipitated protein was resuspended in 50 μL of 50-mM ammonium bicarbonate buffer pH 8.0. One vial with 25 μL of protein solution was supplemented with 25 μL of surfactant RapiGest™ reagent 0.2% (w/v) (Waters Co., Milford, USA), and the mixture was heated at 80 °C for 15 min. The other 25 μL of protein solution was mixed with only 25 μL of ammonium bicarbonate buffer. The protein was reduced by the addition of dithiothreitol solution (Sigma–Aldrich, USA) to a final concentration of 5 mM and was incubated for 30 min at 60 °C. The alkylation was carried out by the addition of iodoacetamide solution (Sigma–Aldrich, USA) to a final concentration of 14 mM, followed by incubation at room temperature for 30 min in the dark. Proteins were digested by the addition of 10 μL of trypsin solution (Promega, Wisconsin, USA) at a ratio of 1:50 (enzyme:substrate) and incubated for 14 h at 37 °C. To remove the surfactant, the RapiGest™ was cleaved by the addition of 20 μL of 5% (v/v) trifluoroacetic acid and incubation for 90 min at 37 °C. The sample was then centrifuged at 16,873×g for 30 min at 6 °C. The supernatant was removed and dried in a speed vac (Savant SPD111V, Thermo Scientific, USA) and purified by an OASYS system (Waters Corporation, UK) using methanol as the organic solvent, following the manufacturer’s instructions. The peptides were resuspended with formic acid to a final concentration of 0.1% in 3% acetonitrile for mass spectrometry analysis or were stored at −70 °C. This same procedure was performed on the sample without surfactant.

LC–MS/MS analysis

Preliminary evaluation of solubilization and digestion protocols

Two microliters of each peptide solution (500 ng of digested protein, with and without the surfactant) was used for the nano-LC-based separation combined with mass spectrometry analysis on an LC-ESI-Q-TOF micromass instrument (Waters Co., Milford, USA) with data-dependent acquisition (DDA). The peptide separation was performed in a nano-ACQUITY system equipped with a Symmetry C18 5-μm diameter, 5 × 300 precolumn and an Atlantis 100 × 100, 1.7-μm diameter analytical reversed-phase C18 column with a solution gradient of 5–50% mobile phase (acetonitrile) over 50 min at a flow rate of 350 nL/min. The column temperature was maintained at 35 °C. The lock mass used was phosphoric acid, delivered by an auxiliary pump at a flow rate of 600 nL/min. The conditions for peptide ionization included a source temperature of 80 °C, capillary voltage of 3500 V, positive polarity, and a sample cone voltage of 35 V. Mass spectra were acquired with the TOF mass analyzer operating in the V-mode, and spectra were integrated over 1 s of scanning with 0.1-s interscan intervals. The MS/MS mass spectra were acquired over an m/z range of 50 to 1700 using the reference mass acquired and the continuous fragmentation mode at a 10-eV collision energy.

The DDA raw data were processed and searched with the Peaks 7.5 software server search engine (Bioinformatics Solutions, Inc., Waterloo, Canada), Mascot Distiller (http://www.matrixscience.com/), and ProteinLynx Global Server (PLGS) 2.5.1 (Waters, Inc., Milford, USA) using a tolerance of up to ± 0.1 Da for both precursor and fragment ions. A maximum of one missed trypsin cleavage site was allowed. Cysteine carbamidomethylation and methionine oxidation were selected as fixed and variable modifications, respectively. Protein identification was performed by searching the mass spectrometric data against the UniProt protein database for the Taenia genus (released in April 2016) containing reversed sequences, with a false discovery rate (FDR) < 1%. Only proteins that were identified by all three proteomic software programs were considered to be valid hits. To evaluate the best protocol for the preparation of VF, samples (with and without surfactant) were analyzed using three biological replicates.
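The role of the reversed sequences in the FDR criterion can be illustrated with a minimal sketch. It assumes a generic target-decoy estimate (decoy hits divided by target hits among accepted matches) and is not the internal procedure of Peaks, Mascot, or PLGS; the function name and the input layout (score, is_decoy) are hypothetical.

```python
from typing import List, Tuple


def score_threshold_at_fdr(psms: List[Tuple[float, bool]],
                           max_fdr: float = 0.01) -> float:
    """Return the lowest score cutoff whose estimated FDR stays below max_fdr.

    psms holds (score, is_decoy) pairs, where a higher score is a better
    match and is_decoy marks hits to reversed (decoy) sequences.
    The FDR at a cutoff is estimated as decoys / targets among all
    matches scoring at or above that cutoff.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    best_cutoff = float("inf")  # accept nothing if no cutoff qualifies
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep lowering the cutoff while the estimate stays below the limit
        if targets > 0 and decoys / targets < max_fdr:
            best_cutoff = score
    return best_cutoff


# Hypothetical example: 200 target hits followed by two decoy hits
example = [(float(s), False) for s in range(101, 301)]
example += [(100.0, True), (90.0, True)]
cutoff = score_threshold_at_fdr(example)
```

In this hypothetical example the first decoy is tolerated (1/200 = 0.5%), whereas including the second would bring the estimate to exactly 1%, which no longer satisfies FDR < 1%.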

Comparative proteomic analyses of saline vesicular extracts of Taenia solium and Taenia crassiceps

Peptide extracts from the sample preparation protocol using RapiGest™ were dried using a speed vac centrifuge and were resuspended in 100 μL of ion exchange loading buffer (5-mM ammonium formate and 5% acetonitrile, pH 3.2) to a final peptide concentration of 1.0 μg/μL. The peptides were analyzed by 2D-LC–MS/MS on a Waters nano-ACQUITY UPLC system coupled to a Synapt G1 HDMS system (Waters Co., Milford, USA).

The 2D-LC–MS was performed using a 180-μm × 20-mm strong cation exchange (SCX) column (nano-ACQUITY UPLC SCX trap column, Waters, Milford, MA, USA) for the first dimension and a trap column (180 μm × 20 mm). The auxiliary pump allowed the SCX column to be equilibrated with loading buffer and the step gradient to be performed. Two fractions (50 and 200-mM ammonium formate, containing 5% acetonitrile) and one fraction (containing 200-mM ammonium formate buffer with 30% acetonitrile) were used to establish a step gradient for cationic exchange. The second dimension of the procedure was performed using a 5-μm Symmetry C18 material trap column (Waters Co., Milford, USA) and a BEH130 C18 reversed-phase analytical column (1.7 μm, 75 μm × 150 mm) (Waters Co., Milford, USA). The peptides were eluted using a linear gradient of 0.1% formic acid in water (mobile phase A) and 0.1% formic acid in acetonitrile (mobile phase B) for 60 min, delivered by the binary solvent manager (BSM) at a flow rate of 300 nL/min.

[Glu1]–fibrinopeptide B (GFP) (Sigma–Aldrich, Co., LLC, USA) was used for lock mass correction (100 fmol/μL in 50:50:1, methanol:H2O:acetic acid) for accurate MS post-acquisition measurements, with an injection once every 30 s. The high/low collision energy acquisition mode alternated between the two energies using 0.8 s with a 0.02-s interscan delay time. In the low-energy MS mode, data were collected at a constant collision energy of 4 eV. In the high-energy MS mode, the collision energy was ramped from 15 to 50 eV. Four biological replicates were used in the comparative analysis of the VF of T. solium and T. crassiceps.

Data processing and analysis

The data analysis of the comparative proteomics of T. crassiceps and T. solium SVE was conducted qualitatively by determining all ions of the two species to identify unique peptides and corresponding proteins that could further be used as antigen candidates. The MSE raw data were processed and analyzed using ProteinLynx 2.5.1 (Waters Corp., Manchester, U.K.) and the IDENTITYE algorithm. The basic search parameters for protein identification were as follows: Taenia UniProtKB databank (released in April 2016), one missed cleavage by trypsin, carbamidomethyl (C), and methionine oxidation as fixed and variable modifications, respectively. The precursor and fragment ion mass error tolerances were 10 and 20 ppm, respectively. The criteria used for a positive protein match were at least three fragment ions per peptide, five fragment ions per protein, and at least one peptide per protein hit. A false-positive discovery rate (FDR) was allowed up to 1%. All identified peptide sequences were analyzed using UniProt-BLAST to verify the total identity and exclusivity to the Taenia genus.
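For readers less familiar with relative mass tolerances, the 10 and 20 ppm limits quoted above can be expressed as a short sketch. The function names and the example m/z values are illustrative assumptions and are not taken from ProteinLynx.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error in parts per million (ppm)."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_tolerance(observed_mz: float, theoretical_mz: float,
                     tol_ppm: float) -> bool:
    """True if the absolute ppm error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


# Tolerances stated in the search parameters
PRECURSOR_TOL_PPM = 10.0
FRAGMENT_TOL_PPM = 20.0

# Hypothetical ion at m/z 1000.000 observed at 1000.015 (about 15 ppm off):
# rejected as a precursor, accepted as a fragment
as_precursor = within_tolerance(1000.015, 1000.0, PRECURSOR_TOL_PPM)
as_fragment = within_tolerance(1000.015, 1000.0, FRAGMENT_TOL_PPM)
```

At m/z 1000, a 10 ppm window corresponds to ± 0.01 Da, which is tighter than the ± 0.1 Da absolute tolerance used for the DDA searches.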

To verify the antigenicity prediction, the peptides sequenced by mass spectrometry containing 100% identity with proteins of the Taenia genus were submitted to the IEDB web server (http://tools.immuneepitope.org/bcell/), which includes the amino acid propensity scales of Hopp–Woods (hydrophilicity), Emini (surface probability), Jameson–Wolf (antigenic index), and Karplus–Schulz (flexibility). We also used the Protean application of DNASTAR Lasergene version 7.2 (DNASTAR, Inc., Madison, WI, USA). Default settings were applied to all of the peptide sequence analysis tools that were used. Only antigenic prediction regions that were found as a positive match by both software programs were selected and annotated.
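The selection rule that only regions flagged by both programs are retained can be sketched as a set intersection. This is a minimal illustration under stated assumptions: the residue positions are hypothetical, and the actual output formats of the IEDB server and Protean are not modeled.

```python
from typing import Set


def consensus_epitope_fraction(peptide: str,
                               predicted_a: Set[int],
                               predicted_b: Set[int]) -> float:
    """Fraction of peptide residues predicted as epitope by both tools.

    predicted_a and predicted_b hold 0-based residue positions that
    each tool marks as antigenic; positions outside the peptide are ignored.
    """
    if not peptide:
        return 0.0
    valid_positions = set(range(len(peptide)))
    consensus = predicted_a & predicted_b & valid_positions
    return len(consensus) / len(peptide)


# Hypothetical predictions for a nine-residue peptide
tool_a = set(range(9))   # all nine residues flagged
tool_b = set(range(8))   # first eight residues flagged
fraction = consensus_epitope_fraction("EPLDDSHVK", tool_a, tool_b)
```

A fraction computed in this way could then be compared with a cutoff such as the > 80% of predicted epitope residues used for Table 1.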

Gene ontology consortium (http://www.geneontology.org/) analysis was performed to assess the holistic biological role and molecular function of the identified proteins in both species.

Results

Comparison of the use of surfactant and preliminary analysis by DDA mass spectrometry

Overall, the strategy that yielded a higher number of identified proteins and peptide hits was the in-solution digestion using the surfactant buffer protocol, as shown in Fig. 1a, b. The surfactant method allowed for the identification of a larger number of proteins in T. solium (41 proteins and 179 total corresponding peptides) than in the samples extracted in the absence of the surfactant (22 proteins and 92 total corresponding peptides). Likewise, we identified 52 proteins (210 total corresponding peptides) in T. crassiceps surfactant-containing samples, while only 39 proteins (81 total corresponding peptides) were found in T. crassiceps samples without surfactant (Fig. 1a). Finally, the preliminary analysis by DDA acquisition mode resulted in the identification of 144 accumulated protein hits, which are listed in supplementary Tables 1, 2, 3, and 4.

Fig. 1

Distribution of protein (blue) and peptide (red) numbers identified in the vesicular fluids of T. solium and T. crassiceps cysticerci by using in-solution sample preparation procedures with and without surfactant reagent (a). Number of protein hits found with 1 to > 3 peptides using surfactant (b) and without surfactant (c) in T. crassiceps samples

The number of peptides per protein found in each condition and species was also evaluated. In T. crassiceps, 34 proteins were identified with two or more peptides in the protocol that used RapiGest™ (Fig. 1b), while only 14 proteins were identified when RapiGest™ was not added (Fig. 1c). The same positive effect of the improved extraction was observed in the number of proteins identified with more than three peptides: when the surfactant was used, this number reached 16 proteins (Fig. 1b), whereas the non-surfactant protocol allowed the identification of only five proteins (Fig. 1c).

Comparative profiles of proteins and peptides of T. solium and T. crassiceps by MSE high/low acquisition mass spectrometry

Once the best protocol for sample digestion was established, the analysis was expanded by MSE (high/low) acquisition using LC-2D/ESI–MS/MS, allowing identification of the proteins and peptides found in SVE of cysticerci from each species. The cyst of T. solium consists of a scolex of the future tapeworm surrounded by a vesicle formed by the extension of the parasite’s tegument. In the interior, the liquid is composed of mineral salts, proteins, uric acid, urea, creatinine, traces of glucose, and cholesterol, similar to cerebrospinal fluid (CSF). These proteins are synthesized by cysticerci cells and represent, in addition to the excretion/secretion proteins, the proteins involved in metabolism, which will also be present in the adult worm (Gomez et al. 2015). In the SVE of T. crassiceps, no protoscolex proteins are found because the ORF strain does not have this structure, but in T. solium, these proteins may be present in small amounts. The number of proteins found only in the SVE of T. solium was slightly higher than that of T. crassiceps. However, proteins that were defined as secreted proteins were found in both SVEs (Victor et al. 2012; Marzano et al. 2017).

The qualitative analysis by MSE high/low mode acquisition showed 79 proteins in Taenia species; 29 proteins only in T. solium, 11 proteins only in T. crassiceps, and 39 proteins in both (Fig. 2a; supplementary Table 5). Based on a Gene Ontology search, the proteins were located in different compartments, such as the cytoplasm, cytoskeleton, membrane, and nucleus. The biological function analysis showed that the proteins exhibited widespread functions that involved protein folding, cell-movement regulation, cell–cell interactions, cell division, antioxidant regulation, gluconeogenesis, and cell cycle division step regulation (Fig. 2b).
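The partition into species-specific and shared proteins corresponds to simple set operations on the lists of identified accessions. The sketch below is illustrative only; the accession strings are placeholders, not identifiers from supplementary Table 5.

```python
from typing import Dict, Set


def partition_proteins(tso: Set[str], tcra: Set[str]) -> Dict[str, Set[str]]:
    """Split two sets of protein accessions into unique and shared groups."""
    return {
        "T. solium only": tso - tcra,
        "T. crassiceps only": tcra - tso,
        "both species": tso & tcra,
    }


# Placeholder accessions for illustration
sve_tso = {"ACC_A", "ACC_B", "ACC_C"}
sve_tcra = {"ACC_B", "ACC_C", "ACC_D"}
groups = partition_proteins(sve_tso, sve_tcra)

# The three groups are disjoint, so their sizes sum to the size of the union
total = sum(len(g) for g in groups.values())
```

The reported counts are consistent with such a partition: 29 + 11 + 39 = 79 proteins.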

Fig. 2

The number of proteins identified by high/low mode acquisition (MSE) on the Synapt HDMS in T. solium and T. crassiceps cysticerci vesicular extracts (a). Distribution of the biological function and protein cell compartment of the identified proteins (b)

BLAST search and B cell linear epitope prediction

To predict the epitope in each identified protein, all peptide sequences from the MSE analyses were searched against UniProt Knowledge Base (UniProtKB) using BLAST to verify the identity and exclusivity to the Taenia genus. A total of 726 peptides were found only in the T. crassiceps sample. However, only 58 (7.7%) unique peptides were found, as is shown in supplementary Table 6. Some proteins from the related Echinococcus genus exhibited cross-reactivity to Taenia sp. antigens; 258 peptides of T. crassiceps (35.0%) presented 100% identity to Echinococcus genus proteins, whereas 137 peptides (18.6%) showed over 90% identity.

For T. solium SVE, BLAST analysis from 825 peptide queries resulted in 56 (7%) unique identifications, whereas 291 (35.0%) peptides showed 100% identity to Echinococcus genus proteins. Thirty-three peptides were identified in samples of SVE from both Taenia species, and there were a total of 147 unique peptides (supplementary Table 6).

The epitope prediction analysis of those peptides showed that 47 peptides had at least 80% of their amino acid residues predicted as epitope residues and were thus indicated as potentially antigenic (Table 1). The remaining peptides from both SVE samples, which did not match at 100% identity to the Taenia genus sequence database, were not considered for further epitope prediction analysis.

Table 1 List of peptides identified by mass spectrometry MSE high/low mode acquisition with a high number (> 80%) of epitope residues predicted by IEDB and Protean software

Discussion

Neurocysticercosis diagnosis requires proper interpretation of clinical, neuroimaging and serological data in the correct epidemiological context (Ito et al. 2016; Rajshekhar 2016). Early studies on the cross-reactivity between SVE of T. crassiceps and total extracts of T. solium antigens confirmed that both parasites share epitopes that are present in amounts sufficient to be used as antigen sources in immunological tests. These antigens have been used for the detection of antibodies in the CSF and serum of NCC patients (Peralta et al. 2002, 2010).

The proteomics of T. solium and T. crassiceps are not yet well characterized, with few papers in the literature addressing protein identification in the different forms and structures of the parasite, such as the oncosphere, the cytoskeleton, and excretion-secretion and VF proteins of the cysticerci (Santivanez et al. 2010; Victor et al. 2012; Diaz-Masmela et al. 2013; Navarrete-Perea et al. 2014; Reynoso-Ducoing et al. 2014). Different strategies in the preparation of the protein extracts or even in the analysis procedures, used in the few publications available, may explain the qualitative and quantitative differences of the proteins found in our study. Most of the proteomic analysis studies for this parasite were carried out using an in-gel methodology, but this approach fails to visualize all of the proteins in a complex sample. A typical 1D- or 2D-GE gel can visualize only 30–50% of the entire proteome. In particular, those proteins present in extremely low concentrations or proteins that cannot be separated in-gel due to their physicochemical properties (pI, hydrophobicity, molecular weight) will not be detected. To overcome some of these challenges, several gel-free high-throughput technologies for proteome analysis have been developed (Baggerman et al. 2005; Verissimo da Costa et al. 2013). In the proteomic approach used in this work, the multistep separation strategies have been able to detect a number of low-abundance proteins using single-step gel-free analysis of the SVE of Taenia species with on-line 2D-liquid chromatography. In the first LC cationic exchange step, we fractionated each sample into eight fractions that were then separated by a reversed-phase C8 column. Therefore, we reduced the complexity of the samples while also identifying low-abundance proteins.

By Gene Ontology/UniProt analysis, 24.6% (19/76) of the proteins matched those described by Victor et al. (2012) as excretory/secretory proteins when SignalP and SecretomeP software were used. These prediction tools were not used in our study because predicting excretory/secretory proteins was not its focus; nevertheless, such proteins were found, as expected, since we worked with a soluble extract of cysticerci. Thus, only Gene Ontology/UniProt analysis was performed, examining the biological function and cellular/subcellular location of all proteins found when the information was available.

The present work compared two sample preparation protocols—with and without adding the surfactant reagent RapiGest SF™. Surfactants are routinely used with considerable efficiency in the preparation and digestion of sample proteins for SDS-PAGE and for in-solution procedures because they can improve the solubility of hydrophobic compounds and increase the number of identified proteins in complex mixtures (Wu et al. 2011). The results showed that the number and coverage of identified proteins were increased when a surfactant was included in the sample preparation.

Proteomic data have contributed significantly to the validation and annotation of proteins and/or genes in a genome project database. However, when a proteome or genome databank is not yet available, such as in the case of T. crassiceps, protein identification can be accomplished by de novo sequencing or through a search against a phylogenetically related species databank. Therefore, we selected the Taenia genus databank to search for and identify peptides and proteins from SVE in both species. This approach allowed for the identification of common and unique T. solium and T. crassiceps peptides, which were compared by a label-free MSE methodology. Some of the identified proteins presented carbohydrate moieties, which are also found in the larval form of T. solium and T. crassiceps. These proteins are used as antigens in the immunodiagnosis of NCC and/or in the development of synthetic or recombinant vaccines against human cysticercosis (Greene et al. 2000; Hancock et al. 2003; Peralta et al. 2010; Atluri et al. 2011; Lee et al. 2011; Salazar-Anton and Lindh 2011; Ferrer et al. 2012; Salazar-Anton et al. 2012).

The BLAST analysis of the peptides sequenced by mass spectrometry revealed a high identity of proteins and antigens between the Taenia and Echinococcus genera, which not only are related phylogenetically but also show cross-antigenicity (Ishida et al. 2003; da Silva Ribeiro et al. 2010). The BLAST outcome also showed that 35% of the total peptides were identical to sequences of Echinococcus genus proteins. This fact reinforces the difficulty of finding novel specific antigens to improve the immunodiagnostic assay for cysticercosis.

Antigenicity or epitope prediction approaches can be classified into the following categories: prediction of proteasomal cleavage sites, prediction of TAP binders, prediction of MHC-binding regions, and prediction of T and B cell epitopes (Yang and Yu 2009). Computational algorithms offer a fairly accurate and rapid determination of epitope location on an allergen molecule. This approach was used to identify predicted antigens from Taenia. Some of these proteins have been described in T. crassiceps and T. solium parasites as potential targets for immunodiagnostic assay development, whereas others are in the development phase. The proteotypic peptides with antigen prediction were identified in myosin-like protein, paramyosin, filamin, annexin, P27 protein, and immunogenic proteins. The P27 protein has been described as an important molecule in the regulation of intracellular transport and is involved in clathrin-mediated endocytosis, binding to membrane vesicles and inducing tubular conformations (Nhancupe et al. 2013). Recently, several authors have described this protein as a potential target antigen for the serodiagnosis of NCC using western blot and immunodot blot, with levels of sensitivity and specificity estimated to range from 95.6–97.8% and 76.4–86.7%, respectively (Salazar-Anton et al. 2012). Gel-free proteomic analysis enabled the identification of one proteotypic peptide for P27 protein that showed antigenicity prediction (80%). Moreover, three peptides identified from paramyosin with high antigenicity prediction were also experimentally assayed for B cell receptors by two different research groups (López-Moreno et al. 2003; Gazarian et al. 2012).

In recent years, some studies have shown that the 14- and 18-kDa protein fractions of T. crassiceps present immunogenic specificity in the serodiagnosis of cysticercosis. These fractions were described as a composite of glycan chains linked to peptide residues (Peralta et al. 2010). The N-terminal amino acid sequencing data from the 14- and 18-kDa subunits suggested that both are similar and present partial homology to the 10-kDa protein from T. solium (Esquivel-Velazquez et al. 2011a). In the present work, an immunogenic protein (UniProt accession number D5MRS9) was detected through a unique peptide with 100% antigenicity prediction. Some residues of D5MRS9 are also present in the 14-kDa protein of T. crassiceps SVE described by Peralta et al. (2010), with a high antigenicity index. The characterization and amino acid sequencing of antigen extracts in this work is an effort toward the better identification of the components shared between T. crassiceps and T. solium that might be recognized by human antibodies. The identified peptide (EPLDDSHVK) of an immunogenic protein (D5MRS9) showed > 88% sequence identity with another peptide (EPLDESHVK), assigned to a cysteine proteinase (Q7M469), differing by only one amino acid. This protein has been used in immunoassays and in studies for the development of a vaccine (León et al. 2013). A similar sequence has been described in T. crassiceps and has been used as an antigenic synthetic peptide (Lima et al. 2013). The comparison between the sequences of these three peptides can reinforce the importance of EPLD residues in antigen-antibody complex interactions.
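The identity between the two nonapeptides can be checked with a position-by-position comparison. The sketch below assumes an ungapped comparison of equal-length sequences, which is simpler than a BLAST alignment; the function name is hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of identical residues between two equal-length sequences."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# D5MRS9 peptide versus the cysteine proteinase (Q7M469) peptide
identity = percent_identity("EPLDDSHVK", "EPLDESHVK")
print(round(identity, 1))  # 88.9
```

Eight of the nine residues are identical (the fifth position differs, D versus E), giving 8/9 = 88.9%, in agreement with the > 88% stated above.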

Conclusion

The in-solution digestion approach is novel to Taenia proteomics, and it led to the identification of peptides (and proteins) that have not yet been described. Additionally, there has been no previously reported proteomic analysis of T. crassiceps of this scale. In conclusion, these results are an important contribution to the inference and review of new protein sequences in databases. Furthermore, these findings establish the proteomic profile for the study of candidate biomarkers involved in the diagnosis or pathogenesis of cysticercosis.