Sequence-based identification of Anopheles species in eastern Ethiopia
The recent finding of a typically non-African Anopheles species in eastern Ethiopia emphasizes the need for detailed species identification and characterization for effective malaria vector surveillance. Molecular approaches increase the accuracy and interoperability of vector surveillance data. To develop effective molecular assays for Anopheles identification, it is important to evaluate different genetic loci for the ability to characterize species and population level variation. Here the utility of the internal transcribed spacer 2 (ITS2) and cytochrome oxidase I (COI) loci for detection of Anopheles species from understudied regions of eastern Ethiopia was investigated.
Adult mosquitoes were collected from the Harewe locality (east) and Meki (east central) in Ethiopia. PCR and Sanger sequencing were performed for portions of the ITS2 and COI loci. Both NCBI’s Basic Local Alignment Search Tool (BLAST) and phylogenetic analysis using a maximum-likelihood approach were used to identify the species of Anopheles specimens.
Two species from the east Ethiopian collection, Anopheles arabiensis and Anopheles pretoriensis, were identified. Analysis of the ITS2 locus delineated both species. In contrast, analysis of the COI locus could not delineate An. arabiensis from other taxa in the Anopheles gambiae complex, but could distinguish An. pretoriensis sequences from sister taxa.
The lack of clarity from COI sequence analysis highlights potential challenges of species identification within species complexes. These results provide supporting data for the development of molecular assays for delineation of Anopheles in east Ethiopia.
Keywords: Anopheles arabiensis; Malaria; Phylogenetics; Internal transcribed spacer 2; Cytochrome oxidase subunit I; Horn of Africa
Abbreviations
BLAST: Basic Local Alignment Search Tool
COI: cytochrome c oxidase subunit 1 gene
HSP: high-scoring segment pairs
ITS2: internal transcribed spacer 2 region
NCBI: National Center for Biotechnology Information
PCR: polymerase chain reaction
Over 1.5 million cases of malaria were reported in Ethiopia in 2017. While strides to control the transmission of malaria likely contributed to the reduction in overall mortality and incidence over the last several decades, continued study of the mosquito vector populations is needed for improved targeted interventions. In east Ethiopia, Anopheles species are still being uncovered. A recent study revealed the presence of Anopheles stephensi, a malaria vector species typically only seen east of the Red Sea. Historically, the number of malaria cases has been low in this region, but the presence of potential malaria vectors and recent reports of sporadic malaria outbreaks warrant further investigation of vector populations.
Due to the global variation of Anopheles species and populations, it is vital to evaluate techniques specific to east Ethiopia to identify various Anopheles species [3, 5]. Once a technique is validated, the diversity and distribution of various Anopheles species can be accurately determined and the proper intervention implemented. In Ethiopia, much of the mosquito surveillance and identification is conducted using mosquito morphology, e.g. [6, 7, 8, 9]. Morphological identification can be tedious when processing many specimens and carries a risk of misidentifying species not previously encountered and potentially cryptic species. Genetic analysis can be employed as a high-throughput approach to identify mosquito species. Moreover, because DNA data are interoperable with previous DNA records and are often linked to rich metadata on location and date of isolation, one can build information about population structure and movement of vector species to improve our understanding of the spatial epidemiology of malaria. Analysis of the nuclear internal transcribed spacer 2 (ITS2) and mitochondrial cytochrome oxidase I (COI, also called CO1, COX1) loci has served as the basis of species identification assays that use allele-specific PCR amplification [10, 11], restriction enzyme digestions, or genetic sequencing-based assays [4, 5, 13]. Identifying the correct locus or loci for species or population-level analysis is important. Previous studies have highlighted how analysis of the COI gene poses challenges for discriminating between closely related species such as those which belong to a species complex (for review see Beebe et al.). In this study, the ITS2 and COI loci were sequenced and analysed for species identification in Anopheles specimens collected at two sites in east Ethiopia to evaluate the potential of these loci for identification of east Ethiopian Anopheles.
Specimen collection sites, dates, GPS coordinates, and quantities
Dates of collection: November 21, 2016; July 21, 2017; July 28, 2017; August 19, 2017
The three collections in the Harewe locality took place in November 2016, July 2017, and August 2017. The Meki collection took place in August 2017. Mosquitoes were collected indoors and outdoors from 6:00 pm to 6:00 am in each selected area using standard CDC light traps (John W. Hock, Gainesville, FL, USA). Indoor traps were hung from the ceiling or from roof supports at the foot end of beds where people sleep at night. Outdoor traps were placed close to breeding habitats, with the body of the trap suspended about 1.5 m from the ground. A total of 16 CDC light traps were deployed for collection of mosquitoes in each study area.
Collected mosquitoes were kept in paper cups and brought to the field laboratory for identification. At the laboratory, mosquitoes were anesthetized with chloroform and all adult mosquitoes were counted and identified, under stereomicroscopes, to at least genus level based on a morphological key.
Amplification and sequencing
Molecular analysis was performed on collected Anopheles mosquitoes to determine species and characterize the genetic variation within species. Species identification was completed by amplifying two loci: ITS2 and COI. Legs were used as DNA templates for PCR. ITS2 was amplified as described previously using primers 5.8S (ATCACTCGGCTCGTGGATCG) and 28S (ATGCTTAAATTTAGGGGGTAGTC). Starting reagent concentrations were as follows: 10 mM for each primer, 2X Promega GoTAQ HotStart master mix (Promega, Madison, Wisconsin), and water for a total reaction volume of 25 µl. ITS2 PCR amplifications were performed with the following temperature cycling: 95 °C for 2 min, 30 cycles of 95 °C for 30 s, 50 °C for 30 s, and 72 °C for 1 min, and a final extension of 72 °C for 5 min. The protocol for COI was the same as the ITS2 protocol except that the primers used were LCO1490F (GGTCAACAAATCATAAAGATATTGG) and HCO2198R (TAAACTTCAGGGTGACCAAAAAATCA). Temperature cycling for COI PCR was as follows: 95 °C for 1 min, 30 cycles of 95 °C for 30 s, 48 °C for 30 s, and 72 °C for 1 min, with a final extension of 72 °C for 10 min. For both ITS2 and COI, eight microliters of PCR product were run on a 2% agarose gel for 1 h at 100 V to confirm successful amplification, and products were then cleaned using ExoSAP. PCR products were sequenced using Sanger technology with ABI BigDye™ Terminator v3.1 chemistry (Thermo Fisher, Santa Clara, CA) according to manufacturer recommendations and run on a 3130 Genetic Analyzer (Thermo Fisher, Santa Clara, CA).
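The two cycling programs above can be written down as data, which makes the differences between the ITS2 and COI protocols easy to compare. The following is a minimal sketch in Python; the class and field names are illustrative assumptions, and the computed durations count programmed hold times only (ramp times between temperatures are ignored).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """Thermocycler program; all hold times are in seconds (hypothetical structure)."""
    initial_denaturation_s: int
    cycles: int
    denature_s: int
    anneal_s: int
    anneal_temp_c: int
    extend_s: int
    final_extension_s: int

    def total_hold_seconds(self) -> int:
        # Sum of programmed holds only; ramping between temperatures is not modeled.
        per_cycle = self.denature_s + self.anneal_s + self.extend_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


# Values transcribed from the protocol text above.
ITS2_PROGRAM = CyclingProgram(
    initial_denaturation_s=120, cycles=30, denature_s=30,
    anneal_s=30, anneal_temp_c=50, extend_s=60, final_extension_s=300,
)
COI_PROGRAM = CyclingProgram(
    initial_denaturation_s=60, cycles=30, denature_s=30,
    anneal_s=30, anneal_temp_c=48, extend_s=60, final_extension_s=600,
)

if __name__ == "__main__":
    for name, program in (("ITS2", ITS2_PROGRAM), ("COI", COI_PROGRAM)):
        minutes = program.total_hold_seconds() / 60
        print(f"{name}: anneal at {program.anneal_temp_c} °C, {minutes:.0f} min of holds")
```

Under these assumptions the ITS2 program amounts to 67 min of hold time and the COI program to 71 min; actual run times will be longer by the instrument's ramp time.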
Sequence analysis for species identification
Sequences were cleaned and analysed using CodonCode Aligner Program V. 6.0.2 (CodonCode Corporation, Centerville, MA). ITS2 and COI sequences from Anopheles specimens were submitted as queries to the National Center for Biotechnology Information's (NCBI) web-based Basic Local Alignment Search Tool (BLAST) against the nucleotide collection in GenBank under default parameters [max High-scoring Segment Pairs (HSP) 250, expect threshold 10, word size 28, optimized for highly similar hits, not specific to any organism]. The Anopheles subject sequences from NCBI that formed HSP with the queries were identified.
Phylogenetic analyses of ITS2 and COI were employed to search for sister taxon relationships between isolates of Anopheles from east Ethiopia and voucher specimens from Anopheles with orthologous sequence data stored in NCBI. Anopheles sequences from east Ethiopia and the closest sequence hits in BLAST that had more than 85% sequence coverage were combined into separate datasets for COI and ITS2. In some cases, there were multiple sequences from the same location and study. In these instances, only representative sequences were taken from those population sets. Alignments were created with MAFFT version 7 under default parameters and ragged ends were trimmed using Mesquite 3.51. Phylogenetic relationships between the Ethiopian Anopheles sequences and Anopheles sequences from NCBI were inferred using RAxML, which is based on a maximum likelihood (ML) approach. The GTRGAMMA option, which uses the GTR model of nucleotide substitution with a gamma model of rate heterogeneity, was applied. Both 100 and 1000 replicates were completed with the strategy that searches for the heuristically best-scoring tree and performs bootstrap analysis in one run. Best-scoring trees under ML with bootstrap values from RAxML were viewed and rooted under the outgroup criterion in FigTree for each locus. Outgroups were chosen based on availability of sequence data for each locus, overall coverage, and use in previous phylogenetic analyses. For the COI analysis, an Anopheles implexus sequence was used as an outgroup based on sequence availability and use in similar analyses of Anopheles species. For ITS2, a different species, Anopheles christyi, was used as an outgroup, primarily because An. implexus sequence was not available. Compatible An. christyi ITS2 sequence was available and this species had been used in a similar analysis.
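To illustrate what trimming ragged ends does to an alignment, the sketch below removes leading and trailing columns until every sequence has a base at both ends, while leaving internal gaps untouched. It is a plain-Python stand-in written for illustration, not the procedure Mesquite applies; the function name, the gap characters, and the "all sequences must be present" rule are assumptions.

```python
def trim_ragged_ends(alignment: dict[str, str], gap_chars: str = "-?") -> dict[str, str]:
    """Trim end columns of an alignment in which any sequence has a gap.

    `alignment` maps sequence names to aligned sequences of equal length.
    Internal gaps are preserved; only the ragged ends are removed.
    """
    widths = {len(seq) for seq in alignment.values()}
    if len(widths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    width = widths.pop()

    def column_is_complete(col: int) -> bool:
        # A column is complete when no sequence has a gap character there.
        return all(seq[col] not in gap_chars for seq in alignment.values())

    start = 0
    while start < width and not column_is_complete(start):
        start += 1

    end = width
    while end > start and not column_is_complete(end - 1):
        end -= 1

    return {name: seq[start:end] for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy alignment with hypothetical sequences, for illustration only.
    toy = {"a": "--ACGT-A", "b": "TTACGTAA", "c": "-TAC-TA-"}
    print(trim_ragged_ends(toy))
```

A stricter or looser rule (for example, tolerating a small fraction of missing data per column) would retain more sites at the cost of more missing data near the ends.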
ITS2 sequence analysis
The ITS2 sequences were analysed for a subset of samples from each collection from Ethiopia (n = 82). All Anopheles gambiae complex specimens from this collection in Ethiopia were identical for ITS2 sequences. When the consensus ITS2 sequence from Ethiopia was searched against NCBI with BLAST, specimens from Ethiopia formed HSP with ~ 99% identity for Anopheles arabiensis.
ITS2 sequence data from eight non-Anopheles gambiae complex specimens from Ethiopia were generated and all sequences were identical. Based on BLAST against NCBI sequences, these ITS2 sequences from Ethiopia formed HSP with ~ 99% identity for Anopheles pretoriensis.
COI sequence analysis
A subset of the samples from each collection from Ethiopia was chosen for PCR amplification and sequencing of a portion of the COI gene (n = 37). Sequences were cleaned and trimmed and submitted as queries to NCBI’s BLAST. Of the 37 sequences from Ethiopian specimens, 29 formed HSP with ~ 99% identity for both An. arabiensis and An. gambiae sequences in the NCBI database.
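The ambiguity described here can be made explicit when BLAST results are processed programmatically: if more than one species has a hit close to the best percent identity, the query should not be assigned to a single species. The following is a minimal sketch; the `Hit` structure, the 0.5 percentage-point tolerance, and the example identity values are all hypothetical and are not taken from this study's BLAST output.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit reduced to the fields needed here (hypothetical structure)."""
    species: str
    percent_identity: float


def candidate_species(hits: list[Hit], tolerance: float = 0.5) -> list[str]:
    """Return species whose best hit is within `tolerance` points of the top hit.

    A single name means an unambiguous call; several names mean the locus
    cannot separate those species for this query.
    """
    best_by_species: dict[str, float] = {}
    for hit in hits:
        previous = best_by_species.get(hit.species, 0.0)
        best_by_species[hit.species] = max(previous, hit.percent_identity)
    if not best_by_species:
        return []
    top = max(best_by_species.values())
    return sorted(name for name, identity in best_by_species.items()
                  if top - identity <= tolerance)


if __name__ == "__main__":
    # Illustrative values only.
    coi_hits = [
        Hit("Anopheles arabiensis", 99.4),
        Hit("Anopheles gambiae", 99.2),
        Hit("Anopheles funestus", 88.0),
    ]
    print(candidate_species(coi_hits))
```

With the illustrative values above, both An. arabiensis and An. gambiae are returned, which mirrors the kind of ambiguous COI result reported for the 29 specimens.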
These 29 specimens for which COI sequences had ambiguous HSP with respect to species had coinciding ITS2 sequences (see above) that confirmed their identity as An. arabiensis. The number of unique COI sequences (haplotypes) was determined. COI sequences with at least 578 bp (n = 20) of readable sequence data revealed 12 different haplotypes.
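Counting haplotypes amounts to counting distinct sequences over a shared region, after excluding reads that are too short. The sketch below assumes that all sequences start at the same alignment position and contain no ambiguous base calls; the function name and the toy sequences are hypothetical, and the length threshold is passed in rather than fixed.

```python
from collections import Counter


def count_haplotypes(sequences: dict[str, str], min_length: int) -> Counter:
    """Count distinct haplotypes among sequences with at least `min_length` bases.

    Sequences are assumed to share the same start position; each retained
    sequence is compared over its first `min_length` bases only.
    """
    haplotypes: Counter = Counter()
    for sequence in sequences.values():
        if len(sequence) < min_length:
            continue  # too short to compare over the shared region
        haplotypes[sequence[:min_length].upper()] += 1
    return haplotypes


if __name__ == "__main__":
    # Toy sequences for illustration; real COI reads would use min_length=578.
    toy = {"s1": "ACGTAC", "s2": "acgtacGG", "s3": "ACGTTC", "s4": "ACG"}
    counts = count_haplotypes(toy, min_length=6)
    print(len(counts), "haplotypes among", sum(counts.values()), "sequences")
```

Truncating to a common length avoids counting two reads as different haplotypes merely because one is longer than the other.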
The remaining eight sequences from Ethiopian specimens formed HSP with ~ 99% identity for An. pretoriensis voucher sequences in NCBI. These eight Ethiopian specimens had coinciding ITS2 data that confirmed their identity as An. pretoriensis. Six of these specimens had at least 611 bp of readable sequence and each had a unique COI haplotype.
Phylogenetic analysis for further species differentiation
ITS2 and COI sequence variation in Anopheles showed different levels of success in identifying Anopheles species found in east Ethiopia. Both loci confidently distinguished specimens belonging to the An. gambiae complex species from those that do not (i.e. An. pretoriensis). ITS2 provides further resolving power to differentiate An. arabiensis specimens from other An. gambiae complex specimens.
For the COI data, An. arabiensis and An. gambiae specimens formed a clade that could not be resolved into species-level subclades (Fig. 1). Previous studies have shown similar results with mtDNA [23, 24]. Anopheles arabiensis and An. gambiae are closely related species, so the similarity in sequence may be due to incomplete evolutionary sorting or hybridization between species [14, 23, 24, 25], which reduces the power to distinguish species. In contrast, COI sequences proved to be very useful for the species identification of An. stephensi in Ethiopia.
There was interest in whether within-species differentiation could be observed for the Ethiopian specimens. While initial sequence analysis showed substantial haplotype variation in the COI locus within the Ethiopian sequences, phylogenetic analysis did not reveal any within-species differentiation, geographic or otherwise, for the confirmed An. arabiensis sequences. There was some within-species differentiation for An. pretoriensis sequences. The COI locus also revealed some differentiation within the An. stephensi grouping. Taken together, these results indicate that COI has some utility for within-species differentiation in some Anopheles species, but not in others.
The results presented in this study confirm the presence of An. pretoriensis in the Harewe locality. This species has been observed in other parts of Ethiopia including the southwest and northern regions [26, 27]. Anopheles pretoriensis has not been considered a strong vector of malaria. Indeed, blood meal analysis of two blood-fed An. pretoriensis from this study indicated only bovine feeding (data not shown). However, a recent study showed An. pretoriensis was positive for Plasmodium falciparum in Zambia, suggesting it is important to understand the distribution of this species in Ethiopia as a potential vector. Additionally, subspecies differentiation for one An. pretoriensis specimen in the COI analysis was observed (Fig. 2, bootstrap = 74%). Questions remain as to whether there is significant evolutionary divergence within An. pretoriensis and whether it is associated with vector competence.
These findings have implications for the design of molecular assays to differentiate Anopheles species in east Ethiopia. ITS2 proved to be the more useful locus for sequence-based species determination using simple BLAST analysis. Phylogenetic analysis of COI can be useful for sequence-based analysis of some Anopheles species found in east Ethiopia, but not for members of the An. gambiae complex. One approach that may improve molecular species identification would be to combine the two loci into a single analysis. Previous studies have employed multiple loci [13, 24]; this approach requires the availability of genomic or coordinating database sequences for both loci from the same specimens representing relevant species and populations. If such sequence data are available, phylogenetic analysis that incorporates genes with various rates of evolution often provides better insight into both between- and within-species diversity.
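One common way to combine two loci is to concatenate their alignments specimen by specimen and record where each locus begins and ends, so that a partitioned analysis can treat them separately. The sketch below assumes each locus has already been aligned and that sequence names identify the same specimen in both alignments; the function names and toy data are hypothetical, and this was not part of the analysis reported in this study.

```python
def _aligned_width(alignment: dict[str, str], label: str) -> int:
    """Return the common length of an alignment, or raise if lengths differ."""
    widths = {len(seq) for seq in alignment.values()}
    if len(widths) != 1:
        raise ValueError(f"{label} sequences must be aligned to one length")
    return widths.pop()


def concatenate_loci(
    its2: dict[str, str], coi: dict[str, str]
) -> tuple[dict[str, str], list[tuple[str, int, int]]]:
    """Concatenate ITS2 and COI alignments for specimens present in both.

    Returns the combined alignment and partition ranges as
    (locus, first_column, last_column), using 1-based inclusive columns.
    """
    its2_width = _aligned_width(its2, "ITS2")
    coi_width = _aligned_width(coi, "COI")

    # Only specimens sequenced at both loci can be combined.
    shared = sorted(its2.keys() & coi.keys())
    combined = {name: its2[name] + coi[name] for name in shared}

    partitions = [
        ("ITS2", 1, its2_width),
        ("COI", its2_width + 1, its2_width + coi_width),
    ]
    return combined, partitions


if __name__ == "__main__":
    # Toy alignments for illustration only.
    its2_toy = {"sp1": "ACG-", "sp2": "ACGT", "sp3": "AAGT"}
    coi_toy = {"sp1": "TTGCA", "sp2": "TTGCC", "sp4": "TTGAA"}
    combined, partitions = concatenate_loci(its2_toy, coi_toy)
    print(combined)
    print(partitions)
```

Specimens lacking either locus are dropped in this sketch, which reflects the requirement noted above that both loci come from the same specimens.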
In conclusion, ITS2 and COI vary in their ability to delineate Anopheles species. The results of the COI analysis of An. arabiensis specimens revealed the potential challenge of using that locus alone for molecular species identification within species complexes. The results of this study contribute to the development of molecular assays for Anopheles species identification in east Ethiopia.
TC: conceptualizing the study, molecular analysis, data analysis, and manuscript preparation; SY: the collection of specimens, conceptualizing the study, and manuscript preparation; SH: molecular analysis and sequence analysis; KL: molecular analysis; and DJ: conceptualizing the project and manuscript preparation. All authors read and approved the final manuscript.
The authors would like to thank Sae Hee Choi for assistance with data management.
The authors declare that they have no competing interests.
The sequences generated and/or analysed during the current study are available in NCBI’s GenBank (https://www.ncbi.nlm.nih.gov/genbank/). GenBank accession numbers for Ethiopia sequences generated in this study are MK628480-MK628508.
Research reported in this publication was also supported by a UNC Research Opportunities Initiative grant, UNC Charlotte Multicultural Postdoctoral Fellowship from Academic Affairs, and Baylor University.
- 1. WHO. World malaria report 2017. Geneva: World Health Organization; 2018.
- 16. Ethiopia National Malaria Indicator Survey 2015. Ethiopian Public Health Institute; 2016.
- 17. Gillies MT, Coetzee M. A supplement to the Anophelinae of Africa South of the Sahara. Publ S Afr Inst Med Res. 1987;55:1–43.
- 18. Folmer O, Black M, Hoeh W, Lutz R, Vrijenhoek R. DNA primers for amplification of mitochondrial cytochrome c oxidase subunit I from diverse metazoan invertebrates. Mol Mar Biol Biotechnol. 1994;3:294–9.
- 20. Maddison WP, Maddison DR. Mesquite: a modular system for evolutionary analysis. 3.51 ed. 2018.
- 22. Rambaut A. FigTree, a graphical viewer of phylogenetic trees. 2007.
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.