Generating Reliable Genome Assemblies of Intestinal Protozoans from Clinical Samples for the Purpose of Biomarker Discovery

  • Arthur Morris
  • Justin Pachebat
  • Graeme Tyson
  • Guy Robinson
  • Rachel Chalmers
  • Martin Swain
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1211)


Protozoan parasites that cause diarrhoeal diseases in humans take a heavy toll on global public health each year: over 200,000 deaths in children under two years old in Asia and Sub-Saharan Africa are attributed to Cryptosporidium alone. These parasites pose a particularly serious health risk to immunocompromised individuals. Genomics can be a valuable asset in combating them, but whole genome sequencing from human stool samples remains problematic. In particular, sequence coverage of these parasite genomes is often highly uneven, which may result in critical errors in the genome assemblies produced by a number of popular assemblers. We have developed an approach that uses the Gini statistic to better characterise depth of sequencing coverage. Furthermore, we have explored the sequencing biases resulting from Whole Genome Amplification approaches and have attempted to relate them to the Gini statistic. We discuss these issues in two parasite genera, Cryptosporidium and Cyclospora, and analyse the depth of sequencing coverage over their genomes. Finally, we present our strategy for generating reliable genome assemblies of sufficient quality to facilitate the discovery of new Variable Number Tandem Repeat (VNTR) biomarkers.
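As a rough illustration of how a Gini statistic can summarise unevenness in coverage depth, the sketch below computes the standard Gini coefficient over a list of per-position read depths. This is the textbook definition, not necessarily the exact formulation used by the authors; the function name `gini` and the depth values are hypothetical and chosen only for illustration.

```python
def gini(depths):
    """Standard Gini coefficient of non-negative per-position depths.

    0.0 means perfectly even coverage; values approaching 1.0 mean that
    most reads pile up on a small fraction of positions.
    NOTE: illustrative sketch; the paper's exact formulation is an assumption.
    """
    values = sorted(depths)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one position with non-zero depth")
    # Rank-weighted sum over the sorted depths (ranks start at 1).
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


# Hypothetical depth vectors, not data from the paper.
even_coverage = [10, 10, 10, 10]
skewed_coverage = [0, 0, 0, 40]

print(gini(even_coverage))    # 0.0
print(gini(skewed_coverage))  # 0.75
```

In practice the depth vector would come from a read alignment against a reference genome, for example one depth value per base; a single number of this kind makes it easier to compare samples prepared with and without Whole Genome Amplification.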


Cryptosporidium; Genome assembly; Biomarker discovery



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Arthur Morris (1), corresponding author
  • Justin Pachebat (1)
  • Graeme Tyson (1)
  • Guy Robinson (2)
  • Rachel Chalmers (2)
  • Martin Swain (1)

  1. IBERS, Aberystwyth University, Aberystwyth, UK
  2. Cryptosporidium Reference Unit, Public Health Wales, Swansea, UK