Generating Reliable Genome Assemblies of Intestinal Protozoans from Clinical Samples for the Purpose of Biomarker Discovery

  • Arthur Morris
  • Justin Pachebat
  • Graeme Tyson
  • Guy Robinson
  • Rachel Chalmers
  • Martin Swain
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1211)

Abstract

Protozoan parasites that cause diarrhoeal diseases in humans take a massive toll on global public health annually, with over 200,000 deaths in children under two years old in Asia and Sub-Saharan Africa attributed to Cryptosporidium alone. They are a particularly serious health risk for immuno-incompetent individuals. Genomics can be a valuable asset in combating these parasites, but problems remain with performing whole genome sequencing from human stool samples. In particular, sequence coverage of these parasite genomes is often highly uneven, which may result in critical errors in the genome assemblies produced by a number of popular assemblers. We have developed an approach using the Gini statistic to better characterise depth of sequencing coverage. Furthermore, we have explored the sequencing biases resulting from Whole Genome Amplification approaches, and have attempted to relate those to the Gini statistic. We discuss these issues in two parasite genera, Cryptosporidium and Cyclospora, and analyse the sequencing coverage depth over these genomes. Finally, we present our strategy to generate reliable genome assemblies of sufficient quality to facilitate the discovery of new Variable Number Tandem Repeat (VNTR) biomarkers.
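
As an illustration of how a Gini statistic can summarise unevenness in coverage depth, the following is a minimal sketch using the standard Gini coefficient over a list of per-base depths. The abstract does not give the authors' exact formulation, so the function name `gini` and the input format (one non-negative depth value per genome position, for example as could be parsed from `samtools depth -a` output) are assumptions made for this example only.

```python
def gini(depths):
    """Standard Gini coefficient of non-negative coverage depths.

    Returns 0.0 for perfectly even coverage and approaches 1.0 as the
    reads concentrate on a small fraction of positions.
    Illustrative sketch; not necessarily the authors' implementation.
    """
    values = sorted(depths)          # ascending order is required by the rank formula
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("need at least one position with non-zero depth")
    # Rank-weighted sum: each depth is multiplied by its 1-based rank.
    weighted = sum(rank * depth for rank, depth in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    even = [10, 10, 10, 10]          # hypothetical evenly covered region
    skewed = [0, 0, 0, 40]           # hypothetical region with all reads at one position
    print(gini(even), gini(skewed))
```

Under this definition a single summary value can be compared across samples or across windows of one genome; with n positions the maximum attainable value is (n - 1) / n, so very short windows cannot reach 1.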

Keywords

Cryptosporidium, Genome assembly, Biomarker discovery

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Arthur Morris (1), corresponding author
  • Justin Pachebat (1)
  • Graeme Tyson (1)
  • Guy Robinson (2)
  • Rachel Chalmers (2)
  • Martin Swain (1)

  1. IBERS, Aberystwyth University, Aberystwyth, UK
  2. Cryptosporidium Reference Unit, Public Health Wales, Swansea, UK
