Consensus Clustering for Binning Metagenome Sequences

Bonet, Isis; Escobar, Adriana; Mesa-Múnera, Andrea; Alzate, Juan Fernando

doi:10.1007/978-3-319-62428-0_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10062)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

Advances in next-generation sequencing technologies allow researchers to sequence millions of microbial organisms in parallel, directly from environmental samples. The result of this “shotgun” sequencing is a large set of short DNA fragments from different organisms, which constitutes the basis of the field of metagenomics. Although there are large databases of known microbial DNA that allow us to classify some fragments, these databases represent only around 1% of all the species existing in the world. For this reason, it is important to use unsupervised methods to group fragments that share the same taxonomic level. In this paper we focus on the binning step in metagenomics, approached in an unsupervised way. We propose a consensus clustering method based on an iterative clustering process that uses sequences of different lengths in the databases and a mixture of distances as the approach to finding the consensus clustering. The final clustering performance is evaluated according to the purity of the clusters. The results achieved by the proposed method outperform those obtained by simple methods and iterative methods.
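The abstract names cluster purity as the evaluation criterion but does not give its formula. As a minimal sketch, assuming the standard definition (the fraction of fragments that carry the majority taxon of their own cluster), purity could be computed as follows. The function name `cluster_purity` and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cluster_purity(cluster_ids, true_labels):
    """Return the purity of a clustering under the standard definition.

    Assumption: purity = (sum over clusters of the size of the most
    frequent true label in that cluster) / (total number of items).
    The paper may use a variant of this measure.
    """
    if len(cluster_ids) != len(true_labels):
        raise ValueError("cluster_ids and true_labels must have equal length")
    if not cluster_ids:
        raise ValueError("at least one item is required")

    # Count how often each true label occurs inside each cluster.
    label_counts = {}
    for cluster, label in zip(cluster_ids, true_labels):
        label_counts.setdefault(cluster, Counter())[label] += 1

    # Each cluster contributes the count of its majority label.
    majority_total = sum(max(c.values()) for c in label_counts.values())
    return majority_total / len(cluster_ids)


# Toy example: 8 fragments, 3 clusters, taxa "A", "B", "C" (made-up data).
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
taxa = ["A", "A", "B", "B", "B", "C", "C", "A"]
print(cluster_purity(clusters, taxa))  # 6 of 8 fragments match their cluster's majority taxon
```

Purity rewards clusters dominated by a single taxon but does not penalize splitting one taxon across many clusters, so it is usually read together with the number of clusters produced.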

Author information

Authors and Affiliations

Universidad Escuela de Ingeniería de Antioquia, Envigado, Antioquia, Colombia
Isis Bonet, Adriana Escobar & Andrea Mesa-Múnera
Centro Nacional de Secuenciación Genómica-CNSG, Facultad de Medicina, Universidad de Antioquia, Medellín, Colombia
Juan Fernando Alzate

Corresponding author

Correspondence to Isis Bonet.

Editor information

Editors and Affiliations

Instituto Politécnico Nacional, Unidad Profesional Interdisciplinaria en Ingeniería y Tecnologías Avanzadas, México DF, Mexico
Obdulia Pichardo-Lagunas
INFOTEC Aguascalientes, Aguascalientes, Mexico
Sabino Miranda-Jiménez

About this paper

Cite this paper

Bonet, I., Escobar, A., Mesa-Múnera, A., Alzate, J.F. (2017). Consensus Clustering for Binning Metagenome Sequences. In: Pichardo-Lagunas, O., Miranda-Jiménez, S. (eds) Advances in Soft Computing. MICAI 2016. Lecture Notes in Computer Science, vol 10062. Springer, Cham. https://doi.org/10.1007/978-3-319-62428-0_23

DOI: https://doi.org/10.1007/978-3-319-62428-0_23
Published: 02 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62427-3
Online ISBN: 978-3-319-62428-0
eBook Packages: Computer Science, Computer Science (R0)
