Binning Sequences Using Very Sparse Labels Within a Metagenome

  • Living reference work entry
Encyclopedia of Metagenomics

Synonyms

Binning using seeded GSOM

Definition

Binning is the process of categorizing sequences into groups based on compositional features, sequence similarity, or both.

Introduction

Because metagenomes are typically composed of sequences from many species, the way these sequences are categorized into groups can strongly affect the accuracy and sensitivity of downstream analyses. Sequence binning is therefore a critical early step in the metagenomic analysis pipeline. Several binning methods employing different strategies have been proposed. For example, BLAST homology search helps to identify sequences of related species; k-mer (Sandberg et al. 2001), self-organizing map (SOM) (Abe et al. 2003), and TETRA (Teeling et al. 2004b) cluster sequences by similar compositional features, i.e., oligonucleotide frequency; PhyloPythia (McHardy et al. 2007), a support vector machine implementation, categorizes sequences based on both pattern similarity and oligonucleotide...
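To make the notion of compositional features concrete, the following is a minimal sketch of how an oligonucleotide (here tetranucleotide) frequency vector might be computed and compared in Python. It is not the implementation used by any of the tools cited above; the function names `kmer_frequencies` and `euclidean`, the choice of Euclidean distance, and the toy sequences are all illustrative assumptions.

```python
from itertools import product
from math import sqrt


def kmer_frequencies(seq, k=4):
    """Return normalized k-mer frequencies in a fixed lexicographic order."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other ambiguity codes
            counts[window] += 1
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length frequency vectors."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Toy sequences (hypothetical): two AT repeats and one GC repeat.
at_1 = kmer_frequencies("ATATATATATATATAT")
at_2 = kmer_frequencies("TATATATATATATATA")
gc_1 = kmer_frequencies("GCGCGCGCGCGCGCGC")

# Fragments with similar composition lie closer together in feature space.
print(euclidean(at_1, at_2) < euclidean(at_1, gc_1))
```

Each sequence becomes a 256-dimensional vector for k = 4, so fragments of different lengths can be compared directly. A composition-based binning method would then group such vectors, for example with a clustering algorithm or a map-based method such as SOM.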

References

  • Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T, Ikemura T. Informatics for unveiling hidden genome signatures. Genome Res. 2003;13:693–702.

  • Alahakoon D, Halgamuge SK, Srinivasan B. Dynamic self-organizing maps with controlled growth for knowledge discovery. IEEE Trans Neural Netw. 2000;11:601–14.

  • Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002;297:1301–10.

  • Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES. ARACHNE: a whole-genome shotgun assembler. Genome Res. 2002;12:177–89.

  • Chan CK, Hsu AL, Halgamuge SK, Tang SL. Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics. 2008a;9:215.

  • Chan CKK, Hsu AL, Tang SL, Halgamuge SK. Using growing self-organising maps to improve the binning process in environmental whole-genome shotgun sequencing. J Biomed Biotechnol. 2008b;2008:513701. doi:10.1155/2008/513701

  • Clarke J, Wu HC, Jayasinghe L, Patel A, Reid S, Bayley H. Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol. 2009;4:265–70.

  • Green P. Documentation for PHRAP. 1996; http://bozeman.mbt.washington.edu/

  • Hsu AL, Halgamuge SK. Enhancement of topology preservation and hierarchical dynamic self-organising maps for data visualisation. Int J Approx Reason. 2003;32:259–79.

  • Hsu AL, Tang S-L, Halgamuge SK. An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data. Bioinformatics. 2003;19:2131–40.

  • Hubert L, Arabie P. Comparing partitions. J Classif. 1985;2:193–218.

  • Kohonen T. The self-organizing map. Proc IEEE. 1990;78:1464–80.

  • Kohonen T. Analysis of processes and large data sets by a self-organizing method. Intell Process Manuf Mater. 1999;1:27–36.

  • Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, McHardy AC, Rigoutsos I, Salamov A, Korzeniewski F, Land M, et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods. 2007;4:495–500.

  • McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I. Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007;4:63–72.

  • Sandberg R, Winberg G, Branden CI, Kaske A, Ernberg I, Coster J. Capturing whole-genome characteristics in short sequences using a naive Bayesian classifier. Genome Res. 2001;11:1404–9.

  • Teeling H, Meyerdierks A, Bauer M, Amann R, Glockner FO. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol. 2004a;6:938–47.

  • Teeling H, Waldmann J, Lombardot T, Bauer M, Glockner FO. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics. 2004b;5:163.

  • Van Rijsbergen CJ. Information retrieval. London: Butterworths; 1979.

Author information

Corresponding author

Correspondence to Sen-Lin Tang.

Copyright information

© 2014 Springer Science+Business Media New York

About this entry

Cite this entry

Tseng, CH., Chan, CK.K., Hsu, A.L., Halgamuge, S.K., Tang, SL. (2014). Binning Sequences Using Very Sparse Labels Within a Metagenome. In: Nelson, K. (eds) Encyclopedia of Metagenomics. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6418-1_750-5

  • DOI: https://doi.org/10.1007/978-1-4614-6418-1_750-5

  • Publisher Name: Springer, New York, NY

  • Online ISBN: 978-1-4614-6418-1

  • eBook Packages: Springer Reference Biomedicine and Life Sciences; Reference Module Biomedical and Life Sciences
