Synonyms
Binning using seeded GSOM
Definition
Binning is the process of categorizing sequences into groups based on compositional features, sequence similarity, or both.
Introduction
As metagenomes are typically composed of sequences from various species, the way these sequences are categorized into groups can radically affect the accuracy and sensitivity of downstream analyses. Sequence binning is therefore a critical early step in the metagenomic analysis pipeline. Several binning methods employing different strategies have been proposed. For example, BLAST homology search helps identify sequences from related species; kmer (Sandberg et al. 2001), self-organizing map (SOM) (Abe et al. 2003), and TETRA (Teeling et al. 2004b) cluster sequences by similar compositional features, i.e., oligonucleotide frequency; and PhyloPythia (McHardy et al. 2007), a support vector machine implementation, categorizes sequences based on both pattern similarity and oligonucleotide...
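The compositional idea behind these methods can be illustrated with a minimal Python sketch, under stated assumptions: each sequence is summarized by its normalized oligonucleotide frequency vector (here tetranucleotides, k = 4), and each read is assigned to the bin of the labelled seed sequence whose vector is closest in Euclidean distance. The helper names (`kmer_profile`, `euclidean`, `bin_by_nearest_seed`), the nearest-seed rule, and the toy sequences are hypothetical and chosen for illustration only. This is not the kmer, SOM, TETRA, PhyloPythia, or seeded GSOM algorithm, which differ in how they normalize features and group sequences.

```python
"""Toy composition-based binning: k-mer profiles + nearest labelled seed.

Illustrative sketch only (hypothetical helper names); NOT the kmer, SOM,
TETRA, PhyloPythia, or seeded GSOM algorithms.
"""
import math
from itertools import product


def kmer_profile(seq: str, k: int = 4) -> list[float]:
    """Return the normalized frequency of each of the 4**k DNA k-mers.

    k-mers are ordered lexicographically over "ACGT" (AAAA, AAAC, ...).
    Windows containing non-ACGT symbols (e.g. N) are skipped. As a
    simplification, a k-mer is not combined with its reverse complement,
    although real tools may do so.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows with ambiguous bases
            counts[j] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(kmers)
    return [c / total for c in counts]


def euclidean(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def bin_by_nearest_seed(reads: dict[str, str], seeds: dict[str, str],
                        k: int = 4) -> dict[str, str]:
    """Assign each read to the label of the seed with the closest k-mer profile."""
    seed_profiles = {label: kmer_profile(s, k) for label, s in seeds.items()}
    bins = {}
    for name, seq in reads.items():
        profile = kmer_profile(seq, k)
        bins[name] = min(seed_profiles,
                         key=lambda label: euclidean(profile, seed_profiles[label]))
    return bins


if __name__ == "__main__":
    seeds = {"AT": "ATATATATATAT", "GC": "GCGCGCGCGCGC"}  # toy labelled seeds
    reads = {"r1": "TATATATA", "r2": "CGCGCGCG"}          # toy unlabelled reads
    print(bin_by_nearest_seed(reads, seeds))  # {'r1': 'AT', 'r2': 'GC'}
```

Real metagenomic fragments are far longer than these toy strings. Shorter fragments yield noisier oligonucleotide profiles, which is one reason composition-based binning tends to become less reliable as fragment length decreases.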
References
Abe T, Kanaya S, Kinouchi M, Ichiba Y, Kozuki T, Ikemura T. Informatics for unveiling hidden genome signatures. Genome Res. 2003;13:693–702.
Alahakoon D, Halgamuge SK, Srinivasan B. Dynamic self-organizing maps with controlled growth for knowledge discovery. IEEE Trans Neural Netw. 2000;11:601–14.
Aparicio S, Chapman J, Stupka E, Putnam N, Chia JM, Dehal P, Christoffels A, Rash S, Hoon S, Smit A, et al. Whole-genome shotgun assembly and analysis of the genome of Fugu rubripes. Science. 2002;297:1301–10.
Batzoglou S, Jaffe DB, Stanley K, Butler J, Gnerre S, Mauceli E, Berger B, Mesirov JP, Lander ES. ARACHNE: a whole-genome shotgun assembler. Genome Res. 2002;12:177–89.
Chan CK, Hsu AL, Halgamuge SK, Tang SL. Binning sequences using very sparse labels within a metagenome. BMC Bioinformatics. 2008a;9:215.
Chan CKK, Hsu AL, Tang SL, Halgamuge SK. Using growing self-organising maps to improve the binning process in environmental whole-genome shotgun sequencing. J Biomed Biotechnol. 2008b;2008:513701. doi:10.1155/2008/513701.
Clarke J, Wu HC, Jayasinghe L, Patel A, Reid S, Bayley H. Continuous base identification for single-molecule nanopore DNA sequencing. Nat Nanotechnol. 2009;4:265–70.
Green P. Documentation for PHRAP. 1996; http://bozeman.mbt.washington.edu/
Hsu AL, Halgamuge SK. Enhancement of topology preservation and hierarchical dynamic self-organising maps for data visualisation. Int J Approx Reason. 2003;32:259–79.
Hsu AL, Tang S-L, Halgamuge SK. An unsupervised hierarchical dynamic self-organizing approach to cancer class discovery and marker gene identification in microarray data. Bioinformatics. 2003;19:2131–40.
Hubert L, Arabie P. Comparing partitions. J Classif. 1985;2:193–218.
Kohonen T. The self-organizing map. Proc IEEE. 1990;78:1464–80.
Kohonen T. Analysis of processes and large data sets by a self-organizing method. Intell Process Manuf Mater. 1999;1:27–36.
Mavromatis K, Ivanova N, Barry K, Shapiro H, Goltsman E, McHardy AC, Rigoutsos I, Salamov A, Korzeniewski F, Land M, et al. Use of simulated data sets to evaluate the fidelity of metagenomic processing methods. Nat Methods. 2007;4:495–500.
McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I. Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007;4:63–72.
Sandberg R, Winberg G, Branden CI, Kaske A, Ernberg I, Coster J. Capturing whole-genome characteristics in short sequences using a naive Bayesian classifier. Genome Res. 2001;11:1404–9.
Teeling H, Meyerdierks A, Bauer M, Amann R, Glockner FO. Application of tetranucleotide frequencies for the assignment of genomic fragments. Environ Microbiol. 2004a;6:938–47.
Teeling H, Waldmann J, Lombardot T, Bauer M, Glockner FO. TETRA: a web-service and a stand-alone program for the analysis and comparison of tetranucleotide usage patterns in DNA sequences. BMC Bioinformatics. 2004b;5:163.
Van Rijsbergen CJ. Information retrieval. London: Butterworths; 1979.
Copyright information
© 2015 Springer Science+Business Media New York
About this entry
Cite this entry
Tseng, CH., Chan, CK.K., Hsu, A.L., Halgamuge, S.K., Tang, SL. (2015). Binning Sequences Using Very Sparse Labels Within a Metagenome. In: Nelson, K.E. (eds) Encyclopedia of Metagenomics. Springer, Boston, MA. https://doi.org/10.1007/978-1-4899-7478-5_750
DOI: https://doi.org/10.1007/978-1-4899-7478-5_750
Published:
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4899-7477-8
Online ISBN: 978-1-4899-7478-5
eBook Packages: Biomedical and Life Sciences; Reference Module Biomedical and Life Sciences