Abstract
As a result of the decoding of large numbers of genome sequences, numerous proteins whose functions cannot be identified by homology searches of amino acid sequences have accumulated and remain of little use to science and industry. Novel methods for predicting protein function are therefore urgently needed. We previously developed the Batch-Learning SOM (BL-SOM) for genome informatics; here, we used BL-SOM to predict protein functions on the basis of similarity in oligopeptide composition. Oligopeptides are component parts of a protein and are involved in forming its functional motifs and structural parts. Applied to oligopeptide frequencies in 110,000 proteins classified into 2853 function-known COGs (clusters of orthologous groups), BL-SOM faithfully reproduced the COG classifications; consequently, proteins whose functions could not be identified by homology searches could be related to function-known proteins. BL-SOM was then applied to predict the functions of large numbers of proteins obtained from metagenome analyses.
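The abstract describes mapping proteins by oligopeptide composition on a BL-SOM and relating function-unknown proteins to function-known ones. The Python sketch below illustrates that pipeline under stated assumptions and is not the authors' implementation. Oligopeptides are taken to be overlapping dipeptides, since the abstract does not state the length used. The map is initialized along the first two principal axes and trained with the classic batch SOM rule, where each node becomes a Gaussian-weighted mean of the inputs, which makes the result independent of input order. Each node is then labelled with the majority COG of the training proteins mapped to it. All function names, the grid size, and the neighborhood schedule are hypothetical choices made for illustration.

```python
import math
from collections import Counter
from itertools import product

# Assumption: "oligopeptides" = overlapping dipeptides of the 20 standard residues.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {d: i for i, d in enumerate(DIPEPTIDES)}


def oligo_freqs(seq):
    """Relative frequencies of the 400 dipeptides in seq (overlapping windows)."""
    counts = [0.0] * len(DIPEPTIDES)
    for i in range(len(seq) - 1):
        j = INDEX.get(seq[i:i + 2])
        if j is not None:  # windows containing non-standard residues are skipped
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def principal_axes(data, k=2, iters=100):
    """Mean vector plus the top-k (unit axis, std dev) pairs via power iteration."""
    n, dim = len(data), len(data[0])
    mean = [sum(col) / n for col in zip(*data)]
    X = [[x - m for x, m in zip(row, mean)] for row in data]
    axes = []
    for _ in range(k):
        v = [1.0 + 0.01 * d for d in range(dim)]
        for _ in range(iters):
            s = [dot(row, v) for row in X]  # X v
            v = [sum(si * row[d] for si, row in zip(s, X)) for d in range(dim)]  # X^T X v
            for u, _sd in axes:  # deflate: keep v orthogonal to earlier axes
                p = dot(v, u)
                v = [vi - p * ui for vi, ui in zip(v, u)]
            norm = math.sqrt(dot(v, v)) or 1.0
            v = [vi / norm for vi in v]
        axes.append((v, math.sqrt(sum(dot(row, v) ** 2 for row in X) / n)))
    return mean, axes


def init_grid(data, rows, cols):
    """Place nodes on the plane spanned by the first two principal axes."""
    mean, ((b1, s1), (b2, s2)) = principal_axes(data)
    grid = {}
    for i in range(rows):
        for j in range(cols):
            a = s1 * (2 * i / max(rows - 1, 1) - 1)  # spans -s1 .. +s1
            b = s2 * (2 * j / max(cols - 1, 1) - 1)  # spans -s2 .. +s2
            grid[(i, j)] = [m + a * x + b * y for m, x, y in zip(mean, b1, b2)]
    return grid


def bmu(grid, x):
    """Best-matching unit: grid key whose weight vector is nearest to x."""
    return min(grid, key=lambda k: sum((w - v) ** 2 for w, v in zip(grid[k], x)))


def train_batch_som(data, rows=4, cols=3, epochs=10, sigma0=1.5):
    """Batch SOM: every epoch recomputes each node as a neighborhood-weighted
    mean of all inputs, so the result does not depend on input order."""
    grid = init_grid(data, rows, cols)
    dim = len(data[0])
    for t in range(epochs):
        sigma = max(sigma0 * (1 - t / epochs), 0.5)
        winners = [bmu(grid, x) for x in data]
        new = {}
        for k in grid:
            h = [math.exp(-((k[0] - w[0]) ** 2 + (k[1] - w[1]) ** 2) / (2 * sigma ** 2))
                 for w in winners]
            total = sum(h)
            new[k] = [sum(hi * x[d] for hi, x in zip(h, data)) / total for d in range(dim)]
        grid = new
    return grid


def label_nodes(grid, data, labels):
    """Label each node with the majority COG of the training proteins mapped to it."""
    votes = {}
    for x, lab in zip(data, labels):
        votes.setdefault(bmu(grid, x), Counter())[lab] += 1
    return {k: c.most_common(1)[0][0] for k, c in votes.items()}


def predict(grid, node_labels, seq):
    """Label of the nearest labelled node for a function-unknown protein."""
    labelled = {k: grid[k] for k in node_labels}
    return node_labels[bmu(labelled, oligo_freqs(seq))]
```

As a usage example, train on a few toy sequences labelled with the hypothetical groups `COG_A` (rich in `AK`/`KA`) and `COG_B` (rich in `GW`/`WG`), then call `predict` on an `AK`-repeat sequence; it should return `COG_A`. This pure-Python version only demonstrates the steps. A run at the scale of 110,000 proteins would call for vectorized or parallel code.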
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Abe, T., Kanaya, S., Ikemura, T. (2009). Batch-Learning Self-Organizing Map for Predicting Functions of Poorly-Characterized Proteins Massively Accumulated. In: Príncipe, J.C., Miikkulainen, R. (eds) Advances in Self-Organizing Maps. WSOM 2009. Lecture Notes in Computer Science, vol 5629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02397-2_1
DOI: https://doi.org/10.1007/978-3-642-02397-2_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02396-5
Online ISBN: 978-3-642-02397-2
eBook Packages: Computer Science; Computer Science (R0)