Abstract
This work presents a new method for clustering and building a topographic representation of a bacterial taxonomy. The method is based on the analysis of stable parts of the genome, the so-called "housekeeping genes". It generates topographic maps of the bacterial taxonomy, in which relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. The topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class using the 16S rRNA housekeeping gene. Complete sequences of the gene were retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
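As an illustration of the "evolutionary distances" step, the following is a minimal sketch of how a pairwise distance matrix could be computed from sequences that have already been aligned. It assumes the Jukes-Cantor substitution model, which the abstract does not name, and the function names (`p_distance`, `jukes_cantor`, `distance_matrix`) are hypothetical. It is not the authors' implementation, and the alignment step itself is not shown.

```python
import math


def p_distance(a, b):
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(a, b):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - (4/3) * p).

    Assumption: this model is chosen here for illustration only.
    Returns infinity when p >= 0.75, where the formula is undefined.
    """
    p = p_distance(a, b)
    if p == 0:
        return 0.0
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs):
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jukes_cantor(seqs[i], seqs[j])
    return d
```

A matrix of this kind is the sort of pairwise dissimilarity input that a map for proximity data operates on; for the 147 type strains it would have 147 rows and 147 columns.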
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
La Rosa, M., Di Fatta, G., Gaglio, S., Giammanco, G.M., Rizzo, R., Urso, A.M. (2007). Soft Topographic Map for Clustering and Classification of Bacteria. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_30
DOI: https://doi.org/10.1007/978-3-540-74825-0_30
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74824-3
Online ISBN: 978-3-540-74825-0
eBook Packages: Computer Science, Computer Science (R0)