Soft Topographic Map for Clustering and Classification of Bacteria

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4723)

Abstract

This work presents a new method for clustering and building a topographic representation of a bacteria taxonomy. The method is based on the analysis of stable parts of the genome, the so-called “housekeeping genes”. It generates topographic maps of the bacteria taxonomy, in which relations among different type strains can be visually inspected and verified. Two well-known DNA alignment algorithms are applied to the genomic sequences. The topographic maps are optimized to represent the similarity among the sequences according to their evolutionary distances. The experimental analysis is carried out on 147 type strains of the Gammaproteobacteria class by means of the 16S rRNA housekeeping gene. Complete sequences of the gene were retrieved from the NCBI public database. In the experimental tests the maps show clusters of homologous type strains and present some singular cases potentially due to incorrect classification or erroneous annotations in the database.
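As a rough illustration of the step from aligned sequences to evolutionary distances, the following minimal sketch computes a pairwise distance matrix of the kind a topographic map for proximity data could take as input. The abstract does not state which distance model is used; the Jukes-Cantor correction is assumed here only because it is a common choice for nucleotide data. The function names (`p_distance`, `jukes_cantor`, `distance_matrix`) and the toy sequences are hypothetical, and the inputs are assumed to be already aligned to equal length with `-` marking gaps.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Sites where either sequence has a gap ('-') are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(a: str, b: str) -> float:
    """Jukes-Cantor distance d = -3/4 * ln(1 - 4p/3) (assumed model)."""
    p = p_distance(a, b)
    if p >= 0.75:
        # The correction is undefined at saturation.
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(seqs: list[str]) -> list[list[float]]:
    """Symmetric matrix of pairwise distances with a zero diagonal."""
    n = len(seqs)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jukes_cantor(seqs[i], seqs[j])
    return d


if __name__ == "__main__":
    # Toy aligned fragments, not real 16S rRNA data.
    toy = ["ACGTACGT", "ACGTACGA", "AC-TACGA"]
    for row in distance_matrix(toy):
        print(["%.4f" % v for v in row])
```

A matrix built this way contains only pairwise dissimilarities, which is why a map that works on proximity data rather than on feature vectors is a natural fit for sequence data.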

Editor information

Michael R. Berthold, John Shawe-Taylor, Nada Lavrač

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

La Rosa, M., Di Fatta, G., Gaglio, S., Giammanco, G.M., Rizzo, R., Urso, A.M. (2007). Soft Topographic Map for Clustering and Classification of Bacteria. In: Berthold, M.R., Shawe-Taylor, J., Lavrač, N. (eds) Advances in Intelligent Data Analysis VII. IDA 2007. Lecture Notes in Computer Science, vol 4723. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74825-0_30

  • DOI: https://doi.org/10.1007/978-3-540-74825-0_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74824-3

  • Online ISBN: 978-3-540-74825-0

  • eBook Packages: Computer Science, Computer Science (R0)
