Clustering Metagenome Short Reads Using Weighted Proteins

  • Conference paper
Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics (EvoBIO 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5483)

Abstract

This paper proposes a new knowledge-based method for clustering metagenome short reads. The method incorporates biological knowledge into the clustering process by means of a list of proteins associated with each read. These proteins are chosen from a reference proteome database according to their similarity to the given read, as evaluated by BLAST. We introduce a scoring function for weighting the resulting proteins and use the weighted proteins to cluster the reads. The resulting clustering algorithm selects the number of clusters automatically and generates possibly overlapping clusters of reads. Experiments on real-life benchmark datasets show the effectiveness of the method for reducing the size of a metagenome dataset while maintaining high accuracy of organism content.
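
The abstract does not give the scoring function or the clustering procedure, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes each BLAST hit is available as a hypothetical (read_id, protein_id, bit_score) tuple, weights a protein by the summed bit scores of the reads it still explains, and greedily picks proteins until every read is covered. The function name `cluster_reads` and the weighting rule are illustrative choices.

```python
from collections import defaultdict


def cluster_reads(hits):
    """Group reads by greedily selected proteins (illustrative sketch only).

    hits: iterable of (read_id, protein_id, bit_score) tuples, for example
    parsed from tabular BLAST output. Returns {protein_id: set of read_ids}.
    """
    # scores[protein][read] = best bit score of that read against that protein
    scores = defaultdict(dict)
    for read_id, protein_id, bit_score in hits:
        prev = scores[protein_id].get(read_id, 0.0)
        scores[protein_id][read_id] = max(prev, bit_score)

    uncovered = {read for per_read in scores.values() for read in per_read}

    def gain(protein):
        # Assumed weight: summed bit score over reads not yet in any cluster.
        return sum(s for r, s in scores[protein].items() if r in uncovered)

    clusters = {}
    while uncovered:
        # Only proteins that still explain at least one uncovered read.
        candidates = [p for p in scores
                      if any(r in uncovered for r in scores[p])]
        best = max(candidates, key=lambda p: (gain(p), p))
        # A cluster holds every read hitting the protein, so clusters may overlap.
        clusters[best] = set(scores[best])
        uncovered -= clusters[best]
    return clusters
```

In this sketch the number of clusters is not fixed in advance: it is simply the number of proteins selected before all reads are covered, and a read that hits several selected proteins appears in several clusters.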

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Folino, G., Gori, F., Jetten, M.S.M., Marchiori, E. (2009). Clustering Metagenome Short Reads Using Weighted Proteins. In: Pizzuti, C., Ritchie, M.D., Giacobini, M. (eds) Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics. EvoBIO 2009. Lecture Notes in Computer Science, vol 5483. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01184-9_14

  • DOI: https://doi.org/10.1007/978-3-642-01184-9_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-01183-2

  • Online ISBN: 978-3-642-01184-9

  • eBook Packages: Computer Science, Computer Science (R0)
