Clustering of Large Data Sets in the Life Sciences

Soft Computing Approaches in Chemistry

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 120)

Summary

With the growing amount of genetic data available to scientists, there is a pressing need to characterise the functions of genes. Such knowledge will enable us to understand organisms better at the molecular level and to elucidate the mechanisms by which diseases disrupt biological processes. With the advent of whole-genome expression technologies such as DNA microarrays and proteomics, scientists can at last determine how genes and proteins change their rates of expression under specific experimental conditions. The data sets generated by such studies are large and require sophisticated tools for proper analysis. In this chapter we review several techniques used to cluster data sets of this type. Clustering can often reveal broad patterns indicating that certain genes or proteins perform common functions, which is a useful way of attributing functions to newly discovered genes. A wide variety of clustering algorithms exists; we consider several of the most promising and examine how they perform when tested on different types of data from gene expression and protein expression experiments.

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Patel, K., Cartwright, H.M. (2003). Clustering of Large Data Sets in the Life Sciences. In: Cartwright, H.M., Sztandera, L.M. (eds) Soft Computing Approaches in Chemistry. Studies in Fuzziness and Soft Computing, vol 120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36213-5_2

  • DOI: https://doi.org/10.1007/978-3-540-36213-5_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-53507-9

  • Online ISBN: 978-3-540-36213-5

  • eBook Packages: Springer Book Archive
