Clustering of Large Data Sets in the Life Sciences

Patel, Ketan; Cartwright, Hugh M.

doi:10.1007/978-3-540-36213-5_2

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 120)

Summary

With the growing amount of genetic data available to scientists, there is a pressing need to characterise the functions of genes. Such knowledge will enable us to better understand organisms at the molecular level and to elucidate the mechanisms by which diseases disrupt biological processes. With the advent of whole-genome expression technologies such as DNA microarrays and proteomics, scientists can at last determine how genes and proteins change their rates of expression under specific experimental conditions. The data sets generated by such studies are large and require sophisticated tools for proper analysis. In this chapter we review several techniques employed in clustering data sets of this type. Clustering can often reveal broad patterns which show that certain genes or proteins perform common functions, and so offers a useful way to attribute functions to newly discovered genes. A wide variety of clustering algorithms exists; we consider several of the most promising and look at how they perform when tested with different types of data from gene expression and protein expression experiments.
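
As a rough illustration of what clustering expression data can look like in practice, the sketch below groups a handful of expression profiles by how closely they rise and fall together. It is only a minimal example under stated assumptions: it uses average-linkage agglomerative clustering with a Pearson correlation distance, which is one common choice and not necessarily a method covered in the chapter. The function names (pearson, cluster_profiles), the gene labels and the toy values are all invented for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length expression profiles.

    Assumes neither profile is constant (a constant profile has zero
    variance and would cause a division by zero).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def cluster_profiles(profiles, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - r.

    `profiles` maps a gene name to its list of expression values.
    Returns a list of sets of gene names.
    """
    clusters = [{name} for name in profiles]

    def linkage(c1, c2):
        # Mean pairwise correlation distance between members of two clusters.
        dists = [1 - pearson(profiles[g], profiles[h]) for g in c1 for h in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters (i < j) and merge j into i.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Toy data, invented for illustration only (not taken from the chapter):
# two profiles that rise across four conditions and two that fall.
toy = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [8.1, 6.0, 3.9, 2.2],
}
groups = cluster_profiles(toy, 2)
```

With this toy input the two rising profiles end up in one group and the two falling profiles in the other, even though their absolute expression levels differ. Correlation distance compares the shape of a profile and ignores its magnitude, which is often the property of interest when looking for genes that may share a function.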

Author information

Authors and Affiliations

Physical and Theoretical Chemistry Laboratory, University of Oxford, South Parks Road, Oxford, OX1 3QZ, England
Ketan Patel & Hugh M. Cartwright

Authors

Ketan Patel
Hugh M. Cartwright

Editor information

Editors and Affiliations

Physical and Theoretical Chemistry, Oxford University, South Parks Road, OX1 3QZ, Oxford, UK
Hugh M. Cartwright
CIS Department, Philadelphia University, 19144, Philadelphia, PA, USA
Les M. Sztandera

About this chapter

Cite this chapter

Patel, K., Cartwright, H.M. (2003). Clustering of Large Data Sets in the Life Sciences. In: Cartwright, H.M., Sztandera, L.M. (eds) Soft Computing Approaches in Chemistry. Studies in Fuzziness and Soft Computing, vol 120. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-36213-5_2

DOI: https://doi.org/10.1007/978-3-540-36213-5_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-53507-9
Online ISBN: 978-3-540-36213-5
eBook Packages: Springer Book Archive
