A New Clustering Approach, Based on the Estimation of the Probability Density Function, for Gene Expression Data

  • Conference paper
Classification, Clustering, and Data Analysis

Abstract

Many techniques have already been suggested for handling and analyzing the large, high-dimensional data sets produced by newly developed gene expression experiments. These techniques include supervised classification and unsupervised agglomerative or hierarchical clustering. Here, we present an alternative approach that makes no assumptions about the shape, size, or volume of the clusters. The technique is based on estimating the probability density function (pdf). Once the pdf has been estimated with the Parzen technique, using an appropriately chosen amount of smoothing, the parameter space is partitioned using methods inherited from image processing, namely the skeleton by influence zones and the watershed. We show some advantages of the suggested approach.
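
The pipeline described in the abstract (estimate the pdf, then partition the space around its modes) can be illustrated with a minimal one-dimensional sketch. This is not the authors' implementation: the function names, the Gaussian kernel, the grid resolution, the bandwidth value, and the toy samples are all assumptions made for illustration, and a simple hill-climbing labelling is used as a stand-in for the watershed and the skeleton by influence zones.

```python
import math


def parzen_density(samples, grid, bandwidth):
    """Gaussian-kernel Parzen estimate of the pdf at each grid point."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - s) / bandwidth) ** 2) for s in samples)
        for g in grid
    ]


def label_by_ascent(density):
    """Label each grid cell by the local maximum reached by hill climbing.

    Assumption: this replaces a true watershed on the inverted density;
    in 1-D both split the axis at the valleys between modes.
    """
    n = len(density)
    labels = [0] * n
    modes = {}  # index of a local maximum -> cluster label
    for start in range(n):
        i = start
        while True:
            best = i
            for j in (i - 1, i + 1):
                if 0 <= j < n and density[j] > density[best]:
                    best = j
            if best == i:  # no neighbour is higher: a mode was reached
                break
            i = best
        labels[start] = modes.setdefault(i, len(modes))
    return labels


def cluster(samples, bandwidth, grid_size=200):
    """Cluster 1-D samples; the number of clusters is not given in advance."""
    lo = min(samples) - 3 * bandwidth
    hi = max(samples) + 3 * bandwidth
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + k * step for k in range(grid_size)]
    cell_labels = label_by_ascent(parzen_density(samples, grid, bandwidth))
    # Each sample takes the label of its nearest grid cell.
    return [cell_labels[min(grid_size - 1, round((s - lo) / step))] for s in samples]


# Hypothetical toy data: two well-separated groups of values.
samples = [0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.3]
labels = cluster(samples, bandwidth=0.4)
print(labels)
```

In this sketch the bandwidth plays the role of the smoothing parameter mentioned in the abstract: a larger value merges neighbouring modes into fewer clusters, while a smaller one splits the data into more clusters.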

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bonnet, N., Herbin, M., Cutrona, J., Zahm, JM. (2002). A New Clustering Approach, Based on the Estimation of the Probability Density Function, for Gene Expression Data. In: Jajuga, K., Sokołowski, A., Bock, HH. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_3

  • DOI: https://doi.org/10.1007/978-3-642-56181-8_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43691-1

  • Online ISBN: 978-3-642-56181-8

  • eBook Packages: Springer Book Archive
