A New Clustering Approach, Based on the Estimation of the Probability Density Function, for Gene Expression Data

  • Noël Bonnet
  • Michel Herbin
  • Jérôme Cutrona
  • Jean-Marie Zahm
Conference paper
Part of the Studies in Classification, Data Analysis, and Knowledge Organization book series (STUDIES CLASS)


Many techniques have already been suggested for handling and analyzing the large, high-dimensional data sets produced by recently developed gene expression experiments. These techniques include supervised classification and unsupervised agglomerative or hierarchical clustering. Here, we present an alternative approach that makes no assumptions about the shape, size, or volume of the clusters. The technique is based on estimating the probability density function (pdf). Once the pdf has been estimated with the Parzen technique, using an appropriate amount of smoothing, the parameter space is partitioned with methods inherited from image processing, namely the skeleton by influence zones and the watershed. We show some advantages of the suggested approach.
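The two stages described in the abstract (Parzen estimation of the pdf, then partitioning of the parameter space) can be illustrated with a minimal sketch. It is not the authors' implementation: the partitioning step uses a plain steepest-ascent labeling of a density grid as a simplified stand-in for the watershed and the skeleton by influence zones. The function names, the ten two-dimensional sample points, the grid size, and the smoothing value h = 0.8 are all assumptions made for illustration only.

```python
import math


def parzen_density(points, grid_x, grid_y, h):
    """Gaussian Parzen-window estimate of a 2-D pdf on a rectangular grid.

    density[i][j] is the estimate at (grid_x[j], grid_y[i]); h is the
    smoothing parameter (standard deviation of the kernel).
    """
    norm = 1.0 / (len(points) * 2.0 * math.pi * h * h)
    return [
        [
            norm * sum(
                math.exp(-((gx - px) ** 2 + (gy - py) ** 2) / (2.0 * h * h))
                for px, py in points
            )
            for gx in grid_x
        ]
        for gy in grid_y
    ]


def climb_to_mode(density, i, j):
    """Move to the highest of the 8 neighbours until none is higher."""
    rows, cols = len(density), len(density[0])
    while True:
        best_i, best_j = i, j
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and density[ni][nj] > density[best_i][best_j]):
                    best_i, best_j = ni, nj
        if (best_i, best_j) == (i, j):
            return i, j  # local maximum of the gridded density
        i, j = best_i, best_j


def partition_grid(density):
    """Label each grid cell by the density mode its ascent path reaches."""
    mode_ids = {}
    labels = []
    for i in range(len(density)):
        row = []
        for j in range(len(density[0])):
            mode = climb_to_mode(density, i, j)
            row.append(mode_ids.setdefault(mode, len(mode_ids)))
        labels.append(row)
    return labels


def linspace(lo, hi, n):
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]


def cluster_points(points, h, n_grid=40):
    """Assign each point the label of the grid cell nearest to it."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    pad = 3.0 * h  # margin around the data (assumed value)
    grid_x = linspace(min(xs) - pad, max(xs) + pad, n_grid)
    grid_y = linspace(min(ys) - pad, max(ys) + pad, n_grid)
    cell_labels = partition_grid(parzen_density(points, grid_x, grid_y, h))

    def nearest(grid, value):
        return min(range(len(grid)), key=lambda k: abs(grid[k] - value))

    return [cell_labels[nearest(grid_y, y)][nearest(grid_x, x)]
            for x, y in points]


# Hypothetical toy data: two well-separated groups of five points each.
points = [
    (0.0, 0.0), (0.3, 0.1), (-0.2, 0.25), (0.1, -0.3), (0.15, 0.2),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.3), (5.1, 5.25), (4.9, 4.7),
]
labels = cluster_points(points, h=0.8)
print(labels)
```

In this sketch the number of clusters is not supplied by the user: it follows from the number of modes of the smoothed density, so the choice of h controls how many clusters appear. A larger h tends to merge modes and a smaller h tends to split them, which is why the amount of smoothing matters.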


Keywords: Support Vector Machine; Probability Density Function; Dimensionality Reduction; Factorial Axis; Influence Zone





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Noël Bonnet (1, 2)
  • Michel Herbin (2)
  • Jérôme Cutrona (1, 2)
  • Jean-Marie Zahm (1)

  1. Inserm Unit 514 (UMRS, IFR53), Reims cedex, France
  2. LERI, IUT Léonard de Vinci, University of Reims, Reims cedex, France
