A New Clustering Approach, Based on the Estimation of the Probability Density Function, for Gene Expression Data

Bonnet, Noël; Herbin, Michel; Cutrona, Jérôme; Zahm, Jean-Marie

doi:10.1007/978-3-642-56181-8_3


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)

Abstract

Many techniques have already been suggested for handling and analyzing the large, high-dimensional data sets produced by newly developed gene expression experiments. These techniques include supervised classification and unsupervised agglomerative or hierarchical clustering. Here, we present an alternative approach that makes no assumptions about the shape, size, or volume of the clusters. The technique is based on the estimation of the probability density function (pdf). Once the pdf has been estimated with the Parzen technique, using an appropriate amount of smoothing, the parameter space is partitioned with methods inherited from image processing, namely the skeleton by influence zones and the watershed. We show some advantages of the suggested approach.
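The two steps named in the abstract (Parzen estimation of the pdf, then partition of the parameter space) can be illustrated with a minimal sketch. It is not the authors' implementation: the Gaussian kernel, the bandwidth `h = 0.6`, the 2-D toy data, the grid, and the function names `parzen_density` and `label_by_steepest_ascent` are all assumptions made for illustration. In particular, a simple steepest-ascent labelling of the grid stands in for the watershed and the skeleton by influence zones: each grid node is assigned to the density maximum it climbs to.

```python
import math

def parzen_density(points, xs, ys, h):
    """Gaussian Parzen estimate of the pdf at every node of the grid xs x ys."""
    norm = 1.0 / (len(points) * 2.0 * math.pi * h * h)
    return [[norm * sum(math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2.0 * h * h))
                        for px, py in points)
             for x in xs]
            for y in ys]

def label_by_steepest_ascent(density):
    """Give each grid node the label of the density maximum it climbs to."""
    rows, cols = len(density), len(density[0])
    labels = [[0] * cols for _ in range(rows)]  # 0 means "not yet labelled"
    n_modes = 0
    for r0 in range(rows):
        for c0 in range(cols):
            r, c, path = r0, c0, []
            while labels[r][c] == 0:
                path.append((r, c))
                br, bc = r, c
                # Look for the highest of the 8 neighbours (strictly higher only).
                for dr in (-1, 0, 1):
                    for dc in (-1, 0, 1):
                        rr, cc = r + dr, c + dc
                        if (0 <= rr < rows and 0 <= cc < cols
                                and density[rr][cc] > density[br][bc]):
                            br, bc = rr, cc
                if (br, bc) == (r, c):  # local maximum: start a new cluster
                    n_modes += 1
                    labels[r][c] = n_modes
                else:
                    r, c = br, bc
            # Every node visited on the way up inherits the label that was reached.
            for pr, pc in path:
                labels[pr][pc] = labels[r][c]
    return labels

# Toy data (assumed): two small groups of points around (0, 0) and (3, 3).
step, x0 = 0.25, -1.5
axis = [x0 + step * k for k in range(25)]
offsets = (-0.3, 0.0, 0.3)
group_a = [(dx, dy) for dx in offsets for dy in offsets]
group_b = [(3.0 + dx, 3.0 + dy) for dx in offsets for dy in offsets]

density = parzen_density(group_a + group_b, axis, axis, h=0.6)
labels = label_by_steepest_ascent(density)

def label_of(point):
    """Label of the grid node nearest to a data point."""
    col = round((point[0] - x0) / step)
    row = round((point[1] - x0) / step)
    return labels[row][col]

labels_a = [label_of(p) for p in group_a]
labels_b = [label_of(p) for p in group_b]
print(sorted(set(labels_a)), sorted(set(labels_b)))
```

In this sketch the number of clusters is not fixed in advance; it is the number of maxima of the smoothed density, so it depends on the bandwidth `h`. A smaller `h` tends to produce more maxima and therefore more clusters, which is why the amount of smoothing matters.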

Author information

Authors and Affiliations

Inserm Unit 514 (UMRS, IFR53), 45 rue Cognacq Jay, 51092, Reims cedex, France
Noël Bonnet, Jérôme Cutrona & Jean-Marie Zahm
LERI, IUT Léonard de Vinci, University of Reims, Rue des Crayères, BP 1035, 51687, Reims cedex, France
Noël Bonnet, Michel Herbin & Jérôme Cutrona

Editor information

Editors and Affiliations

Wroclaw University of Economics, ul. Komandorska 118/120, 53-345, Wroclaw, Poland
Krzysztof Jajuga
Department of Statistics, Cracow University of Economics, ul. Rakowicka 27, 31-510, Cracow, Poland
Andrzej Sokołowski
Institute of Statistics, Technical University of Aachen, Wuellnerstrasse 3, 52056, Aachen, Germany
Hans-Hermann Bock

About this paper

Cite this paper

Bonnet, N., Herbin, M., Cutrona, J., Zahm, JM. (2002). A New Clustering Approach, Based on the Estimation of the Probability Density Function, for Gene Expression Data. In: Jajuga, K., Sokołowski, A., Bock, HH. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_3

DOI: https://doi.org/10.1007/978-3-642-56181-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43691-1
Online ISBN: 978-3-642-56181-8
eBook Packages: Springer Book Archive
