Abstract
Many techniques have already been proposed for handling and analyzing the large, high-dimensional data sets produced by recently developed gene expression experiments. These techniques include supervised classification and unsupervised agglomerative or hierarchical clustering. Here, we present an alternative approach that makes no assumptions about the shape, size, or volume of the clusters. The technique is based on estimating the probability density function (pdf). Once the pdf has been estimated with the Parzen technique (using an appropriate amount of smoothing), the parameter space is partitioned using methods inherited from image processing, namely the skeleton by influence zones and the watershed. We show some advantages of the proposed approach.
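The two stages named in the abstract, Parzen estimation of the pdf followed by a partition of the parameter space, can be illustrated with a minimal sketch. This is not the chapter's implementation. It assumes 2-D data, a Gaussian kernel, and a regular grid. In place of a true watershed or skeleton by influence zones, it uses a simple steepest-ascent assignment of each grid cell to the density mode it climbs to. The function names (`parzen_density`, `partition_by_modes`, `label_points`) and the `bandwidth`, `grid_size`, and `margin` parameters are hypothetical, chosen only for illustration.

```python
import numpy as np


def parzen_density(points, grid_size=64, bandwidth=0.8, margin=2.0):
    """Gaussian Parzen estimate of the pdf on a regular 2-D grid.

    `bandwidth` is the smoothing parameter; its value here is an assumption.
    Returns the grid axes and density[i, j] evaluated at (xs[i], ys[j]).
    """
    lo = points.min(axis=0) - margin
    hi = points.max(axis=0) + margin
    xs = np.linspace(lo[0], hi[0], grid_size)
    ys = np.linspace(lo[1], hi[1], grid_size)
    gx, gy = np.meshgrid(xs, ys, indexing="ij")
    grid = np.stack([gx.ravel(), gy.ravel()], axis=1)
    # Squared distance from every grid node to every data point.
    d2 = ((grid[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
    dens = np.exp(-d2 / (2.0 * bandwidth**2)).sum(axis=1)
    dens /= len(points) * 2.0 * np.pi * bandwidth**2
    return xs, ys, dens.reshape(grid_size, grid_size)


def partition_by_modes(density):
    """Label each grid cell by the local density maximum it climbs to.

    A steepest-ascent stand-in for the watershed, not the watershed itself.
    Ties are broken by flat index so that plateaus do not fragment.
    """
    n, m = density.shape
    parent = np.empty(n * m, dtype=int)
    for i in range(n):
        for j in range(m):
            best_key, best = (density[i, j], i * m + j), i * m + j
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    a, b = i + di, j + dj
                    if 0 <= a < n and 0 <= b < m:
                        key = (density[a, b], a * m + b)
                        if key > best_key:
                            best_key, best = key, a * m + b
            parent[i * m + j] = best
    # Follow the uphill pointers until every cell points at its mode.
    while True:
        nxt = parent[parent]
        if np.array_equal(nxt, parent):
            break
        parent = nxt
    _, labels = np.unique(parent, return_inverse=True)
    return labels.reshape(n, m)


def label_points(points, xs, ys, labels):
    """Give each data point the label of its nearest grid node."""
    i = np.abs(points[:, 0, None] - xs[None, :]).argmin(axis=1)
    j = np.abs(points[:, 1, None] - ys[None, :]).argmin(axis=1)
    return labels[i, j]
```

With two well-separated synthetic groups of points, one would expect each group to fall mostly within a single region of the partition. The number of regions depends on the bandwidth: too little smoothing produces many spurious modes, and too much merges genuine clusters.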
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bonnet, N., Herbin, M., Cutrona, J., Zahm, JM. (2002). A New Clustering Approach, Based on the Estimation of the Probability Density Function, for Gene Expression Data. In: Jajuga, K., Sokołowski, A., Bock, HH. (eds) Classification, Clustering, and Data Analysis. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-56181-8_3
DOI: https://doi.org/10.1007/978-3-642-56181-8_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43691-1
Online ISBN: 978-3-642-56181-8
eBook Packages: Springer Book Archive