Abstract
One of the most promising approaches for gaining insight into the biological activity of genes is to study their expression patterns across a variety of experimental conditions and contexts. In this work we present a genetic-algorithm-based approach for optimizing variable-weighting schemes in order to improve clustering solutions. The same technique is used for feature selection and for detecting marker components in large datasets. An original string representation based on real numbers encodes the variable weights, and a modified silhouette value serves as the fitness function. The strategy has a generic, parametric formulation, and its effectiveness is demonstrated on gene-expression data.
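The abstract does not spell out the genetic operators, the exact modification made to the silhouette, or how the clusters are obtained. The following is therefore only a minimal sketch of the general idea under stated assumptions. Each chromosome is a real-valued vector of non-negative variable weights normalized to sum to 1. Fitness is the standard (unmodified) mean silhouette width of a fixed partition, computed with a weighted Euclidean distance. Selection is truncation with elitism, recombination is arithmetic crossover, and mutation is Gaussian. Variables whose weight falls below a threshold are treated as discarded, which gives the feature-selection side of the method. All function names, parameter values, the threshold and the toy data are hypothetical illustrations. They are not taken from the paper.

```python
import math
import random


def weighted_distance(x, y, w):
    """Weighted Euclidean distance: each squared difference is scaled by its weight."""
    return math.sqrt(sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w)))


def silhouette(data, labels, w):
    """Mean silhouette width (Rousseeuw, 1987) of a fixed partition under weights w.

    Standard silhouette, NOT the paper's modified version; needs at least 2 clusters.
    """
    clusters = set(labels)
    total = 0.0
    for i, xi in enumerate(data):
        sums = dict.fromkeys(clusters, 0.0)
        counts = dict.fromkeys(clusters, 0)
        for j, xj in enumerate(data):
            if i != j:
                sums[labels[j]] += weighted_distance(xi, xj, w)
                counts[labels[j]] += 1
        own = labels[i]
        if counts[own] == 0:
            continue  # singleton cluster: s(i) = 0 by convention
        a = sums[own] / counts[own]  # mean distance to the point's own cluster
        b = min(sums[c] / counts[c] for c in clusters if c != own)  # nearest other cluster
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(data)


def normalise(w):
    """Rescale non-negative weights to sum to 1 (uniform if all are zero)."""
    s = sum(w)
    return [wi / s for wi in w] if s > 0 else [1.0 / len(w)] * len(w)


def evolve(data, labels, pop_size=30, generations=40, mutation_sd=0.1, seed=0):
    """Hypothetical real-coded GA that searches for weights maximizing the silhouette."""
    rng = random.Random(seed)
    m = len(data[0])

    def fitness(w):
        return silhouette(data, labels, w)

    pop = [normalise([rng.random() for _ in range(m)]) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection: keep the better half
        children = [ranked[0]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            alpha = rng.random()  # arithmetic crossover
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            # Gaussian mutation, clipped so weights stay non-negative
            child = [max(0.0, c + rng.gauss(0.0, mutation_sd)) for c in child]
            children.append(normalise(child))
        pop = children
    best = max(pop, key=fitness)
    return best, fitness(best)


def selected_features(w, threshold=0.1):
    """Indices of variables kept after thresholding the weights (hypothetical threshold)."""
    return [k for k, wk in enumerate(w) if wk >= threshold]


def make_toy_data():
    """Synthetic two-class data: only variable 0 separates the classes; 1-3 are noise."""
    gen = random.Random(1)
    data, labels = [], []
    for k in range(6):
        data.append([k * 0.4] + [gen.uniform(0, 10) for _ in range(3)])
        labels.append(0)
        data.append([10 + k * 0.4] + [gen.uniform(0, 10) for _ in range(3)])
        labels.append(1)
    return data, labels


if __name__ == "__main__":
    data, labels = make_toy_data()
    best, score = evolve(data, labels)
    print("weights:", [round(wk, 3) for wk in best])
    print("silhouette:", round(score, 3))
    print("selected variables:", selected_features(best))
```

On this synthetic data, which is not the leukaemia expression data used in the paper, the search should shift most of the weight onto variable 0, the only informative variable. Thresholding the weights then keeps that variable as a "marker component." Normalizing each chromosome to unit sum is one simple way to keep the silhouette from being driven by the overall scale of the weights. The paper may handle this differently.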
Keywords
- Genetic Algorithm
- Feature Selection
- Weighting Scheme
- Acute Myeloid Leukaemia
- Acute Lymphoblastic Leukaemia
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Pérez, O.M., Hidalgo-Conde, M., Marín, F.J., Trelles, O. (2003). Weighting and Feature Selection on Gene-Expression data by the use of Genetic Algorithms. In: Mira, J., Álvarez, J.R. (eds) Artificial Neural Nets Problem Solving Methods. IWANN 2003. Lecture Notes in Computer Science, vol 2687. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44869-1_48
DOI: https://doi.org/10.1007/3-540-44869-1_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40211-4
Online ISBN: 978-3-540-44869-3
eBook Packages: Springer Book Archive