Gene Selection in Time-Series Gene Expression Data
The dimensionality of biological data is often very high, and feature selection can be used to tackle this problem. However, the majority of work in feature selection consists of supervised methods, which require class labels. The problem is compounded when the data are time-series gene expression measurements that capture the effect of external stimuli on a biological system. In this paper we propose an unsupervised method for gene selection from time-series gene expression data, founded on statistical significance testing and swap randomization. We perform experiments with a publicly available mouse gene expression dataset and a human gene expression dataset describing exposure to asbestos. On both datasets, the method considerably reduces the number of genes.
Keywords: Feature Selection, Statistical Significance, Time-series, Randomization
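To make the idea of combining swap randomization with significance testing concrete, the following is a minimal sketch and not the method of the paper, whose details are not given in the abstract. It assumes a binarized genes-by-time-points matrix (1 meaning "expressed"), uses the classical swap that preserves row and column sums, and scores each gene with a hypothetical statistic, the number of adjacent time points that are both 1. The function names, the statistic, the threshold `alpha`, and the number of swaps are all illustrative assumptions; no multiple-testing correction is applied.

```python
import random


def swap_randomize(matrix, n_swaps, rng):
    """Return a copy of a 0/1 matrix after n_swaps attempted swaps.

    Each accepted swap preserves every row sum and every column sum.
    """
    m = [row[:] for row in matrix]
    n_rows, n_cols = len(m), len(m[0])
    for _ in range(n_swaps):
        r1, r2 = rng.randrange(n_rows), rng.randrange(n_rows)
        c1, c2 = rng.randrange(n_cols), rng.randrange(n_cols)
        # A swap is only valid on a "checkerboard" 2x2 submatrix:
        # ones on one diagonal, zeros on the other.
        if m[r1][c1] == 1 and m[r2][c2] == 1 and m[r1][c2] == 0 and m[r2][c1] == 0:
            m[r1][c1] = m[r2][c2] = 0
            m[r1][c2] = m[r2][c1] = 1
    return m


def adjacent_ones(row):
    """Hypothetical per-gene statistic: count of consecutive time points both equal to 1."""
    return sum(1 for a, b in zip(row, row[1:]) if a == 1 and b == 1)


def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the usual +1 correction."""
    at_least = sum(1 for v in null_values if v >= observed)
    return (at_least + 1) / (len(null_values) + 1)


def select_genes(matrix, n_random=200, n_swaps=500, alpha=0.05, seed=0):
    """Keep genes whose statistic is unusually high compared with randomized matrices."""
    rng = random.Random(seed)
    observed = [adjacent_ones(row) for row in matrix]
    null = [[] for _ in matrix]
    for _ in range(n_random):
        randomized = swap_randomize(matrix, n_swaps, rng)
        for i, row in enumerate(randomized):
            null[i].append(adjacent_ones(row))
    p_values = [empirical_p_value(o, n) for o, n in zip(observed, null)]
    selected = [i for i, p in enumerate(p_values) if p <= alpha]
    return selected, p_values
```

Calling `select_genes(binary_matrix)` returns the indices of the retained genes together with one empirical p-value per gene. Because the swaps keep row and column sums fixed, a gene is retained only if its temporal pattern is not explained by how often it is expressed and how many genes are expressed at each time point. Real-valued expression data would need either a binarization step or a different randomization scheme.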