Gene Selection in Time-Series Gene Expression Data
Abstract
The dimensionality of biological data is often very high. Feature selection can be used to tackle the problem of high dimensionality. However, the majority of work on feature selection consists of supervised methods, which require class labels. The problem is compounded when the data are time-series gene expression measurements that capture the effect of external stimuli on a biological system. In this paper we propose an unsupervised method for gene selection from time-series gene expression data, founded on statistical significance testing and swap randomization. We perform experiments with a publicly available mouse gene expression dataset and with a human gene expression dataset describing exposure to asbestos. The results on both datasets show a considerable decrease in the number of genes.
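The abstract does not spell out the procedure, so the following is only a minimal sketch of the two ingredients it names, not the authors' implementation. It assumes the expression data have been discretized to a 0/1 matrix (genes as rows, time points as columns), which is the setting in which swap randomization is usually defined; the function names `swap_randomize` and `empirical_p_value` are hypothetical.

```python
import numpy as np

def swap_randomize(matrix, n_attempts, rng):
    """Return a randomized copy of a 0/1 matrix with unchanged row and column sums."""
    m = matrix.copy()
    n_rows, n_cols = m.shape
    for _ in range(n_attempts):
        i, j = rng.integers(0, n_rows, size=2)
        k, l = rng.integers(0, n_cols, size=2)
        # A swap is allowed only on a "checkerboard" 2x2 submatrix:
        # ones at (i, k) and (j, l), zeros at (i, l) and (j, k).
        if m[i, k] == 1 and m[j, l] == 1 and m[i, l] == 0 and m[j, k] == 0:
            m[i, k] = m[j, l] = 0
            m[i, l] = m[j, k] = 1
    return m

def empirical_p_value(observed, null_values):
    """One-sided empirical p-value with the +1 correction for the observed sample."""
    null_values = np.asarray(null_values)
    return (np.sum(null_values >= observed) + 1) / (len(null_values) + 1)

# Toy data (assumed, for illustration only): 20 genes, 6 time points.
rng = np.random.default_rng(0)
data = (rng.random((20, 6)) > 0.5).astype(int)
randomized = swap_randomize(data, n_attempts=500, rng=rng)
```

In such a scheme, a per-gene statistic computed on the original matrix would be compared with the same statistic on many randomized copies, and genes whose empirical p-value falls below a chosen threshold would be retained. The choice of statistic and threshold is left open here because the abstract does not specify them.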
Keywords
Feature Selection, Statistical Significance, Time-series, Randomization