Gene Selection in Time-Series Gene Expression Data

  • Prem Raj Adhikari
  • Bimal Babu Upadhyaya
  • Chen Meng
  • Jaakko Hollmén
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7036)


The dimensionality of biological data is often very high. Feature selection can be used to tackle the problem of high dimensionality. However, the majority of work in feature selection consists of supervised methods, which require class labels. The problem is compounded when the data are time–series gene expression measurements that capture the effect of external stimuli on a biological system. In this paper, we propose an unsupervised method for gene selection from time–series gene expression data, founded on statistical significance testing and swap randomization. We perform experiments with a publicly available mouse gene expression dataset and with a human gene expression dataset describing exposure to asbestos. The results on both datasets show a considerable decrease in the number of genes.


Keywords: Feature Selection, Statistical Significance, Time–series, Randomization



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Prem Raj Adhikari (1)
  • Bimal Babu Upadhyaya (1)
  • Chen Meng (2)
  • Jaakko Hollmén (1)
  1. Department of Information and Computer Science, Aalto University School of Science, Aalto, Finland
  2. Department of Computational Biology, Royal Institute of Technology, School of Computer Science and Communication, Stockholm, Sweden
