Deep Feature Selection: Theory and Application to Identify Enhancers and Promoters

  • Yifeng Li
  • Chih-Yu Chen
  • Wyeth W. Wasserman
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9029)


Sparse linear models approximate one or more target variables by a sparse linear combination of input variables, with sparseness enforced through a regularization term. Because they are simple, fast, and able to select features, they are widely used in classification and regression. Linear models are essentially shallow feed-forward neural networks, and they have three limitations: (1) they cannot model non-linear relationships among features, (2) they cannot learn high-level features, and (3) their extensions for selecting features in the multi-class case are unnatural. Deep neural networks are models built from multiple hidden layers with non-linear activation functions. Compared with linear models, they have two distinctive strengths: the capability to (1) model complex systems with non-linear structures and (2) learn high-level representations of features. Deep learning has been applied to many large and complex systems, where deep models significantly outperform shallow ones. However, feature selection at the input level, which is very helpful for understanding the nature of a complex system, is still not well studied. In genome research, the cis-regulatory elements in non-coding DNA sequences play a key role in gene expression. Since the activity of regulatory elements involves highly interactive factors, a deep tool is strongly needed to discover informative features. To address the above limitations of shallow and deep models in selecting features of a complex system, we propose a deep feature selection model that (1) takes advantage of deep structures to model non-linearity and (2) conveniently selects a subset of features right at the input level for multi-class data. We applied this model to the identification of active enhancers and promoters by integrating multiple sources of genomic information. Results show that our model outperforms elastic net in terms of the size of the discriminative feature subset and classification accuracy.
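The abstract does not spell out how selection "right at the input level" is realized, so the following is only a minimal sketch of one way the idea could look. It assumes that each input feature is scaled by its own weight before entering a small non-linear network, and that an elastic-net-style penalty on those weights drives some of them to zero. All names (`forward`, `elastic_net_penalty`, `selected_features`) and the toy parameter values are hypothetical, and training is omitted entirely.

```python
import math


def elastic_net_penalty(w, lam=0.01, alpha=0.5):
    """Penalty lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2) on the input weights."""
    l1 = sum(abs(v) for v in w)
    l2 = sum(v * v for v in w)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2)


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def dense(x, W, b):
    """Affine map; W holds one row of weights per output unit."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def forward(x, w, W1, b1, W2, b2):
    """Scale each input by its own weight, then apply one tanh layer and a softmax."""
    gated = [wi * xi for wi, xi in zip(w, x)]  # assumed: one weight per input feature
    hidden = [math.tanh(v) for v in dense(gated, W1, b1)]
    return softmax(dense(hidden, W2, b2))  # multi-class probabilities


def selected_features(w, tol=1e-8):
    """Indices of features whose input weight is non-zero."""
    return [i for i, v in enumerate(w) if abs(v) > tol]


# Toy parameters (hypothetical): 4 input features, 3 hidden units, 3 classes.
w = [0.8, 0.0, -1.2, 0.0]  # features 1 and 3 have been switched off
W1 = [[0.5, -0.3, 0.1, 0.7], [-0.2, 0.4, 0.6, -0.1], [0.3, 0.3, -0.5, 0.2]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.2, -0.4, 0.1], [0.0, 0.3, -0.2], [-0.1, 0.1, 0.5]]
b2 = [0.0, 0.0, 0.0]

probs = forward([1.0, 2.0, -0.5, 3.0], w, W1, b1, W2, b2)
print(selected_features(w), probs)
```

In this sketch a feature whose input weight is exactly zero cannot influence the class probabilities, whatever the hidden layers do, so the non-zero entries of `w` can be read off as the selected subset for all classes at once.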


Keywords: Deep learning, Feature selection, Enhancer, Promoter





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Centre for Molecular Medicine and Therapeutics, University of British Columbia, Vancouver, Canada
