Signature Selection for Grouped Features with a Case Study on Exon Microarrays

Feature Selection for Data and Pattern Recognition

Part of the book series: Studies in Computational Intelligence ((SCI,volume 584))

Abstract

When features come in groups, it is desirable to select features groupwise in addition to selecting individual features. This is typically the case for data obtained with modern high-throughput genomic profiling technologies such as exon microarrays, which measure gene expression at fine resolution. Exons are disjoint subsequences corresponding to the coding regions of genes, and exon microarrays make it possible to study the differential usage of exons, called alternative splicing, which is presumed to contribute to the development of diseases. To identify such events, all exons that belong to a relevant gene may have to be selected, possibly with different weights assigned to them so that the most relevant ones can be detected. In this chapter we discuss feature selection methods that handle grouped features. We focus on the lasso, a popular shrinkage method, and on its variants, which are based on regularized regression with generalized linear models. Data from exon microarrays are used for a case study.
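
The combination of groupwise and within-group selection described in the abstract can be illustrated with a small sketch. The code below is not taken from the chapter; it assumes a sparse-group lasso style penalty, lam_l1 * ||w||_1 + lam_group * sum_g ||w_g||_2 over non-overlapping groups, and shows only the shrinkage (proximal) step that such a penalty induces. The function names, the toy vector, and the penalty weights are hypothetical and chosen for illustration.

```python
import numpy as np


def soft_threshold(v, t):
    """Elementwise soft-thresholding: proximal operator of t * ||v||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def prox_sparse_group(v, groups, lam_l1, lam_group):
    """Proximal operator of lam_l1 * ||v||_1 + lam_group * sum_g ||v_g||_2.

    Assumes the index arrays in `groups` do not overlap.
    """
    out = np.zeros_like(v, dtype=float)
    for idx in groups:
        # Step 1: shrink individual coefficients (within-group sparsity).
        u = soft_threshold(v[idx], lam_l1)
        # Step 2: shrink the whole group, or drop it if its norm is small.
        norm = np.linalg.norm(u)
        if norm > lam_group:
            out[idx] = (1.0 - lam_group / norm) * u
    return out


# Hypothetical toy input: two "genes" with three "exons" each.
v = np.array([3.0, -0.4, 2.0, 0.7, -0.6, 0.8])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
shrunk = prox_sparse_group(v, groups, lam_l1=0.5, lam_group=0.5)
print(shrunk)
```

In this toy input the second group is removed as a whole, while the first group is kept with one of its three coefficients set to zero. This mirrors the idea of selecting a relevant gene and then weighting its exons differently.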


Notes

  1. http://www.geneontology.org

  2. http://www.affymetrix.com

  3. http://www.ncbi.nlm.nih.gov/geo

  4. http://www.affymetrix.com/analysis/downloads/na33/


Acknowledgments

This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project C1.

Author information

Corresponding author

Correspondence to Sangkyun Lee.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lee, S. (2015). Signature Selection for Grouped Features with a Case Study on Exon Microarrays. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_14

  • DOI: https://doi.org/10.1007/978-3-662-45620-0_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45619-4

  • Online ISBN: 978-3-662-45620-0

  • eBook Packages: Engineering, Engineering (R0)
