Signature Selection for Grouped Features with a Case Study on Exon Microarrays

Lee, Sangkyun

doi:10.1007/978-3-662-45620-0_14

Part of the book series: Studies in Computational Intelligence (SCI, volume 584)

Abstract

When features are grouped, it is often desirable to select features groupwise in addition to selecting individual features. This is typically the case for data obtained by modern high-throughput genomic profiling technologies such as exon microarrays, which measure gene expression at fine resolution. Exons are disjoint subsequences corresponding to the coding regions of genes, and exon microarrays enable us to study the differential usage of exons, called alternative splicing, which is presumed to contribute to the development of diseases. To identify such events, all exons that belong to a relevant gene may have to be selected, perhaps with different weights assigned to them so that the most relevant ones can be detected. In this chapter we discuss feature selection methods for grouped features. Our focus is a popular shrinkage method, the lasso, and its variants, which are based on regularized regression with generalized linear models. Data from exon microarrays are used for a case study.
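
The idea of selecting whole groups while still weighting individual members can be illustrated with the shrinkage step that underlies a sparse-group lasso penalty. The following is a minimal sketch, not the chapter's implementation: the function names, the example coefficients, and the group labels `gene_A` and `gene_B` are hypothetical, the groups are assumed not to overlap, and the usual group-size weights are omitted for brevity.

```python
import math


def soft_threshold(x, tau):
    """Lasso shrinkage of one coefficient: move x toward zero by tau."""
    return math.copysign(max(abs(x) - tau, 0.0), x)


def sparse_group_prox(beta, groups, lam_l1, lam_group):
    """Proximal step for lam_l1 * ||b||_1 + lam_group * sum_g ||b_g||_2.

    beta   : list of coefficients (e.g. one per exon)
    groups : dict mapping a group label (e.g. a gene) to a list of
             coefficient indices; groups are assumed non-overlapping
    """
    out = [0.0] * len(beta)
    for idx in groups.values():
        # Step 1: elementwise shrinkage can zero out single features.
        shrunk = [soft_threshold(beta[i], lam_l1) for i in idx]
        # Step 2: groupwise shrinkage can zero out the entire group.
        norm = math.sqrt(sum(s * s for s in shrunk))
        scale = max(0.0, 1.0 - lam_group / norm) if norm > 0 else 0.0
        for i, s in zip(idx, shrunk):
            out[i] = scale * s
    return out


# Hypothetical example: two genes with two exons each.
beta = [3.0, 4.0, 0.1, -0.2]
groups = {"gene_A": [0, 1], "gene_B": [2, 3]}
result = sparse_group_prox(beta, groups, lam_l1=0.0, lam_group=1.0)
```

With `lam_l1=0.0` the step reduces to plain group lasso shrinkage: the weak group `gene_B` is removed as a whole, while `gene_A` is kept and scaled down. Raising `lam_l1` additionally removes weak exons inside a retained gene.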

Acknowledgments

This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project C1.

Author information

Authors and Affiliations

Technische Universität Dortmund, Joseph-von-Fraunhofer-Strasse 23, 44227, Dortmund, Germany
Sangkyun Lee

Authors

Sangkyun Lee

Corresponding author

Correspondence to Sangkyun Lee.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Urszula Stańczyk
Mawson Lakes Campus, Faculty of Education, Science, Technology and Mathematics, University of Canberra, Canberra, Australia, and University of South Australia, Adelaide, Australia
Lakhmi C. Jain

About this chapter

Cite this chapter

Lee, S. (2015). Signature Selection for Grouped Features with a Case Study on Exon Microarrays. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_14

DOI: https://doi.org/10.1007/978-3-662-45620-0_14
Published: 31 December 2014
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45619-4
Online ISBN: 978-3-662-45620-0
eBook Packages: Engineering, Engineering (R0)
