Abstract
When features are grouped, it is desirable to select features groupwise as well as individually. This is typically the case for data obtained by modern high-throughput genomic profiling technologies such as exon microarrays, which measure gene expression at fine resolution. Exons are disjoint subsequences corresponding to the coding regions of genes, and exon microarrays enable us to study the differential usage of exons, called alternative splicing, which is presumed to contribute to the development of diseases. To identify such events, all exons that belong to a relevant gene may have to be selected, perhaps with different weights assigned to them so that the most relevant ones can be detected. In this chapter we discuss feature selection methods for grouped features. We focus on a popular shrinkage method, the lasso, and its variants, which are based on regularized regression with generalized linear models. Data from exon microarrays are used for a case study.
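To make the idea of groupwise plus individual selection concrete, the following is a minimal sketch, not the chapter's implementation. It assumes a penalty that combines an elementwise lasso term with an unweighted groupwise Euclidean-norm term, and it applies one shrinkage (proximal) step to a coefficient vector. The function name `sparse_group_prox`, the parameters `lam_l1` and `lam_group`, and the toy coefficients are hypothetical and chosen only for illustration. Group-size weights, which are often used in practice, are omitted for brevity.

```python
import numpy as np

def sparse_group_prox(beta, groups, lam_group, lam_l1):
    """One proximal step for lam_l1 * ||beta||_1 + lam_group * sum_g ||beta_g||_2.

    Hypothetical helper for illustration; groups are assumed non-overlapping.
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    # Elementwise soft-thresholding: shrinks individual features (lasso part).
    out = np.sign(beta) * np.maximum(np.abs(beta) - lam_l1, 0.0)
    # Groupwise soft-thresholding: rescales a group or sets it to zero as a whole.
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(out[idx])
        scale = max(0.0, 1.0 - lam_group / norm) if norm > 0 else 0.0
        out[idx] *= scale
    return out

# Toy example: five "exons" in two "genes" (group 0 and group 1).
beta = np.array([3.0, 4.0, 0.3, -0.4, 0.05])
groups = np.array([0, 0, 1, 1, 1])
shrunk = sparse_group_prox(beta, groups, lam_group=1.0, lam_l1=0.1)
print(shrunk)
```

In this toy setting the weak group is removed as a whole, while the strong group is kept with its members shrunk by different amounts, which mirrors the goal of selecting all exons of a relevant gene with different weights.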
Acknowledgments
This work has been supported by Deutsche Forschungsgemeinschaft (DFG) within the Collaborative Research Center SFB 876 “Providing Information by Resource-Constrained Analysis”, project C1.
Copyright information
© 2015 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Lee, S. (2015). Signature Selection for Grouped Features with a Case Study on Exon Microarrays. In: Stańczyk, U., Jain, L. (eds) Feature Selection for Data and Pattern Recognition. Studies in Computational Intelligence, vol 584. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45620-0_14
DOI: https://doi.org/10.1007/978-3-662-45620-0_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-45619-4
Online ISBN: 978-3-662-45620-0
eBook Packages: Engineering (R0)