
Multi-task Feature Selection Using the Multiple Inclusion Criterion (MIC)

  • Paramveer S. Dhillon
  • Brian Tomasik
  • Dean Foster
  • Lyle Ungar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5781)

Abstract

We address the problem of joint feature selection across multiple related classification or regression tasks. When selecting features for several tasks at once, one can usually "borrow strength" across the tasks to obtain a more sensitive criterion for deciding which features to include. We propose a novel method, the Multiple Inclusion Criterion (MIC), which modifies stepwise feature selection so that features helpful across multiple tasks are more readily selected; each feature may be added to none, some, or all of the tasks. MIC is most beneficial when selecting a small set of predictive features from a large pool of candidates, as is common in genomic and biological datasets. Experimental results on such datasets show that MIC usually outperforms competing multi-task learning methods, not only in accuracy but also by building simpler and more interpretable models.
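For illustration, the sketch below shows one way a greedy, MIC-style multi-task stepwise selection could be structured: at each step a single candidate feature is scored against every task, and it is added only to the subset of tasks where it yields a penalized improvement. The function names (`mic_like_selection`, `residual_sum_of_squares`) and the fixed per-coefficient penalty are illustrative assumptions; the actual MIC coding scheme, which follows minimum-description-length arguments, is defined in the full paper rather than in this abstract.

```python
# Hypothetical sketch of greedy multi-task stepwise selection in the spirit of MIC.
# The penalty term is a generic stand-in, NOT the paper's actual coding scheme.
import numpy as np

def residual_sum_of_squares(X, y, cols):
    """RSS of an ordinary least-squares fit of y on the selected columns of X."""
    if not cols:
        return float(np.sum((y - y.mean()) ** 2))
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return float(np.sum((y - A @ beta) ** 2))

def mic_like_selection(X, Ys, penalty=2.0, max_steps=10):
    """Greedy forward selection shared across tasks.

    X  : (n, p) feature matrix common to all tasks
    Ys : list of (n,) response vectors, one per task
    At each step, every candidate feature is scored on every task; the feature
    is added to exactly those tasks where the penalized fit improves, and the
    feature with the largest total gain wins.  'penalty' is an assumed cost
    per added coefficient (a placeholder for the MIC code length).
    """
    n, p = X.shape
    selected = [set() for _ in Ys]          # features chosen for each task
    scores = [n * np.log(residual_sum_of_squares(X, y, []) / n) for y in Ys]
    for _ in range(max_steps):
        best = None                         # (total_gain, feature, per-task gains)
        for j in range(p):
            gains = []
            for t, y in enumerate(Ys):
                if j in selected[t]:
                    continue
                new_rss = residual_sum_of_squares(X, y, sorted(selected[t] | {j}))
                new_score = n * np.log(new_rss / n)
                gains.append((scores[t] - new_score - penalty, t, new_score))
            gains = [g for g in gains if g[0] > 0]   # keep only tasks the feature helps
            total = sum(g[0] for g in gains)
            if gains and (best is None or total > best[0]):
                best = (total, j, gains)
        if best is None:
            break                           # no feature improves any task
        _, j, gains = best
        for _, t, new_score in gains:
            selected[t].add(j)
            scores[t] = new_score
    return selected
```

Called as `mic_like_selection(X, [y1, y2, y3])`, the sketch returns one set of selected feature indices per task, so a feature can end up in none, some, or all of the tasks, mirroring the behavior the abstract describes.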

Keywords

Feature Selection · Code Scheme · Minimum Description Length · Breast Cancer Dataset · Multitask Learning


Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Paramveer S. Dhillon (1)
  • Brian Tomasik (2)
  • Dean Foster (3)
  • Lyle Ungar (1)
  1. CIS Department, University of Pennsylvania, Philadelphia, U.S.A.
  2. Computer Science Department, Swarthmore College, U.S.A.
  3. Statistics Department, University of Pennsylvania, Philadelphia, U.S.A.
