Feature Selection for Transfer Learning

  • Selen Uguroglu
  • Jaime Carbonell
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6913)


A common assumption in most machine learning algorithms is that labeled (source) data and unlabeled (target) data are sampled from the same distribution. However, many real-world tasks violate this assumption: in temporal domains, feature distributions may vary over time; clinical studies may suffer from sampling bias; or sufficient labeled data for the domain of interest may not exist, so labeled data from a related domain must be utilized. In such settings, knowing in which dimensions source and target data vary is crucial for reducing the distance between domains and accurately transferring knowledge. In this paper, we present a novel method to identify variant and invariant features between two datasets. Our contribution is twofold: first, we present a novel transfer learning approach for domain adaptation; second, we formalize the problem of finding differently distributed features as a convex optimization problem. Experimental studies on synthetic and benchmark real-world datasets show that our approach outperforms other transfer learning approaches and significantly improves prediction accuracy.
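To make the idea concrete, the sketch below scores each feature by how differently it is distributed between a source and a target sample. This is only an illustration, not the paper's method: the paper formulates variant-feature discovery as a single convex optimization over feature weights, whereas here each feature is scored independently with a standard (biased) RBF-kernel maximum mean discrepancy estimate; the function name `feature_mmd` and the bandwidth parameter `gamma` are our own choices.

```python
import numpy as np

def feature_mmd(source, target, gamma=1.0):
    """Score each feature by an RBF-kernel MMD estimate between two samples.

    Illustrative only: scores features one at a time rather than solving
    the joint convex program described in the paper. Higher score means
    the feature is distributed more differently across the two domains.
    """
    def rbf_mmd(a, b):
        # Biased MMD^2 estimate with an RBF kernel on 1-D columns.
        def k(x, y):
            d = x[:, None, :] - y[None, :, :]
            return np.exp(-gamma * np.sum(d ** 2, axis=-1))
        return k(a, a).mean() + k(b, b).mean() - 2.0 * k(a, b).mean()

    return np.array([
        rbf_mmd(source[:, j:j + 1], target[:, j:j + 1])
        for j in range(source.shape[1])
    ])

# Synthetic example: feature 0 shifts between domains, feature 1 does not,
# so feature 0 should receive the larger (variant) score.
rng = np.random.default_rng(0)
src = rng.normal(0.0, 1.0, size=(200, 2))
tgt = np.column_stack([rng.normal(3.0, 1.0, 200),   # shifted feature
                       rng.normal(0.0, 1.0, 200)])  # invariant feature
scores = feature_mmd(src, tgt)
```

A thresholding or ranking step on `scores` would then separate variant from invariant features before adapting a classifier to the target domain.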


Keywords: Feature Selection · Synthetic Dataset · Target Domain · Domain Adaptation · Invariant Feature



Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Selen Uguroglu, Language Technologies Institute, Carnegie Mellon University, USA
  • Jaime Carbonell, Language Technologies Institute, Carnegie Mellon University, USA
