Semi-supervised Online Kernel Semantic Embedding for Multi-label Annotation

  • Jorge A. Vanegas
  • Hugo Jair Escalante
  • Fabio A. González
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10657)


This paper presents a multi-label annotation method that uses a semantic embedding strategy based on kernel matrix factorization. The proposed method, called Semi-supervised Online Kernel Semantic Embedding (SS-OKSE), learns to predict the labels of a document by building a semantic representation of the document features that takes the labels into account when they are available. A notable characteristic of the algorithm is its kernel formulation, which allows it to model non-linear relationships. SS-OKSE was evaluated in a semi-supervised learning setup on a multi-label annotation task over two text document datasets, and was compared against several supervised and semi-supervised methods. Experimental results show that SS-OKSE exhibits a significant improvement, indicating that better modeling can be achieved with an adequate selection or construction of a kernel input representation.
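The general idea described above can be illustrated with a minimal sketch. It is not the SS-OKSE algorithm itself, whose exact objective and update rules are not given here. It assumes a Gaussian (RBF) kernel computed against a small random "budget" of reference documents, a shared low-rank latent space that reconstructs the kernel features, and a label term that is applied only to labeled documents, all trained by plain mini-batch gradient descent. The names `rbf_kernel`, `fit_sketch`, `predict_scores` and every hyper-parameter value are hypothetical choices made for illustration.

```python
import numpy as np

def rbf_kernel(A, B, gamma):
    # Pairwise squared Euclidean distances between rows of A and rows of B.
    d2 = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))

def fit_sketch(X, Y, labeled, budget=20, rank=5, gamma=0.1,
               lr=0.01, epochs=100, batch=16, seed=0):
    """Illustrative semi-supervised kernel embedding (assumed objective):
    0.5*||K W V - K||^2 + 0.5*||mask * (K W U - Y)||^2."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Budget: a random subset of documents used as kernel reference points.
    B = X[rng.choice(n, size=budget, replace=False)]
    K = rbf_kernel(X, B, gamma)                          # (n, budget)
    W = 0.1 * rng.standard_normal((budget, rank))        # kernel -> latent
    V = 0.1 * rng.standard_normal((rank, budget))        # latent -> kernel
    U = 0.1 * rng.standard_normal((rank, Y.shape[1]))    # latent -> labels
    for _ in range(epochs):
        for idx in np.array_split(rng.permutation(n), max(1, n // batch)):
            Kb = K[idx]
            mask = labeled[idx][:, None].astype(float)   # 1 where labels exist
            H = Kb @ W                                   # latent representation
            Rk = H @ V - Kb                              # reconstruction residual
            Ry = mask * (H @ U - Y[idx])                 # label residual (labeled rows only)
            gH = Rk @ V.T + Ry @ U.T                     # gradient w.r.t. H
            step = lr / len(idx)
            gW, gV, gU = Kb.T @ gH, H.T @ Rk, H.T @ Ry
            W -= step * gW
            V -= step * gV
            U -= step * gU
    return B, W, U

def predict_scores(Xnew, B, W, U, gamma=0.1):
    # Map new documents through the kernel, the latent space, and the label map.
    return rbf_kernel(Xnew, B, gamma) @ W @ U

# Synthetic data, for illustration only: 60 documents, 8 features, 3 labels,
# with labels treated as available for the first 30 documents.
rng = np.random.default_rng(1)
X = rng.standard_normal((60, 8))
Y = (rng.random((60, 3)) > 0.5).astype(float)
labeled = np.arange(60) < 30

B, W, U = fit_sketch(X, Y, labeled)
scores = predict_scores(X, B, W, U)
annotations = scores > 0.5   # thresholding rule is an assumption
```

Unlabeled documents still contribute through the reconstruction term, which is what makes the sketch semi-supervised, and the fixed budget keeps the kernel matrix at size n × budget instead of n × n.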


Semantic representation · Semi-supervised learning · Learning on a budget · Multi-label annotation



Jorge A. Vanegas thanks Colciencias for its support through doctoral grant 617/2013.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. MindLab Research Group, Universidad Nacional de Colombia, Bogotá, Colombia
  2. Instituto Nacional de Astrofísica, Óptica y Electrónica (INAOE), Puebla, Mexico
