Generalized Feature Embedding for Supervised, Unsupervised, and Online Learning Tasks

Golinko, Eric; Zhu, Xingquan

doi:10.1007/s10796-018-9850-y

Published: 16 April 2018

Information Systems Frontiers, Volume 21, pages 125–142 (2019)


Abstract

Feature embedding is an emerging research area that aims to transform features from the original space into a new space to support effective learning. Many feature embedding algorithms exist, but they often suffer from several major drawbacks: (1) they handle only a single feature type, or require users to separate features into different feature views and supply that information for feature embedding learning; (2) they are designed for either supervised or unsupervised learning tasks, but not both; and (3) feature embeddings for new out-of-training samples have to be obtained through a retraining phase, which makes them unsuitable for online learning tasks. In this paper, we propose a generalized feature embedding algorithm, GEL, for supervised, unsupervised, and online learning tasks. GEL learns feature embeddings from any type of data, including data with mixed feature types. For supervised learning tasks with class label information, GEL leverages a Class Partitioned Instance Representation (CPIR) process to arrange instances, based on their labels, as a dense binary representation via row and feature vectors for feature embedding learning. If class labels are unavailable, CPIR naturally degenerates and treats all instances as one class. Based on the CPIR representation, GEL uses eigenvector decomposition to convert the proximity matrix into a low-dimensional space. For new out-of-training samples, the low-dimensional representations are derived through a direct conversion without a retraining phase. The learned numerical embedding features can be used directly to represent instances for effective learning. Experiments and comparisons on 28 datasets, including categorical, numerical, and ordinal features, demonstrate that embedding features learned by GEL can effectively represent the original instances for clustering, classification, and online learning.



Author information

Authors and Affiliations

Department of Computer, Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, 33431, USA
Eric Golinko & Xingquan Zhu

Corresponding author

Correspondence to Eric Golinko.

Appendices

Appendix

Toy example showing detailed GEL feature embedding learning process

$$ X = \left[\begin{array}{llll} car & boy & 18-34 & C^{1}\\ truck & boy & 35-64 & C^{1}\\ car & girl & 18-34 & C^{1}\\ truck & boy & 35-64 & C^{2}\\ car & girl & 65+ & C^{2} \end{array}\right] $$

(15)

Converting the categorical features to a binary representation gives

$$ W = \left[\begin{array}{llll} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{array}\right] $$

(16)
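The binary matrix in Eq. (16) is consistent with an indicator (dummy) coding in which one level per feature is dropped. The following is a minimal sketch under that assumption; the column order (truck, girl, 18-34, 35-64) and the helper name `to_binary` are illustrative choices inferred from the toy data, not taken from the paper.

```python
import numpy as np

# Toy data from Eq. (15): three categorical features per instance.
X = [
    ("car", "boy", "18-34"),
    ("truck", "boy", "35-64"),
    ("car", "girl", "18-34"),
    ("truck", "boy", "35-64"),
    ("car", "girl", "65+"),
]

# Assumed indicator columns: (feature index, level that maps to 1).
# The levels "car", "boy" and "65+" act as dropped reference levels.
INDICATORS = [(0, "truck"), (1, "girl"), (2, "18-34"), (2, "35-64")]


def to_binary(rows, indicators):
    """Return a 0/1 matrix with one column per (feature, level) indicator."""
    return np.array(
        [[int(row[j] == level) for j, level in indicators] for row in rows]
    )


W = to_binary(X, INDICATORS)
print(W)
```

Under this coding, an instance whose values are all reference levels would map to an all-zero row.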

Then we compute the weighted proximity matrix

$$ A= \left[\begin{array}{lllll} 0.6 & 0.6 & 0.6 & 0 & 0 \\ 0.6 & 0.6 & 0.6 & 0 & 0 \\ 0.6 & 0.6 & 0.6 & 0 & 0 \\ 0 & 0 & 0 & 0.4 & 0.4 \\ 0 & 0 & 0 & 0.4 & 0.4 \end{array}\right] $$

(17)
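In Eq. (17), each nonzero entry equals the share of instances belonging to the class of the pair (3/5 for C¹, 2/5 for C²), and entries for instances from different classes are zero. The sketch below reproduces the matrix under that reading; the rule is inferred from the toy example only, and `class_proximity` is a hypothetical helper name.

```python
import numpy as np

# Class labels of the five toy instances in Eq. (15).
labels = np.array([1, 1, 1, 2, 2])


def class_proximity(y):
    """A[i, j] = (class size / n) if y[i] == y[j], else 0 (assumed rule)."""
    y = np.asarray(y)
    same = y[:, None] == y[None, :]      # True where i and j share a class
    share = same.sum(axis=1) / len(y)    # class share for each instance
    return same * share[:, None]


A = class_proximity(labels)
print(A)
```

If every instance carries the same label, as in the unlabeled setting where all instances are treated as one class, this rule yields a matrix of ones.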

Then the row vectors for each class are

$$R^{C^{1}} = \left[\begin{array}{ll} 0.75 & 0.25 \\ 0.5 & 0.5 \\ 0.5 & 0.5 \end{array}\right] , R^{C^{2}} = \left[\begin{array}{ll} 0.5 & 0.5 \\ 0.75 & 0.25 \end{array}\right] $$

and the feature vectors, per class and over all instances, are

$$F^{C^{1}} = \left[\begin{array}{ll} 0.67 & 0.33 \\ 0.67 & 0.33 \\ 0.33 & 0.67 \\ 0.67 & 0.33 \end{array}\right] , F^{C^{2}} = \left[\begin{array}{ll} 0.5 & 0.5 \\ 0.5 & 0.5 \\ 1.0 & 0.0 \\ 0.5 & 0.5 \end{array}\right] , F^{C^{1},C^{2}} = \left[\begin{array}{ll} 0.6 & 0.4 \\ 0.6 & 0.4 \\ 0.6 & 0.4 \\ 0.6 & 0.4 \end{array}\right] $$
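The R and F matrices above match a simple counting rule on W: each row of R holds the fractions of 0s and 1s in one instance's binary row, and each row of F holds the fractions of 0s and 1s in one binary feature, computed within a class or over all instances. The sketch below checks that reading against the toy numbers; the rule is inferred from the example, and `zero_one_shares` is a hypothetical helper.

```python
import numpy as np

# Binary matrix from Eq. (16) and class labels from Eq. (15).
W = np.array([
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
])
labels = np.array([1, 1, 1, 2, 2])


def zero_one_shares(M, axis):
    """Fractions of 0s and 1s along `axis`, returned as two columns."""
    ones = M.mean(axis=axis)
    return np.column_stack([1.0 - ones, ones])


# Row vectors: one row per instance of the class.
R = {c: zero_one_shares(W[labels == c], axis=1) for c in (1, 2)}
# Feature vectors: one row per binary feature, within each class.
F = {c: zero_one_shares(W[labels == c], axis=0) for c in (1, 2)}
# Feature vectors over all instances (printed as F^{C1,C2}).
F_all = zero_one_shares(W, axis=0)

print(np.round(F[1], 2))
```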

Then

$$Q^{C^{1}} = \left[\begin{array}{lll} 1.00 & 0.86 & 0.86 \\ 1.00 & 0.86 & 0.86 \\ 0.71 & 0.86 & 0.86 \\ 1.00 & 0.86 & 0.86 \end{array}\right] , Q^{C^{2}} = \left[\begin{array}{ll} 0.67 & 0.67 \\ 0.67 & 0.67 \\ 0.67 & 1.00 \\ 0.67 & 0.67 \end{array}\right] , $$

$$ Q^{C^{1},C^{2}} = \left[\begin{array}{ll} 0.91 & 1.00 \\ 0.91 & 1.00 \\ 0.91 & 1.00 \\ 0.91 & 1.00 \end{array}\right] , Q^{C^{2},C^{1}} = \left[\begin{array}{lll} 1 & 0.91 & 0.91 \\ 1 & 0.91 & 0.91 \\ 1 & 0.91 & 0.91 \\ 1 & 0.91 & 0.91 \end{array}\right] $$

Then

$$ Q= \left[\begin{array}{lllll} 1.00 & 0.86 & 0.86 & 0.91 & 1.00 \\ 1.00 & 0.86 & 0.86 & 0.91 & 1.00 \\ 0.71 & 0.86 & 0.86 & 0.91 & 1.00 \\ 1.00 & 0.86 & 0.86 & 0.91 & 1.00 \\ 1.00 & 0.91 & 0.91 & 0.67 & 0.67 \\ 1.00 & 0.91 & 0.91 & 0.67 & 0.67 \\ 1.00 & 0.91 & 0.91 & 0.67 & 1.00 \\ 1.00 & 0.91 & 0.91 & 0.67 & 0.67 \end{array}\right] $$

(18)

Finally, we use singular value decomposition to find the first two eigenvectors V² of

$$ S = Q^{T}QAW $$

(19)
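As a rough illustration of Eq. (19), the sketch below forms S from the rounded Q, A, and W printed above and takes the first two right singular vectors with NumPy. It assumes that "first two eigenvectors" refers to the right singular vectors with the largest singular values. Because the printed matrices are rounded, singular-vector signs are arbitrary, and the appendix does not spell out every intermediate step, the output should not be expected to match the printed V² digit for digit.

```python
import numpy as np

# Q from Eq. (18), rounded to two decimals as printed.
Q = np.array([
    [1.00, 0.86, 0.86, 0.91, 1.00],
    [1.00, 0.86, 0.86, 0.91, 1.00],
    [0.71, 0.86, 0.86, 0.91, 1.00],
    [1.00, 0.86, 0.86, 0.91, 1.00],
    [1.00, 0.91, 0.91, 0.67, 0.67],
    [1.00, 0.91, 0.91, 0.67, 0.67],
    [1.00, 0.91, 0.91, 0.67, 1.00],
    [1.00, 0.91, 0.91, 0.67, 0.67],
])

# A from Eq. (17): block diagonal with class shares 0.6 and 0.4.
A = np.zeros((5, 5))
A[:3, :3] = 0.6
A[3:, 3:] = 0.4

# W from Eq. (16).
W = np.array([
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
])

S = Q.T @ Q @ A @ W                  # (5x8)(8x5)(5x5)(5x4) -> 5x4
_, singular_values, Vt = np.linalg.svd(S, full_matrices=False)
V2 = Vt[:2].T                        # first two right singular vectors, 4x2

print(S.shape, V2.shape)
```

The shapes confirm that V² has one row per binary feature, which is what allows it to be multiplied by W in the next step.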

We then reduce the dimension by projecting W onto the first two components only:

$$ \mathcal{F} = WV^{2} $$

(20)

which, for the toy example, gives

$$\mathcal{F} = \left[\begin{array}{llll} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{array}\right] \left[\begin{array}{ll} -0.39 & 0.40 \\ -0.39 & 0.48 \\ -0.74 & -0.67 \\ -0.39 & 0.40 \end{array}\right] =\left[\begin{array}{ll} -0.74 & -0.67 \\ -0.77 & 0.80 \\ -1.13 & -0.19 \\ -0.77 & 0.80 \\ -0.39 & 0.48 \end{array}\right] $$
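The final product can be checked directly. The sketch below multiplies W by the printed (rounded) V² and then applies the same projection to a new binary row. The abstract states only that new samples are embedded by a direct conversion without retraining; treating that conversion as the same multiplication by V² is an assumption here, and the new instance (truck, girl, 18-34) is hypothetical.

```python
import numpy as np

# W from Eq. (16) and the rounded V^2 printed in the appendix.
W = np.array([
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
])
V2 = np.array([
    [-0.39, 0.40],
    [-0.39, 0.48],
    [-0.74, -0.67],
    [-0.39, 0.40],
])

# Embedding of the five training instances, Eq. (20).
F_emb = W @ V2
print(np.round(F_emb, 2))

# Hypothetical unseen instance (truck, girl, 18-34), binary-coded like W.
w_new = np.array([1, 1, 1, 0])
f_new = w_new @ V2                   # assumed out-of-sample conversion
print(np.round(f_new, 2))
```

With the rounded V², rows 2 and 4 come out as -0.78 rather than the printed -0.77, which is what one would expect if the printed result was computed from unrounded values.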

Link to code

We hereby release the source code for public examination and validation: https://github.com/egolinko/GEL

Cite this article

Golinko, E., Zhu, X. Generalized Feature Embedding for Supervised, Unsupervised, and Online Learning Tasks. Inf Syst Front 21, 125–142 (2019). https://doi.org/10.1007/s10796-018-9850-y

Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s10796-018-9850-y
