Transformation of Categorical Features into Real Using Low-Rank Approximations

Fonarev, Alexander

doi:10.1007/978-3-319-25485-2_7

Part of the book series: Communications in Computer and Information Science (CCIS, volume 505)

Included in the following conference series:

Russian Summer School in Information Retrieval

Abstract

Most existing machine learning techniques can handle objects described by real-valued features but not by categorical ones. In this paper we introduce a simple unsupervised method for transforming categorical feature values into real ones. It is based on low-rank approximations of collaborative feature value frequencies. Once object descriptions are transformed, any common real-valued machine learning technique can be applied for further data analysis. For example, it becomes possible to apply the classic and powerful Random Forest predictor to supervised learning problems. Our experiments show that combining the proposed feature transformation method with common real-valued supervised algorithms leads to results comparable to those of state-of-the-art approaches such as Factorization Machines.
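
The general idea of the abstract can be illustrated with a minimal sketch. It is not the chapter's exact algorithm: the abstract does not specify the factorization used, so this example assumes a plain truncated SVD of the co-occurrence count matrix of two categorical features. The function names (`cooccurrence_matrix`, `low_rank_embeddings`, `transform`) and the toy `users`/`items` data are hypothetical and chosen only for illustration.

```python
import numpy as np

def cooccurrence_matrix(col_a, col_b):
    """Count how often each value of col_a occurs together with each value of col_b."""
    values_a, idx_a = np.unique(col_a, return_inverse=True)
    values_b, idx_b = np.unique(col_b, return_inverse=True)
    counts = np.zeros((len(values_a), len(values_b)))
    # np.add.at accumulates correctly when the same (row, column) pair repeats.
    np.add.at(counts, (idx_a, idx_b), 1.0)
    return values_a, values_b, counts

def low_rank_embeddings(counts, rank):
    """Rank-`rank` factors of the count matrix via truncated SVD (an assumption)."""
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    root = np.sqrt(s[:rank])
    # Split the singular values evenly between the two factors.
    return u[:, :rank] * root, vt[:rank].T * root

def transform(col_a, col_b, rank=2):
    """Replace each pair of categorical values with the concatenated real vectors."""
    values_a, values_b, counts = cooccurrence_matrix(col_a, col_b)
    emb_a, emb_b = low_rank_embeddings(counts, rank)
    # np.unique returns sorted values, so searchsorted recovers each value's row.
    idx_a = np.searchsorted(values_a, col_a)
    idx_b = np.searchsorted(values_b, col_b)
    return np.hstack([emb_a[idx_a], emb_b[idx_b]])

# Hypothetical toy data: six objects described by two categorical features.
users = np.array(["u1", "u1", "u2", "u3", "u3", "u3"])
items = np.array(["a", "b", "a", "b", "c", "c"])
X = transform(users, items, rank=2)
print(X.shape)
```

The resulting matrix `X` contains only real numbers, so it could be passed to any real-valued learner, for instance a Random Forest implementation. The transformation uses no target labels, which matches the unsupervised setting the abstract describes.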

Author information

Authors and Affiliations

Skolkovo Institute of Science and Technology, Skolkovo, Russia
Alexander Fonarev
Yandex, Moscow, Russia
Alexander Fonarev

Corresponding author

Correspondence to Alexander Fonarev.

Editor information

Editors and Affiliations

Ural Federal University, Yekaterinburg, Russia
Pavel Braslavski
National Research University Higher School of Economics, Nizhniy Novgorod, Russia
Nikolay Karpov
Intelligent Systems Laboratory, University of Amsterdam, Amsterdam, The Netherlands
Marcel Worring
Barcelona Media Research Foundation, Barcelona, Spain
Yana Volkovich
National Research University Higher School of Economics, Moscow, Russia
Dmitry I. Ignatov

About this chapter

Cite this chapter

Fonarev, A. (2015). Transformation of Categorical Features into Real Using Low-Rank Approximations. In: Braslavski, P., Karpov, N., Worring, M., Volkovich, Y., Ignatov, D.I. (eds) Information Retrieval. RuSSIR 2014. Communications in Computer and Information Science, vol 505. Springer, Cham. https://doi.org/10.1007/978-3-319-25485-2_7

DOI: https://doi.org/10.1007/978-3-319-25485-2_7
Published: 10 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25484-5
Online ISBN: 978-3-319-25485-2
eBook Packages: Computer Science (R0)