Abstract
Most existing machine learning techniques can handle objects described by real-valued features but not by categorical ones. In this paper we introduce a simple unsupervised method for transforming categorical feature values into real ones. It is based on low-rank approximations of collaborative feature value frequencies. Once object descriptions are transformed, any common real-valued machine learning technique can be applied for further data analysis. For example, it becomes possible to apply the classic and powerful Random Forest predictor to supervised learning problems. Our experiments show that combining the proposed feature transformation method with common real-valued supervised algorithms yields results comparable to state-of-the-art approaches such as Factorization Machines.
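As a rough illustration of the general idea only, the sketch below counts how often the values of two categorical features occur together and uses a truncated SVD of that count matrix to give each categorical value a real-valued vector. The abstract does not specify the exact frequency matrix or factorization used in the chapter, so the choice of plain co-occurrence counts, the SVD, the helper names `cooccurrence_counts` and `low_rank_embeddings`, and the toy data are all assumptions made for this example.

```python
import numpy as np


def cooccurrence_counts(col_a, col_b):
    """Count how often each value of col_a occurs together with each value of col_b."""
    values_a = sorted(set(col_a))
    values_b = sorted(set(col_b))
    index_a = {v: i for i, v in enumerate(values_a)}
    index_b = {v: i for i, v in enumerate(values_b)}
    counts = np.zeros((len(values_a), len(values_b)))
    for a, b in zip(col_a, col_b):
        counts[index_a[a], index_b[b]] += 1.0
    return counts, values_a, values_b


def low_rank_embeddings(counts, rank):
    """Truncated SVD (an assumed choice of low-rank approximation).

    Returns one real vector per row value and one per column value such that
    row_vecs @ col_vecs.T is the best rank-`rank` approximation of `counts`.
    """
    u, s, vt = np.linalg.svd(counts, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank].T


# Hypothetical toy data: two categorical features observed on six objects.
city = ["paris", "paris", "rome", "rome", "oslo", "oslo"]
device = ["phone", "phone", "phone", "laptop", "laptop", "laptop"]

counts, city_values, device_values = cooccurrence_counts(city, device)
city_vecs, device_vecs = low_rank_embeddings(counts, rank=1)

# Replace each categorical value by its real-valued vector.
city_to_real = dict(zip(city_values, city_vecs))
features = np.array([city_to_real[c] for c in city])
```

The resulting `features` array contains only real numbers, so it could be passed to any real-valued learner, for instance a random forest implementation.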
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Fonarev, A. (2015). Transformation of Categorical Features into Real Using Low-Rank Approximations. In: Braslavski, P., Karpov, N., Worring, M., Volkovich, Y., Ignatov, D.I. (eds) Information Retrieval. RuSSIR 2014. Communications in Computer and Information Science, vol 505. Springer, Cham. https://doi.org/10.1007/978-3-319-25485-2_7
DOI: https://doi.org/10.1007/978-3-319-25485-2_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25484-5
Online ISBN: 978-3-319-25485-2
eBook Packages: Computer Science, Computer Science (R0)