Abstract
We study the user profile completion and enrichment problem, where the goal is to estimate the unknown values of user profiles. We investigate how the type of the features (categorical or continuous) suggests the use of a specific approach for this task. In particular, we validate the hypothesis that a classification method such as K-nearest neighbor search is better suited to categorical features, whereas matrix factorization methods such as Non-negative Matrix Factorization perform better on continuous features. We study different variants of K-nearest neighbor search (with different metrics) and demonstrate how they perform in different settings. Moreover, we investigate the impact of shifting the variables on the quality of the (non-negative) factorization and on the prediction error. We validate our methods via extensive experiments on real-world datasets and, finally, based on the results and observations, discuss a hybrid approach to this task.
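To make the categorical case concrete, the following is a minimal sketch of K-nearest neighbor completion by majority vote. It is not the chapter's implementation: the `MISSING` marker, the function names `hamming` and `knn_fill`, and the choice of a Hamming-style mismatch rate over jointly observed features are all assumptions made for illustration, and the chapter itself compares several other metrics.

```python
from collections import Counter

MISSING = None  # hypothetical marker for an unknown profile value


def hamming(a, b):
    """Mismatch rate over the features observed in both profiles (assumed metric)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not MISSING and y is not MISSING]
    if not shared:
        return 1.0  # no jointly observed feature: treat as maximally distant
    return sum(x != y for x, y in shared) / len(shared)


def knn_fill(profiles, k=3):
    """Return a copy of `profiles` where each unknown entry is replaced by the
    majority value among the k nearest profiles that have that feature observed."""
    completed = [list(p) for p in profiles]
    for i, row in enumerate(profiles):
        for j, value in enumerate(row):
            if value is not MISSING:
                continue
            # Only profiles with feature j observed can vote for it.
            candidates = sorted(
                (hamming(row, other), idx)
                for idx, other in enumerate(profiles)
                if idx != i and other[j] is not MISSING
            )
            votes = Counter(profiles[idx][j] for _, idx in candidates[:k])
            if votes:  # leave the entry unknown if nobody can vote
                completed[i][j] = votes.most_common(1)[0][0]
    return completed


# Toy data, invented for this example only.
profiles = [
    ["de", "student", "music"],
    ["de", "student", "music"],
    ["de", "student", MISSING],
    ["fr", "retired", "sport"],
    ["fr", "retired", "sport"],
]
completed = knn_fill(profiles, k=2)
```

In this toy data the third profile agrees with the first two on every jointly observed feature, so its two nearest neighbors both vote for the same value of the unknown feature. Distances are computed on the original profiles rather than on partially completed ones, so the result does not depend on the order in which entries are filled.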
Notes
1. The unknown elements might be filled by a default value before computing the pairwise distances.
2. Matrix factorization methods often require \(\mathcal O(N^3)\) or \(\mathcal O(N^2 \log N)\) runtime for training (and then they need to do matrix multiplication for estimation), whereas the runtime of different variants of K-NN is \(\mathcal O(N)\) (more precisely \(\mathcal O(KN|\mathcal M|)\) for \(|\mathcal M|\) unknown elements).
3. In our experiments, we did not observe a significant improvement when applying K-NN variants on the NMF results, instead of the original dataset.
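The default-filling step mentioned in the first note can be sketched as follows for continuous features. This is an illustration under stated assumptions, not the chapter's procedure: the notes do not say which default value is used, so the column mean of the observed entries is an assumed choice, and `fill_default` and `pairwise_euclidean` are hypothetical helper names.

```python
import math


def fill_default(rows, default=None):
    """Replace None entries by a default value before any distance is computed.

    If `default` is None, the mean of the observed values in the same column
    is used (an assumed choice; 0.0 if the column has no observed value).
    """
    filled = [list(r) for r in rows]
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] is not None]
        if default is not None:
            col_default = default
        elif observed:
            col_default = sum(observed) / len(observed)
        else:
            col_default = 0.0
        for r in filled:
            if r[j] is None:
                r[j] = col_default
    return filled


def pairwise_euclidean(rows):
    """Symmetric matrix of Euclidean distances between fully observed rows."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            d = math.sqrt(sum((x - y) ** 2 for x, y in zip(rows[a], rows[b])))
            dist[a][b] = dist[b][a] = d
    return dist


# Toy data, invented for this example only.
rows = [[1.0, None], [3.0, 4.0], [None, 8.0]]
filled = fill_default(rows)
dist = pairwise_euclidean(filled)
```

Filling first means every pair of profiles is compared over the same set of features, at the cost of pulling profiles with many unknown values toward the chosen default.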
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Haghir Chehreghani, M. (2017). Feature-Oriented Analysis of User Profile Completion Problem. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_24
DOI: https://doi.org/10.1007/978-3-319-56608-5_24
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56607-8
Online ISBN: 978-3-319-56608-5
eBook Packages: Computer Science, Computer Science (R0)