Feature-Oriented Analysis of User Profile Completion Problem

  • Conference paper
Advances in Information Retrieval (ECIR 2017)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10193)

Abstract

We study the user profile completion and enrichment problem, where the goal is to estimate the unknown values of user profiles. We investigate how the type of a feature (categorical or continuous) suggests the use of a specific approach for this task. In particular, we validate the hypothesis that a classification method such as K-nearest neighbor search is better suited to categorical features, whereas matrix factorization methods such as Non-negative Matrix Factorization perform better on continuous features. We study different variants of K-nearest neighbor search (with different metrics) and demonstrate how they perform in different settings. Moreover, we investigate the impact of shifting the variables on the quality of the (non-negative) factorization and on the prediction error. We validate our methods via extensive experiments on real-world datasets and, finally, based on the results and observations, we discuss a hybrid approach to accomplish this task.

Notes

  1. The unknown elements might be filled by a default value before computing the pairwise distances.

  2. Matrix factorization methods often require \(\mathcal O(N^3)\) or \(\mathcal O(N^2 \log N)\) runtime for training (and then they need to do matrix multiplication for estimation), whereas the runtime of different variants of K-NN is \(\mathcal O(N)\) (more precisely \(\mathcal O(KN|\mathcal M|)\) for \(|\mathcal M|\) unknown elements).

  3. In our experiments, we did not observe a significant improvement when applying K-NN variants on the NMF results, instead of the original dataset.
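
As an illustration of Note 1, the following is a minimal sketch of K-NN completion for integer-coded categorical features, in which unknown entries are filled with a default value before the pairwise distances are computed. It is not the paper's implementation: the function name `knn_complete`, the `MISSING` sentinel, the Hamming distance, and the majority vote are all assumptions chosen for the example, and the dense pairwise distance matrix is only practical for small datasets.

```python
import numpy as np
from collections import Counter

MISSING = -1  # hypothetical sentinel marking an unknown categorical value


def knn_complete(X, k=3, default=0):
    """Fill MISSING entries of an integer-coded categorical matrix (sketch)."""
    X = np.asarray(X)
    known = X != MISSING
    # Note 1: fill unknown entries with a default value before computing distances.
    filled = np.where(known, X, default)
    # Pairwise Hamming distance: number of features on which two profiles differ.
    dist = (filled[:, None, :] != filled[None, :, :]).sum(axis=2)
    out = X.copy()
    for i, j in zip(*np.where(~known)):
        # Rank all profiles by distance to profile i (stable sort keeps ties in row order).
        order = np.argsort(dist[i], kind="stable")
        # Keep the k nearest profiles for which feature j is known (this excludes i itself).
        neighbors = [n for n in order if known[n, j]][:k]
        if neighbors:
            votes = Counter(int(X[n, j]) for n in neighbors)
            out[i, j] = votes.most_common(1)[0][0]  # majority vote
        else:
            out[i, j] = default  # no profile knows this feature
    return out


if __name__ == "__main__":
    profiles = np.array([[1, 2, 3],
                         [1, 2, MISSING],
                         [1, 2, 3],
                         [0, 0, 0]])
    print(knn_complete(profiles, k=2))
```

The default value influences which neighbors are selected, so in practice it would need to be chosen with care; other metrics could be substituted for the Hamming distance without changing the overall structure.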

Author information

Corresponding author

Correspondence to Morteza Haghir Chehreghani.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Haghir Chehreghani, M. (2017). Feature-Oriented Analysis of User Profile Completion Problem. In: Jose, J., et al. Advances in Information Retrieval. ECIR 2017. Lecture Notes in Computer Science, vol 10193. Springer, Cham. https://doi.org/10.1007/978-3-319-56608-5_24

  • DOI: https://doi.org/10.1007/978-3-319-56608-5_24

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-56607-8

  • Online ISBN: 978-3-319-56608-5

  • eBook Packages: Computer Science, Computer Science (R0)
