
A Riemannian gossip approach to subspace learning on Grassmann manifold

  • Bamdev Mishra (corresponding author)
  • Hiroyuki Kasai
  • Pratik Jawanpuria
  • Atul Saroop
Article
Part of the following topical collection:
  1. Special Issue of the ACML 2018 Journal Track

Abstract

In this paper, we focus on subspace learning problems on the Grassmann manifold. Interesting applications in this setting include low-rank matrix completion and low-dimensional multivariate regression, among others. Motivated by privacy concerns, we aim to solve such problems in a decentralized setting where multiple agents have access to (and solve) only a part of the whole optimization problem. The agents communicate with each other to arrive at a consensus, i.e., agree on a common quantity, via the gossip protocol. We propose a novel cost function for subspace learning on the Grassmann manifold, which is a weighted sum of several sub-problems (each solved by an agent) and the communication cost among the agents. The cost function has a finite-sum structure. In the proposed modeling approach, different agents learn individual local subspaces but they achieve asymptotic consensus on the global learned subspace. The approach is scalable and parallelizable. Numerical experiments show the efficacy of the proposed decentralized algorithms on various matrix completion and multivariate regression benchmarks.
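A schematic reading of the proposed cost (our notation for illustration, not the paper's exact formulation): with N agents, local costs f_i, a communication weight rho, a communication graph with edge set E, and a distance d between subspaces on the Grassmann manifold Gr(r, m), the modeling approach described above takes the form

$$
\min_{U_1,\ldots,U_N \in \mathrm{Gr}(r,m)} \; \sum_{i=1}^{N} f_i(U_i) \; + \; \rho \sum_{(i,j)\in\mathcal{E}} d^2(U_i, U_j),
$$

where each agent i updates only its own subspace U_i, and the second (communication) term penalizes disagreement between communicating agents so that the local subspaces agree asymptotically.

A minimal sketch of one gossip update between a pair of agents follows, assuming a QR retraction, the standard Grassmann tangent-space projection, and a Frobenius-norm proxy for the consensus term; all function and variable names are ours, and this is an illustration of the general technique rather than the authors' algorithm.

```python
# A minimal sketch (not the authors' implementation) of one gossip round for
# decentralized subspace learning on the Grassmann manifold Gr(r, m).
import numpy as np

def retract(U):
    # Map an arbitrary m x r matrix back to an orthonormal basis (QR retraction).
    Q, _ = np.linalg.qr(U)
    return Q

def riemannian_grad(U, egrad):
    # Project a Euclidean gradient onto the tangent space of Gr(r, m) at U:
    # G = (I - U U^T) egrad.
    return egrad - U @ (U.T @ egrad)

def gossip_step(U_i, U_j, egrad_i, egrad_j, rho=1.0, step=0.01):
    """One gossip update between a randomly selected pair of agents (i, j).

    Each agent follows the Riemannian gradient of its local cost plus a
    consensus term pulling its subspace toward the other agent's subspace.
    """
    # Consensus pull: Euclidean gradient of 0.5 * rho * ||U_i - U_j||_F^2,
    # a simple proxy (for illustration only) for a squared subspace distance.
    G_i = riemannian_grad(U_i, egrad_i + rho * (U_i - U_j))
    G_j = riemannian_grad(U_j, egrad_j + rho * (U_j - U_i))
    return retract(U_i - step * G_i), retract(U_j - step * G_j)

# Tiny usage example: two agents, each with a quadratic local cost
# f_i(U) = -0.5 * trace(U^T A U), whose Euclidean gradient is -A U.
m, r = 20, 3
rng = np.random.default_rng(0)
A = rng.standard_normal((m, m))
A = A @ A.T
U1 = retract(rng.standard_normal((m, r)))
U2 = retract(rng.standard_normal((m, r)))
egrad = lambda U: -A @ U
U1, U2 = gossip_step(U1, U2, egrad(U1), egrad(U2))
```

In a full decentralized run, random agent pairs would be sampled repeatedly (following the gossip protocol) with a decaying step size, so that each local subspace both fits its agent's data and drifts toward a common consensus subspace.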

Keywords

Non-linear gossip · Stochastic gradients · Manifold optimization · Matrix completion · Multivariate regression

Notes

Acknowledgements

We thank the editor and two anonymous reviewers for carefully checking the paper and providing a number of helpful remarks. Most of the work was done when Bamdev Mishra and Pratik Jawanpuria were with Amazon.com.


Copyright information

© The Author(s), under exclusive licence to Springer Science+Business Media LLC, part of Springer Nature 2019

Authors and Affiliations

  • Bamdev Mishra (1), corresponding author
  • Hiroyuki Kasai (2)
  • Pratik Jawanpuria (1)
  • Atul Saroop (3)
  1. Microsoft, Hyderabad, India
  2. The University of Electro-Communications, Tokyo, Japan
  3. Amazon.com, Bengaluru, India
