Matrix Factorization and Topic Modeling



Most document collections are represented as document-term matrices in which the rows (or columns) are highly correlated with one another. These correlations can be exploited to create a low-dimensional representation of the data; this process is referred to as dimensionality reduction.
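As a minimal illustration of this idea, the sketch below applies a truncated singular value decomposition (the core of latent semantic analysis) to a toy document-term matrix; the matrix values and the choice of rank k = 2 are illustrative assumptions, not data from the text.

```python
import numpy as np

# Toy document-term matrix: 4 documents x 6 terms. The rows are
# correlated (documents 1-2 share one group of terms, documents
# 3-4 another), so a rank-2 basis captures most of the structure.
D = np.array([
    [2.0, 3.0, 1.0, 0.0, 0.0, 0.0],
    [1.0, 2.0, 2.0, 0.0, 0.0, 1.0],
    [0.0, 0.0, 0.0, 3.0, 2.0, 1.0],
    [0.0, 1.0, 0.0, 2.0, 3.0, 2.0],
])

# Truncated SVD: keep only the top-k singular triplets. By the
# Eckart-Young theorem this yields the best rank-k approximation
# of D in the Frobenius norm.
U, s, Vt = np.linalg.svd(D, full_matrices=False)
k = 2
D_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Each document is now described by k latent coordinates instead
# of 6 raw term frequencies.
reduced = U[:, :k] * s[:k]
print(reduced.shape)  # (4, 2)
```

The rank-2 reconstruction `D_k` is close to `D` precisely because of the row correlations; for an uncorrelated matrix, truncation at a small rank would discard far more information.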


Bibliography

  1. [5] C. Aggarwal. On the effects of dimensionality reduction on high dimensional similarity search. ACM PODS Conference, pp. 256–266, 2001.
  2. [9] C. Aggarwal and S. Sathe. Outlier ensembles: An introduction. Springer, 2017.
  3. [14] C. Aggarwal and C. Zhai. Mining text data. Springer, 2012.
  4. [29] A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On smoothing and inference for topic models. Uncertainty in Artificial Intelligence, pp. 27–34, 2009.
  5. [48] D. Bertsekas. Nonlinear programming. Athena Scientific, 1999.
  6. [52] D. Blei. Probabilistic topic models. Communications of the ACM, 55(4), pp. 77–84, 2012.
  7. [54] D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, pp. 993–1022, 2003.
  8. [55] D. Blei and J. Lafferty. Dynamic topic models. ICML Conference, pp. 113–120, 2006.
  9. [68] R. Bunescu and R. Mooney. Subsequence kernels for relation extraction. NIPS Conference, pp. 171–178, 2005.
  10. [88] Y. Chang, C. Hsieh, K. Chang, M. Ringgaard, and C. J. Lin. Training and testing low-degree polynomial data mappings via linear SVM. Journal of Machine Learning Research, 11, pp. 1471–1490, 2010.
  11. [136] C. Ding, T. Li, and M. Jordan. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1), pp. 45–55, 2010.
  12. [137] C. Ding, T. Li, and W. Peng. On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Computational Statistics and Data Analysis, 52(8), pp. 3913–3927, 2008.
  13. [138] C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. ACM KDD Conference, pp. 126–135, 2006.
  14. [145] S. Dumais. Latent semantic indexing (LSI) and TREC-2. Text Retrieval Conference (TREC), pp. 105–115, 1993.
  15. [146] S. Dumais. Latent semantic indexing (LSI): TREC-3 report. Text Retrieval Conference (TREC), pp. 219–230, 1995.
  16. [148] S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), pp. 391–407, 1990.
  17. [149] C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3), pp. 211–218, 1936.
  18. [180] T. Gärtner. A survey of kernels for structured data. ACM SIGKDD Explorations Newsletter, 5(1), pp. 49–58, 2003.
  19. [185] E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. ACM SIGIR Conference, pp. 601–602, 2005.
  20. [190] M. Girolami and A. Kabán. On an equivalence between PLSI and LDA. ACM SIGIR Conference, pp. 433–434, 2003.
  21. [218] G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786), pp. 504–507, 2006.
  22. [224] T. Hofmann. Probabilistic latent semantic indexing. ACM SIGIR Conference, pp. 50–57, 1999.
  23. [225] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42(1–2), pp. 177–196, 2001.
  24. [226] K. Hornik and B. Grün. topicmodels: An R package for fitting topic models. Journal of Statistical Software, 40(13), pp. 1–30, 2011.
  25. [255] A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis. kernlab – An S4 package for kernel methods in R. Journal of Statistical Software, 11(9), 2004.
  26. [272] A. Langville, C. Meyer, R. Albright, J. Cox, and D. Duling. Initializations for the nonnegative matrix factorization. ACM KDD Conference, pp. 23–26, 2006.
  27. [275] Q. Le and T. Mikolov. Distributed representations of sentences and documents. ICML Conference, pp. 1188–1196, 2014.
  28. [276] D. Lee and H. Seung. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, pp. 556–562, 2001.
  29. [277] D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), pp. 788–791, 1999.
  30. [294] C. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10), pp. 2756–2779, 2007.
  31. [308] H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. Journal of Machine Learning Research, 2, pp. 419–444, 2002.
  32. [314] U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4), pp. 395–416, 2007.
  33. [337] D. Metzler, S. Dumais, and C. Meek. Similarity measures for short segments of text. European Conference on Information Retrieval, pp. 16–27, 2007.
  34. [341] T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013.
  35. [388] J. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2), pp. 945–959, 2000.
  36. [401] R. Rehurek and P. Sojka. Software framework for topic modelling with large corpora. LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50, 2010.
  37. [417] S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), pp. 2323–2326, 2000.
  38. [418] M. Sahami and T. D. Heilman. A Web-based kernel function for measuring the similarity of short text snippets. WWW Conference, pp. 377–386, 2006.
  39. [436] B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), pp. 1299–1319, 1998.
  40. [460] G. Strang. An introduction to linear algebra. Wellesley Cambridge Press, 2009.
  41. [473] J. Tenenbaum, V. De Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), pp. 2319–2323, 2000.
  42. [488] H. Wallach, D. Mimno, and A. McCallum. Rethinking LDA: Why priors matter. NIPS Conference, pp. 1973–1981, 2009.
  43. [493] X. Wei and W. B. Croft. LDA-based document models for ad-hoc retrieval. ACM SIGIR Conference, pp. 178–185, 2006.
  44. [501] C. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. NIPS Conference, 2000.
  45. [519] Y. Yang and X. Liu. A re-examination of text categorization methods. ACM SIGIR Conference, pp. 42–49, 1999.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. IBM T. J. Watson Research Center, Yorktown Heights, USA