Matrix Factorization and Topic Modeling

Abstract

Most document collections are defined by document-term matrices in which the rows (or columns) are highly correlated with one another. These correlations can be leveraged to create a low-dimensional representation of the data, and this process is referred to as dimensionality reduction.
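
The abstract's claim can be made concrete in a few lines of code. Below is a minimal sketch (added here, not from the chapter) using scikit-learn's TruncatedSVD, which the bibliography references; the toy corpus and the choice of two latent components are illustrative assumptions.

    # Minimal sketch: low-dimensional representation of a document-term
    # matrix via truncated SVD (latent semantic analysis). Toy data only.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import TruncatedSVD

    corpus = [
        "cats chase mice",
        "dogs chase cats",
        "stocks fall as markets tumble",
        "markets rally as stocks rise",
    ]

    # Document-term matrix: one row per document, one column per term.
    X = CountVectorizer().fit_transform(corpus)

    # Reduce to k = 2 latent dimensions; correlated terms such as
    # "stocks" and "markets" load onto the same latent component.
    svd = TruncatedSVD(n_components=2, random_state=0)
    X_reduced = svd.fit_transform(X)
    print(X_reduced.shape)  # (4, 2)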

Notes

  1. Here, we are assuming a specific type of factorization, referred to as non-negative matrix factorization, because of its interpretability. Other factorizations might not obey these properties.
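
     To make the interpretability point concrete, here is a minimal sketch (an illustration added here, not code from the book) using scikit-learn's NMF on a toy document-term matrix; both factors come out non-negative, so each document can be read as an additive mixture of topics.

        # Minimal sketch: non-negative matrix factorization on a toy
        # document-term matrix; W (document-topic) and H (topic-term)
        # are non-negative, hence interpretable as additive mixtures.
        import numpy as np
        from sklearn.decomposition import NMF

        X = np.array([[2, 1, 0, 0],
                      [1, 2, 0, 0],
                      [0, 0, 1, 2],
                      [0, 0, 2, 1]], dtype=float)

        model = NMF(n_components=2, init="nndsvd", random_state=0)
        W = model.fit_transform(X)  # document-topic weights
        H = model.components_       # topic-term weights
        print(np.round(W @ H, 2))   # approximately reconstructs X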

  2. The factorization is unique up to multiplication of any particular column of P and the corresponding column of Q by −1.

  3. This solution is unique up to multiplication of any column of U, together with the corresponding column of V, by −1.
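
     A quick numerical check of the sign ambiguity in notes 2 and 3 (a sketch added here, not from the book): flipping the sign of a column of U together with the corresponding column of V leaves the reconstructed matrix unchanged.

        # Flipping matched column signs leaves U * diag(s) * V^T intact.
        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.standard_normal((5, 4))
        U, s, Vt = np.linalg.svd(A, full_matrices=False)

        U2, Vt2 = U.copy(), Vt.copy()
        U2[:, 0] *= -1   # flip the first left singular vector...
        Vt2[0, :] *= -1  # ...and the matching right singular vector

        print(np.allclose(U @ np.diag(s) @ Vt, U2 @ np.diag(s) @ Vt2))  # True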

  4. In other words, the columns of P, the columns of Q, and the diagonal of Σ each sum to 1.

  5. The Dirichlet is selected because it is the posterior distribution of the multinomial parameters when the prior distribution of these parameters is a Dirichlet (although the parameters of the prior and posterior Dirichlet may be different). If we throw a loaded die repeatedly, with its faces showing various topics, the resulting observations are referred to as multinomial. In LDA, the selection of the latent components of the different tokens in a document is achieved by throwing such a die repeatedly. Formally, the Dirichlet distribution is a conjugate prior to the multinomial distribution. The use of conjugate priors is widespread in Bayesian statistics because of this property.
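
     The conjugacy can be verified numerically. Below is a minimal sketch (illustrative, not from the book): starting from a Dirichlet prior over three topics, the posterior after observing multinomial "die throws" is again a Dirichlet whose parameters are the prior parameters plus the observed counts.

        # Dirichlet-multinomial conjugacy: posterior = prior + counts.
        import numpy as np

        rng = np.random.default_rng(0)
        alpha = np.array([1.0, 1.0, 1.0])     # Dirichlet prior over 3 topics
        theta = rng.dirichlet(alpha)          # topic proportions (the loaded die)
        counts = rng.multinomial(100, theta)  # 100 throws: observed topic counts

        alpha_posterior = alpha + counts      # conjugate update
        print(alpha_posterior)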

  6. For a positive integer n, the value of Γ(n) is (n − 1)!. For a positive real value x, Γ(x) smoothly interpolates these factorial values and is defined by the integral Γ(x) = ∫₀^∞ y^(x−1) e^(−y) dy. More details of an exact definition and a specific functional form may be found at http://mathworld.wolfram.com/GammaFunction.html.
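
     A two-line check of this in Python (added for illustration):

        import math

        print(math.gamma(5), math.factorial(4))  # 24.0 24, i.e., Gamma(n) = (n-1)!
        print(math.gamma(4.5))                   # ~11.63, between Gamma(4)=6 and Gamma(5)=24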

  7. There does not seem to be a clear consensus on this issue. For the classification problem, slightly better results have been claimed in [519] for the linear kernel. On the other hand, the work in [88] shows that slightly better results are obtained with the Gaussian kernel with proper tuning. Theoretically, the latter claim seems better justified, because a linear kernel can be roughly simulated by a Gaussian kernel with a large bandwidth.
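
     The bandwidth argument can be illustrated with a small experiment (a sketch on synthetic data with illustrative parameters, not an experiment from [519] or [88]): in scikit-learn's RBF kernel, a small gamma corresponds to a large bandwidth, so the Gaussian kernel behaves nearly linearly.

        # An RBF kernel with a large bandwidth (small gamma) roughly
        # mimics a linear kernel; synthetic data, illustrative settings.
        from sklearn.datasets import make_classification
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        X, y = make_classification(n_samples=200, n_features=20, random_state=0)

        linear = SVC(kernel="linear")
        rbf_wide = SVC(kernel="rbf", gamma=1e-4)  # large bandwidth

        print(cross_val_score(linear, X, y, cv=5).mean())
        print(cross_val_score(rbf_wide, X, y, cv=5).mean())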

  8. For simplicity, we are including stop words in the 2-grams.
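
     For instance, with scikit-learn's CountVectorizer (an illustration added here, not from the book), keeping stop words yields 2-grams such as "on the":

        from sklearn.feature_extraction.text import CountVectorizer

        vec = CountVectorizer(ngram_range=(2, 2))  # 2-grams; stop words kept by default
        vec.fit(["the cat sat on the mat"])
        print(vec.get_feature_names_out())
        # ['cat sat' 'on the' 'sat on' 'the cat' 'the mat']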

Bibliography

  1. C. Aggarwal. On the effects of dimensionality reduction on high dimensional similarity search. ACM PODS Conference, pp. 256–266, 2001.

  2. C. Aggarwal and S. Sathe. Outlier ensembles: An introduction. Springer, 2017.

  3. C. Aggarwal and C. Zhai. Mining text data. Springer, 2012.

  4. A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On smoothing and inference for topic models. Uncertainty in Artificial Intelligence, pp. 27–34, 2009.

  5. D. Bertsekas. Nonlinear programming. Athena Scientific, 1999.

  6. D. Blei. Probabilistic topic models. Communications of the ACM, 55(4), pp. 77–84, 2012.

  7. D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, pp. 993–1022, 2003.

  8. D. Blei and J. Lafferty. Dynamic topic models. ICML Conference, pp. 113–120, 2006.

  9. R. Bunescu and R. Mooney. Subsequence kernels for relation extraction. NIPS Conference, pp. 171–178, 2005.

  10. Y. Chang, C. Hsieh, K. Chang, M. Ringgaard, and C. J. Lin. Training and testing low-degree polynomial data mappings via linear SVM. Journal of Machine Learning Research, 11, pp. 1471–1490, 2010.

  11. C. Ding, T. Li, and M. Jordan. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1), pp. 45–55, 2010.

  12. C. Ding, T. Li, and W. Peng. On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Computational Statistics and Data Analysis, 52(8), pp. 3913–3927, 2008.

  13. C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. ACM KDD Conference, pp. 126–135, 2006.

  14. S. Dumais. Latent semantic indexing (LSI) and TREC-2. Text Retrieval Conference (TREC), pp. 105–115, 1993.

  15. S. Dumais. Latent semantic indexing (LSI): TREC-3 report. Text Retrieval Conference (TREC), pp. 219–230, 1995.

  16. S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), pp. 391–407, 1990.

  17. C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3), pp. 211–218, 1936.

  18. T. Gärtner. A survey of kernels for structured data. ACM SIGKDD Explorations Newsletter, 5(1), pp. 49–58, 2003.

  19. E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. ACM SIGIR Conference, pp. 601–602, 2005.

  20. M. Girolami and A. Kabán. On an equivalence between PLSI and LDA. ACM SIGIR Conference, pp. 433–434, 2003.

  21. G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786), pp. 504–507, 2006.

  22. T. Hofmann. Probabilistic latent semantic indexing. ACM SIGIR Conference, pp. 50–57, 1999.

  23. T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 41(1–2), pp. 177–196, 2001.

  24. K. Hornik and B. Grün. topicmodels: An R package for fitting topic models. Journal of Statistical Software, 40(13), pp. 1–30, 2011.

  25. A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis. kernlab – An S4 package for kernel methods in R. Journal of Statistical Software, 11(9), 2004. http://epub.wu.ac.at/1048/1/document.pdf http://CRAN.R-project.org/package=kernlab

  26. A. Langville, C. Meyer, R. Albright, J. Cox, and D. Duling. Initializations for the nonnegative matrix factorization. ACM KDD Conference, pp. 23–26, 2006.

  27. Q. Le and T. Mikolov. Distributed representations of sentences and documents. ICML Conference, pp. 1188–1196, 2014.

  28. D. Lee and H. Seung. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, pp. 556–562, 2001.

  29. D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), pp. 788–791, 1999.

  30. C. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10), pp. 2756–2779, 2007.

  31. H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. Journal of Machine Learning Research, 2, pp. 419–444, 2002.

  32. U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4), pp. 395–416, 2007.

  33. D. Metzler, S. Dumais, and C. Meek. Similarity measures for short segments of text. European Conference on Information Retrieval, pp. 16–27, 2007.

  34. T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013. https://arxiv.org/abs/1301.3781

  35. J. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2), pp. 945–959, 2000.

  36. R. Rehurek and P. Sojka. Software framework for topic modelling with large corpora. LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50, 2010. https://radimrehurek.com/gensim/index.html

  37. S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), pp. 2323–2326, 2000.

  38. M. Sahami and T. D. Heilman. A Web-based kernel function for measuring the similarity of short text snippets. WWW Conference, pp. 377–386, 2006.

  39. B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), pp. 1299–1319, 1998.

  40. G. Strang. An introduction to linear algebra. Wellesley Cambridge Press, 2009.

  41. J. Tenenbaum, V. De Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), pp. 2319–2323, 2000.

  42. H. Wallach, D. Mimno, and A. McCallum. Rethinking LDA: Why priors matter. NIPS Conference, pp. 1973–1981, 2009.

  43. X. Wei and W. B. Croft. LDA-based document models for ad-hoc retrieval. ACM SIGIR Conference, pp. 178–185, 2006.

  44. C. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. NIPS Conference, 2000.

  45. Y. Yang and X. Liu. A re-examination of text categorization methods. ACM SIGIR Conference, pp. 42–49, 1999.

  46. http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html

  47. https://cran.r-project.org/web/packages/lsa/index.html

  48. http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html

  49. http://weka.sourceforge.net/doc.stable/weka/attributeSelection/LatentSemanticAnalysis.html

  50. http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html

  51. http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.LatentDirichletAllocation.html

  52. https://cran.r-project.org/

  53. http://www.cs.princeton.edu/~blei/lda-c/

  54. http://scikit-learn.org/stable/modules/manifold.html

  55. https://code.google.com/archive/p/word2vec/

  56. https://www.tensorflow.org/tutorials/word2vec/

  57. http://www.netlib.org/svdpack

  58. http://scikit-learn.org/stable/modules/kernel_approximation.html

  59. http://mallet.cs.umass.edu/

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Aggarwal, C.C. (2018). Matrix Factorization and Topic Modeling. In: Machine Learning for Text. Springer, Cham. https://doi.org/10.1007/978-3-319-73531-3_3

  • DOI: https://doi.org/10.1007/978-3-319-73531-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73530-6

  • Online ISBN: 978-3-319-73531-3

  • eBook Packages: Computer Science, Computer Science (R0)
