Abstract
Most document collections are defined by document-term matrices in which the rows (or columns) are highly correlated with one another. These correlations can be leveraged to create a low-dimensional representation of the data, and this process is referred to as dimensionality reduction.
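To make this concrete, here is a minimal sketch of the process using scikit-learn's TruncatedSVD (one of the implementations linked at the end of the bibliography); the three-document corpus is invented purely for illustration.

```python
# Reduce a sparse document-term matrix to a low-dimensional representation
# with truncated SVD (latent semantic analysis).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

corpus = [
    "matrix factorization for text mining",
    "topic models factorize the document-term matrix",
    "kernels measure similarity between documents",
]

# Rows are documents, columns are terms; entries are term frequencies.
X = CountVectorizer().fit_transform(corpus)

# Project each document onto 2 latent (correlated-term) dimensions.
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)
print(X_reduced.shape)  # (3, 2)
```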
Notes
1. Here, we are assuming a specific type of factorization, referred to as non-negative matrix factorization, because of its interpretability. Other factorizations might not obey these properties. (A minimal sketch of such a factorization appears after these notes.)
2. The factorization is unique up to multiplication by −1 of any particular column of P and Q.
3. This solution is unique up to multiplication of any column of U or V by −1. (A sketch after these notes demonstrates this sign ambiguity numerically.)
4. In other words, the columns of P, the columns of Q, and the diagonal of Σ each sum to 1 (see the normalization sketch after these notes).
5. The Dirichlet is selected because it is the posterior distribution of the multinomial parameters when their prior distribution is also a Dirichlet (although the parameters of the prior and posterior Dirichlets may differ). If we throw a loaded die repeatedly, with its faces showing the various topics, the resulting observations are multinomially distributed. In LDA, the latent components of the different tokens in a document are selected by throwing such a die repeatedly. Formally, the Dirichlet distribution is a conjugate prior to the multinomial distribution, and the use of conjugate priors is widespread in Bayesian statistics because of this property. (A small simulation of this prior-to-posterior update appears after these notes.)
6. For a positive integer n, the value of Γ(n) is (n − 1)!. For a positive real value x, Γ(x) is defined by the integral $\Gamma(x) = \int_0^{\infty} y^{x-1} e^{-y}\, dy$, which interpolates the factorial values at the integer points with a smooth curve. More details and the exact functional form may be found at http://mathworld.wolfram.com/GammaFunction.html. (A numerical check of both facts appears after these notes.)
7. There does not seem to be a clear consensus on this issue. For the classification problem, slightly better results have been claimed in [519] for the linear kernel. On the other hand, the work in [88] shows that slightly better results are obtained with the Gaussian kernel with proper tuning. Theoretically, the latter claim seems better justified, because a linear kernel can be roughly simulated by a Gaussian kernel with a large bandwidth. (A numerical illustration of this approximation appears after these notes.)
8. For simplicity, we are including stop words in the 2-grams (see the vectorizer sketch after these notes).
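Note 1: a minimal sketch of the interpretability property of non-negative matrix factorization, using scikit-learn's NMF implementation (linked at the end of the bibliography); the block-structured toy count matrix is invented for illustration.

```python
# Both factors of an NMF are elementwise non-negative, which is what makes
# the latent dimensions interpretable as additive "topics".
import numpy as np
from sklearn.decomposition import NMF

D = np.array([[2, 1, 0, 0],     # a toy document-term count matrix with two
              [4, 2, 0, 0],     # obvious "topics" (terms 0-1 and terms 2-3)
              [0, 0, 1, 3],
              [0, 0, 2, 6]], dtype=float)

model = NMF(n_components=2, init="nndsvd", random_state=0)
W = model.fit_transform(D)   # document-topic factor, W >= 0
H = model.components_        # topic-term factor,   H >= 0
print((W >= 0).all() and (H >= 0).all())  # True
print(model.reconstruction_err_)          # near zero: D ~= W @ H
```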
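Notes 2 and 3: a small NumPy check that flipping the sign of a column of U together with the corresponding column of V leaves the factorization unchanged, which is exactly the sign ambiguity described above.

```python
# Flipping the k-th column of U and the k-th column of V multiplies the k-th
# rank-1 term of the SVD by (-1)(-1) = 1, so the product is unchanged.
import numpy as np

rng = np.random.default_rng(0)
D = rng.random((4, 3))
U, s, Vt = np.linalg.svd(D, full_matrices=False)

U2, Vt2 = U.copy(), Vt.copy()
U2[:, 0] *= -1    # flip column 0 of U ...
Vt2[0, :] *= -1   # ... and column 0 of V (row 0 of V^T)

print(np.allclose(U @ np.diag(s) @ Vt, U2 @ np.diag(s) @ Vt2))  # True
```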
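Note 4: a sketch of this normalization convention, assuming a non-negative factorization D ≈ PΣQᵀ; the random non-negative factors are invented for illustration. The column sums of P and Q are absorbed into the diagonal matrix Σ, which is then rescaled so that its diagonal sums to 1 (the remaining scale is absorbed into the normalization of D itself, as in PLSA).

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((4, 2))           # non-negative document-topic factor
Q = rng.random((3, 2))           # non-negative term-topic factor

p_sums, q_sums = P.sum(axis=0), Q.sum(axis=0)
P_norm = P / p_sums              # columns of P now sum to 1
Q_norm = Q / q_sums              # columns of Q now sum to 1
sigma = p_sums * q_sums          # scale absorbed into the diagonal matrix

# Same product as before the normalization:
print(np.allclose(P_norm @ np.diag(sigma) @ Q_norm.T, P @ Q.T))  # True

Sigma = np.diag(sigma / sigma.sum())   # diagonal of Sigma now sums to 1
```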
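Note 5: a small simulation of the prior-to-posterior update described above, with toy parameters chosen for illustration.

```python
# A "loaded die" over 3 topics is drawn from a Dirichlet prior, 100 tokens
# are assigned topics multinomially, and conjugacy makes the posterior over
# the die another Dirichlet with the observed counts added to the prior.
import numpy as np

rng = np.random.default_rng(0)
alpha = np.array([1.0, 2.0, 3.0])     # Dirichlet prior over 3 topics
theta = rng.dirichlet(alpha)          # topic proportions (the loaded die)
counts = rng.multinomial(100, theta)  # topic assignments of 100 tokens

posterior_alpha = alpha + counts      # posterior is Dirichlet(alpha + counts)
print(posterior_alpha)
```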
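Note 6: a numerical check that Γ(n) = (n − 1)! at positive integers and that Γ(x) matches the defining integral at a non-integer point. SciPy's quad routine is used here for the numerical integration.

```python
import math
from scipy.integrate import quad

# Gamma(n) = (n - 1)! at the positive integers.
for n in range(1, 6):
    assert math.isclose(math.gamma(n), math.factorial(n - 1))

# Gamma(x) agrees with the defining integral at a non-integer point.
x = 2.5
integral, _ = quad(lambda y: y ** (x - 1) * math.exp(-y), 0, math.inf)
print(math.gamma(x), integral)  # both approximately 1.3293
```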
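Note 7: a numerical illustration of the final claim. For unit-normalized vectors, ‖x − y‖² = 2 − 2 x·y, so the Gaussian kernel exp(−‖x − y‖²/(2σ²)) ≈ 1 − (1 − x·y)/σ² when the bandwidth σ is large, which is an affine (and hence order-preserving) function of the linear kernel x·y.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((2, 20))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-normalize the rows
x, y = X

sigma = 100.0  # large bandwidth
rbf = np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))
lin = x @ y
print(rbf, 1 - (1 - lin) / sigma ** 2)  # nearly identical values
```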
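Note 8: a sketch of 2-gram extraction without stop word removal, using scikit-learn's CountVectorizer; the example sentence is invented for illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer

# ngram_range=(2, 2) extracts only 2-grams; stop_words=None (the default)
# keeps stop words, so phrases like "on the" survive as features.
vectorizer = CountVectorizer(ngram_range=(2, 2), stop_words=None)
vectorizer.fit(["the cat sat on the mat"])
print(vectorizer.get_feature_names_out())
# ['cat sat' 'on the' 'sat on' 'the cat' 'the mat']
```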
Bibliography
C. Aggarwal. On the effects of dimensionality reduction on high dimensional similarity search. ACM PODS Conference, pp. 256–266, 2001.
C. Aggarwal and S. Sathe. Outlier ensembles: An introduction. Springer, 2017.
C. Aggarwal and C. Zhai. Mining text data. Springer, 2012.
A. Asuncion, M. Welling, P. Smyth, and Y. Teh. On smoothing and inference for topic models. Uncertainty in Artificial Intelligence, pp. 27–34, 2009.
D. Bertsekas. Nonlinear programming. Athena Scientific, 1999.
D. Blei. Probabilistic topic models. Communications of the ACM, 55(4), pp. 77–84, 2012.
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3, pp. 993–1022, 2003.
D. Blei and J. Lafferty. Dynamic topic models. ICML Conference, pp. 113–120, 2006.
R. Bunescu and R. Mooney. Subsequence kernels for relation extraction. NIPS Conference, pp. 171–178, 2005.
Y. Chang, C. Hsieh, K. Chang, M. Ringgaard, and C. J. Lin. Training and testing low-degree polynomial data mappings via linear SVM. Journal of Machine Learning Research, 11, pp. 1471–1490, 2010.
C. Ding, T. Li, and M. Jordan. Convex and semi-nonnegative matrix factorizations. IEEE Transactions on Pattern Analysis and Machine Intelligence, 32(1), pp. 45–55, 2010.
C. Ding, T. Li, and W. Peng. On the equivalence between non-negative matrix factorization and probabilistic latent semantic indexing. Computational Statistics and Data Analysis, 52(8), pp. 3913–3927, 2008.
C. Ding, T. Li, W. Peng, and H. Park. Orthogonal nonnegative matrix t-factorizations for clustering. ACM KDD Conference, pp. 126–135, 2006.
S. Dumais. Latent semantic indexing (LSI) and TREC-2. Text Retrieval Conference (TREC), pp. 105–115, 1993.
S. Dumais. Latent semantic indexing (LSI): TREC-3 Report. Text Retrieval Conference (TREC), pp. 219–230, 1995.
S. Deerwester, S. Dumais, G. Furnas, T. Landauer, and R. Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), pp. 391–407, 1990.
C. Eckart and G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1(3), pp. 211–218, 1936.
T. Gärtner. A survey of kernels for structured data. ACM SIGKDD Explorations Newsletter, 5(1), pp. 49–58, 2003.
E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. ACM SIGIR Conference, pp. 601–602, 2005.
M. Girolami and A. Kabán. On an equivalence between PLSI and LDA. ACM SIGIR Conference, pp. 433–434, 2003.
G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786), pp. 504–507, 2006.
T. Hofmann. Probabilistic latent semantic indexing. ACM SIGIR Conference, pp. 50–57, 1999.
T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine learning, 41(1–2), pp. 177–196, 2001.
B. Grün and K. Hornik. topicmodels: An R package for fitting topic models. Journal of Statistical Software, 40(13), pp. 1–30, 2011.
A. Karatzoglou, A. Smola, K. Hornik, and A. Zeileis. kernlab – An S4 package for kernel methods in R. Journal of Statistical Software, 11(9), 2004. http://epub.wu.ac.at/1048/1/document.pdf http://CRAN.R-project.org/package=kernlab
A. Langville, C. Meyer, R. Albright, J. Cox, and D. Duling. Initializations for the nonnegative matrix factorization. ACM KDD Conference, pp. 23–26, 2006.
Q. Le and T. Mikolov. Distributed representations of sentences and documents. ICML Conference, pp. 1188–1196, 2014.
D. Lee and H. Seung. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing Systems, pp. 556–562, 2001.
D. Lee and H. Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 401(6755), pp. 788–791, 1999.
C. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10), pp. 2756–2779, 2007.
H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins. Text classification using string kernels. Journal of Machine Learning Research, 2, pp. 419–444, 2002.
U. von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4), pp. 395–416, 2007.
D. Metzler, S. Dumais, and C. Meek. Similarity measures for short segments of text. European Conference on Information Retrieval, pp. 16–27, 2007.
T. Mikolov, K. Chen, G. Corrado, and J. Dean. Efficient estimation of word representations in vector space. arXiv:1301.3781, 2013. https://arxiv.org/abs/1301.3781
J. Pritchard, M. Stephens, and P. Donnelly. Inference of population structure using multilocus genotype data. Genetics, 155(2), pp. 945–959, 2000.
R. Rehurek and P. Sojka. Software framework for topic modelling with large corpora. LREC 2010 Workshop on New Challenges for NLP Frameworks, pp. 45–50, 2010. https://radimrehurek.com/gensim/index.html
S. Roweis and L. Saul. Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), pp. 2323–2326, 2000.
M. Sahami and T. D. Heilman. A Web-based kernel function for measuring the similarity of short text snippets. WWW Conference, pp. 377–386, 2006.
B. Schölkopf, A. Smola, and K.-R. Müller. Nonlinear component analysis as a kernel eigenvalue problem. Neural Computation, 10(5), pp. 1299–1319, 1998.
G. Strang. An introduction to linear algebra. Wellesley-Cambridge Press, 2009.
J. Tenenbaum, V. De Silva, and J. Langford. A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), pp. 2319–2323, 2000.
H. Wallach, D. Mimno, and A. McCallum. Rethinking LDA: Why priors matter. NIPS Conference, pp. 1973–1981, 2009.
X. Wei and W. B. Croft. LDA-based document models for ad-hoc retrieval. ACM SIGIR Conference, pp. 178–185, 2006.
C. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. NIPS Conference, 2000.
Y. Yang and X. Liu. A re-examination of text categorization methods. ACM SIGIR Conference, pp. 42–49, 1999.
http://scikit-learn.org/stable/tutorial/text_analytics/working_with_text_data.html
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html
http://weka.sourceforge.net/doc.stable/weka/attributeSelection/LatentSemanticAnalysis.html
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html
http://scikit-learn.org/stable/modules/kernel_approximation.html
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
Cite this chapter
Aggarwal, C.C. (2018). Matrix Factorization and Topic Modeling. In: Machine Learning for Text. Springer, Cham. https://doi.org/10.1007/978-3-319-73531-3_3
DOI: https://doi.org/10.1007/978-3-319-73531-3_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-73530-6
Online ISBN: 978-3-319-73531-3
eBook Packages: Computer Science (R0)