Local Linear Matrix Factorization for Document Modeling

Bai, Lu; Guo, Jiafeng; Lan, Yanyan; Cheng, Xueqi

doi:10.1007/978-3-319-06028-6_33

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8416)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Mining low-dimensional semantic representations of documents is a key problem in many document analysis and information retrieval tasks. Previous studies show that incorporating geometric relationships among documents yields better representations. However, existing methods model the geometric relationships between a document and its neighbors as independent pairwise relationships, and those pairwise relationships rely on heuristic similarity/dissimilarity measures and predefined thresholds. To address these problems, we propose Local Linear Matrix Factorization (LLMF) for low-dimensional representation learning. Specifically, LLMF exploits the geometric relationships between a document and its neighbors based on a local linear combination assumption, which encodes richer geometric information among the documents. Moreover, the linear combination relationships can be learned from the data without any heuristic parameter definition. We present an iterative model-fitting algorithm based on a quasi-Newton method for the optimization of LLMF. In the experiments, we compare LLMF with state-of-the-art semantic mining methods on two text data sets. The experimental results show that LLMF produces better document representations and higher accuracy in the document classification task.
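
The local linear combination assumption can be illustrated with a small sketch. This is not the LLMF objective or the authors' fitting algorithm, neither of which is given in the abstract. It only shows the underlying idea that one document vector can be approximated as a weighted sum of other document vectors, with the weights estimated from the data. The function name `linear_combination_weights`, the toy matrix `X`, and the `ridge` stabilizer are all assumptions made for this illustration.

```python
import numpy as np

def linear_combination_weights(X, i, ridge=1e-3):
    """Estimate weights w so that X[i] is approximated by w @ X[others].

    Hypothetical helper for illustration only. `ridge` is a small
    numerical stabilizer assumed here; it is not taken from the paper.
    """
    others = np.array([j for j in range(X.shape[0]) if j != i])
    N = X[others]                                  # candidate neighbors, shape (n-1, d)
    G = N @ N.T + ridge * np.eye(len(others))      # regularized Gram matrix
    w = np.linalg.solve(G, N @ X[i])               # ridge least-squares solution
    return others, w

# Toy "documents": row 2 is an equal mix of rows 0 and 1.
X = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
])

others, w = linear_combination_weights(X, i=2, ridge=1e-8)
reconstruction = w @ X[others]
```

In this toy case the weights concentrate on the two documents that actually compose document 2, while the unrelated document receives a weight near zero. That is the sense in which a joint linear combination carries more information than independent pairwise similarities.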

Author information

Authors and Affiliations

Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Lu Bai, Jiafeng Guo, Yanyan Lan & Xueqi Cheng

Editor information

Editors and Affiliations

University of Amsterdam, Amsterdam, The Netherlands
Maarten de Rijke & Tom Kenter
Centrum Wiskunde en Informatica, Amsterdam, The Netherlands and Delft University of Technology, Delft, The Netherlands
Arjen P. de Vries
University of Illinois at Urbana-Champaign, Urbana, IL, USA
ChengXiang Zhai
University of Twente, Twente, The Netherlands and Erasmus University Rotterdam, Rotterdam, The Netherlands
Franciska de Jong
SalesPredict, Haifa, Israel
Kira Radinsky
Microsoft Research, Cambridge, UK
Katja Hofmann

About this paper

Cite this paper

Bai, L., Guo, J., Lan, Y., Cheng, X. (2014). Local Linear Matrix Factorization for Document Modeling. In: de Rijke, M., et al. Advances in Information Retrieval. ECIR 2014. Lecture Notes in Computer Science, vol 8416. Springer, Cham. https://doi.org/10.1007/978-3-319-06028-6_33

DOI: https://doi.org/10.1007/978-3-319-06028-6_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06027-9
Online ISBN: 978-3-319-06028-6
eBook Packages: Computer Science, Computer Science (R0)