Tripartite-Replicated Softmax Model for Document Representations

  • Bo Xu
  • Hongfei Lin (Email author)
  • Lin Wang
  • Yuan Lin
  • Kan Xu
  • Xiaocong Wei
  • Dong Huang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10390)


Text mining tasks based on machine learning require inputs to be represented as fixed-length vectors, and effective vector representations of words, phrases, sentences and even documents can greatly improve the performance of these tasks. Recently, distributed word representations based on neural networks have proven powerful in many tasks by encoding rich semantic and linguistic information. However, learning document representations remains a great challenge because of the complex semantic structures of different documents. To meet this challenge, we propose two novel tripartite graphical models for document representation that incorporate word representations into the Replicated Softmax model, which we name the Tripartite-Replicated Softmax model (TRPS) and the directed Tripartite-Replicated Softmax model (d-TRPS), respectively. We also introduce optimization strategies for training the proposed models to learn better document representations. The proposed models capture linear relationships among words and latent semantic information within documents simultaneously, thus learning both linear and nonlinear document representations. We evaluate the learned document representations on a document classification task and a document retrieval task. Experimental results show that the representations learned by our models outperform those of state-of-the-art models on both tasks.
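The base model that TRPS and d-TRPS build on is the Replicated Softmax model of Hinton and Salakhutdinov: an RBM whose visible layer is a word-count vector over the vocabulary, whose binary hidden units serve as the document representation, and whose hidden biases are scaled by document length. The following is a minimal NumPy sketch of that base model trained with one-step contrastive divergence (CD-1); the class and method names are illustrative, not from the paper, and the tripartite extensions proposed here are not implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class ReplicatedSoftmax:
    """Sketch of the Replicated Softmax RBM (the base model for TRPS).

    Visible layer: a word-count vector v over a vocabulary of size V.
    Hidden layer: H binary topic units. The hidden bias is multiplied by
    the document length D, which is the "replicated" parameter sharing
    that lets one RBM handle documents of different lengths.
    """

    def __init__(self, V, H, lr=0.01):
        self.W = 0.01 * rng.standard_normal((V, H))  # word-topic weights
        self.a = np.zeros(H)  # hidden biases (scaled by D at use time)
        self.b = np.zeros(V)  # visible (word) biases
        self.lr = lr

    def hidden_probs(self, v):
        # p(h_j = 1 | v) = sigmoid(sum_i v_i W_ij + D * a_j)
        D = v.sum()
        return sigmoid(v @ self.W + D * self.a)

    def reconstruct(self, h, D):
        # Sample D words from the softmax distribution over the vocabulary.
        p = softmax(self.W @ h + self.b)
        return rng.multinomial(int(D), p).astype(float)

    def cd1_step(self, v):
        """One contrastive-divergence (CD-1) update on a count vector v."""
        D = v.sum()
        ph = self.hidden_probs(v)
        h = (rng.random(ph.shape) < ph).astype(float)  # sample hiddens
        v_neg = self.reconstruct(h, D)                 # negative phase
        ph_neg = self.hidden_probs(v_neg)
        self.W += self.lr * (np.outer(v, ph) - np.outer(v_neg, ph_neg))
        self.a += self.lr * D * (ph - ph_neg)
        self.b += self.lr * (v - v_neg)

    def represent(self, v):
        """Hidden activation probabilities as the document representation."""
        return self.hidden_probs(v)
```

In use, each document is reduced to its bag-of-words count vector, `cd1_step` is applied over several epochs, and `represent` yields the fixed-length nonlinear representation that downstream classification or retrieval models consume.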


Keywords: Document representations · Replicated Softmax model · Text mining



This work is partially supported by grants from the National Natural Science Foundation of China (Nos. 61632011, 61572102, 61402075, 61602078, 61562080), the Research Fund for the Doctoral Program of Higher Education of the State Education Ministry (No. 20090041110002), and the Fundamental Research Funds for the Central Universities.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Bo Xu (1)
  • Hongfei Lin (1) (Email author)
  • Lin Wang (1)
  • Yuan Lin (2)
  • Kan Xu (1)
  • Xiaocong Wei (1)
  • Dong Huang (1)

  1. School of Computer Science and Technology, Dalian University of Technology, Dalian, China
  2. WISE Lab, Dalian University of Technology, Dalian, China
