Text Classification with Document Embeddings

Huang, Chaochao; Qiu, Xipeng; Huang, Xuanjing

doi:10.1007/978-3-319-12277-9_12

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8801)

Abstract

Distributed representations have attracted considerable interest in the natural language processing community. In this paper, we propose a method for learning document embeddings with a neural network architecture for the text classification task. In our architecture, each document is given a fine-grained representation of its different meanings, so that classification can be done more accurately. Experimental results show that our method achieves better performance on two popular datasets.
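
The abstract does not describe the architecture in detail, so the following is only a minimal sketch of the general idea of classifying documents from dense embeddings, not the authors' model. It assumes random, fixed word vectors, a document embedding formed by simple averaging, and a softmax classifier trained with cross-entropy; all function names, parameters, and the toy data are hypothetical.

```python
import math
import random


def embed(doc, word_vecs, dim):
    """Average the vectors of known words: a crude document embedding."""
    vecs = [word_vecs[w] for w in doc.split() if w in word_vecs]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(x, W, b):
    """Linear score for each class given a document embedding x."""
    return [sum(wi * xi for wi, xi in zip(W[k], x)) + b[k] for k in range(len(W))]


def train(docs, labels, n_classes, dim=8, epochs=500, lr=0.5, seed=0):
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d.split()})
    # Assumption: word vectors are random and stay fixed; a real system would learn them.
    word_vecs = {w: [rng.uniform(-1, 1) for _ in range(dim)] for w in vocab}
    W = [[0.0] * dim for _ in range(n_classes)]
    b = [0.0] * n_classes
    X = [embed(d, word_vecs, dim) for d in docs]
    for _ in range(epochs):
        for x, y in zip(X, labels):
            p = softmax(class_scores(x, W, b))
            for k in range(n_classes):
                # Gradient of the cross-entropy loss with respect to score k.
                g = p[k] - (1.0 if k == y else 0.0)
                for i in range(dim):
                    W[k][i] -= lr * g * x[i]
                b[k] -= lr * g
    return word_vecs, W, b


def predict(doc, word_vecs, W, b):
    """Return the index of the highest-scoring class for a document."""
    x = embed(doc, word_vecs, len(W[0]))
    scores = class_scores(x, W, b)
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    docs = ["good great film", "great acting good plot",
            "bad boring film", "boring plot bad acting"]
    labels = [1, 1, 0, 0]
    word_vecs, W, b = train(docs, labels, n_classes=2)
    print([predict(d, word_vecs, W, b) for d in docs])
```

Averaging discards word order and collapses a document into a single vector, which is the kind of coarse representation that a finer-grained document embedding is meant to improve on.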

Author information

Authors and Affiliations

Shanghai Key Laboratory of Intelligent Information Processing, China
Chaochao Huang, Xipeng Qiu & Xuanjing Huang
School of Computer Science, Fudan University, Shanghai, China
Chaochao Huang, Xipeng Qiu & Xuanjing Huang

Editor information

Editors and Affiliations

Department of Computer Science and Technology, Tsinghua University, Haidian District, 100084, Beijing, China
Maosong Sun & Yang Liu
Chinese Academy of Sciences, Institute of Automation, 100190, Beijing, China
Jun Zhao

Cite this paper

Huang, C., Qiu, X., Huang, X. (2014). Text Classification with Document Embeddings. In: Sun, M., Liu, Y., Zhao, J. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2014. Lecture Notes in Computer Science (LNAI), vol 8801. Springer, Cham. https://doi.org/10.1007/978-3-319-12277-9_12

DOI: https://doi.org/10.1007/978-3-319-12277-9_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12276-2
Online ISBN: 978-3-319-12277-9
eBook Packages: Computer Science, Computer Science (R0)
