Learning Document Representation for Deceptive Opinion Spam Detection

Li, Luyang; Ren, Wenjing; Qin, Bing; Liu, Ting

doi:10.1007/978-3-319-25816-4_32

Learning Document Representation for Deceptive Opinion Spam Detection

Luyang Li¹⁹,
Wenjing Ren¹⁹,
Bing Qin¹⁹ &
…
Ting Liu¹⁹

Conference paper
First Online: 08 November 2015

7466 Accesses
12 Citations

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 9427))

Abstract

Deceptive opinion spam in reviews of products or service is very harmful for customers in decision making. Existing approaches to detect deceptive spam are concern on feature designing. Hand-crafted features can show some linguistic phenomenon, but is time-consuming and can not reveal the connotative semantic meaning of the review. We present a neural network to learn document-level representation. In our model, we not only learn to represent each sentence but also represent the whole document of the review. We apply traditional convolutional neural network to represent the semantic meaning of sentences. We present two variant convolutional neural-network models to learn the document representation. The model taking sentence importance into consideration shows the better performance in deceptive spam detection which enhances the value of F1 by 5 %.

This is a preview of subscription content, log in via an institution.

References

Bengio, Y., Ducharme, R., Vincent, P., Janvin, C.: A neural probabilistic language model. J. Mach. Learn. Res. 3, 1137–1155 (2003)
MATH Google Scholar
Castillo, C., Donato, D., Gionis, A., Murdock, V., Silvestri, F.: Know your neighbors: Web spam detection using the web topology. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 423–430. ACM (2007)
Google Scholar
Chirita, P.A., Diederich, J., Nejdl, W.: Mailrank: using ranking for spam detection. In: Proceedings of the 14th ACM International Conference on Information and Knowledge Management, pp. 373–380. ACM (2005)
Google Scholar
Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. J. Mach. Learn. Res. 12, 2493–2537 (2011)
MATH Google Scholar
Drucker, H., Wu, D., Vapnik, V.N.: Support vector machines for spam categorization. IEEE Trans. Neural Netw. 10(5), 1048–1054 (1999)
Article Google Scholar
Feng, S., Banerjee, R., Choi, Y.: Syntactic stylometry for deception detection. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Short Papers, vol. 2, pp. 171–175. Association for Computational Linguistics (2012)
Google Scholar
Feng, V.W., Hirst, G.: Detecting deceptive opinions with profile compatibility. In: Proceedings of the 6th International Joint Conference on Natural Language Processing, Nagoya, Japan, pp. 14–18 (2013)
Google Scholar
Fetterly, D., Manasse, M., Najork, M.: Detecting phrase-level duplication on the world wide web. In: Proceedings of the 28th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 170–177. ACM (2005)
Google Scholar
Glorot, X., Bordes, A., Bengio, Y.: Domain adaptation for large-scale sentiment classification: A deep learning approach. In: Proceedings of the 28th International Conference on Machine Learning (ICML-2011), pp. 513–520 (2011)
Google Scholar
Gyöngyi, Z., Garcia-Molina, H.: Link spam alliances. In: Proceedings of the 31st International Conference on Very Large Data Bases, pp. 517–528. VLDB Endowment (2005)
Google Scholar
Gyöngyi, Z., Garcia-Molina, H., Pedersen, J.: Combating web spam with trustrank. In: Proceedings of the Thirtieth International Conference on Very Large Data Bases, vol. 30, pp. 576–587. VLDB Endowment (2004)
Google Scholar
Hermann, K.M., Blunsom, P.: The role of syntax in vector space models of compositional semantics. In: ACL, vol. 1, pp. 894–904 (2013)
Google Scholar
Huang, E.H., Socher, R., Manning, C.D., Ng, A.Y.: Improving word representations via global context and multiple word prototypes. In: Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, vol. 1, pp. 873–882. Association for Computational Linguistics (2012)
Google Scholar
Jindal, N., Liu, B.: Opinion spam and analysis. In: Proceedings of the 2008 International Conference on Web Search and Data Mining, pp. 219–230. ACM (2008)
Google Scholar
Jindal, N., Liu, B., Lim, E.P.: Finding unusual review patterns using unexpected rules. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 1549–1552. ACM (2010)
Google Scholar
Kalchbrenner, N., Grefenstette, E., Blunsom, P.: A convolutional neural network for modelling sentences. arXiv preprint arXiv:1404.2188 (2014)
Le, Q.V., Mikolov, T.: Distributed representations of sentences and documents. arXiv preprint arXiv:1405.4053 (2014)
Li, J.: Feature weight tuning for recursive neural networks. arXiv preprint arXiv:1412.3714 (2014)
Li, J., Jurafsky, D., Hovy, E.: When are tree structures necessary for deep learning of representations? arXiv preprint arXiv:1503.00185 (2015)
Li, J., Ott, M., Cardie, C.: Identifying manipulated offerings on review portals. In: EMNLP, pp. 1933–1942 (2013)
Google Scholar
Li, J., Ott, M., Cardie, C., Hovy, E.: Towards a general rule for identifying deceptive opinion spam
Google Scholar
Lim, E.P., Nguyen, V.A., Jindal, N., Liu, B., Lauw, H.W.: Detecting product review spammers using rating behaviors. In: Proceedings of the 19th ACM International Conference on Information and Knowledge Management, pp. 939–948. ACM (2010)
Google Scholar
Metaxas, P.T., DeStefano, J.: Web spam, propaganda and trust. In: AIRWeb, pp. 70–78 (2005)
Google Scholar
Meyer, D.: Fake reviews prompt belkin apology. CNet News (2009)
Google Scholar
Miller, C.: Company settles case of reviews it faked. New York Times (2009)
Google Scholar
Mnih, A., Hinton, G.E.: A scalable hierarchical distributed language model. In: Advances in Neural Information Processing Systems, pp. 1081–1088 (2009)
Google Scholar
Mukherjee, A., Kumar, A., Liu, B., Wang, J., Hsu, M., Castellanos, M., Ghosh, R.: Spotting opinion spammers using behavioral footprints. In: Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 632–640. ACM (2013)
Google Scholar
Mukherjee, A., Liu, B., Glance, N.: Spotting fake reviewer groups in consumer reviews. In: Proceedings of the 21st International Conference on World Wide Web, pp. 191–200. ACM (2012)
Google Scholar
Mukherjee, A., Liu, B., Wang, J., Glance, N., Jindal, N.: Detecting group review spam. In: Proceedings of the 20th International Conference Companion on World Wide Web, pp. 93–94. ACM (2011)
Google Scholar
Ntoulas, A., Najork, M., Manasse, M., Fetterly, D.: Detecting spam web pages through content analysis. In: Proceedings of the 15th International Conference on World Wide Web, pp. 83–92. ACM (2006)
Google Scholar
Ott, M.: Computational linguistic models of deceptive opinion spam (2013)
Google Scholar
Ott, M., Cardie, C., Hancock, J.: Estimating the prevalence of deception in online review communities. In: Proceedings of the 21st International Conference on World Wide Web, pp. 201–210. ACM (2012)
Google Scholar
Ott, M., Cardie, C., Hancock, J.T.: Negative deceptive opinion spam. In: HLT-NAACL, pp. 497–501 (2013)
Google Scholar
Ott, M., Choi, Y., Cardie, C., Hancock, J.T.: Finding deceptive opinion spam by any stretch of the imagination. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 309–319. Association for Computational Linguistics (2011)
Google Scholar
Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: Proceedings of the Empiricial Methods in Natural Language Processing (EMNLP 2014) vol. 12, pp. 1532–1543 (2014)
Google Scholar
Socher, R., Bauer, J., Manning, C.D., Ng, A.Y.: Parsing with compositional vector grammars. In: Proceedings of the ACL Conference. Citeseer (2013)
Google Scholar
Socher, R., Huval, B., Manning, C.D., Ng, A.Y.: Semantic compositionality through recursive matrix-vector spaces. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 1201–1211. Association for Computational Linguistics (2012)
Google Scholar
Socher, R., Lin, C.C., Manning, C., Ng, A.Y.: Parsing natural scenes and natural language with recursive neural networks. In: Proceedings of the 28th International Conference on Machine Learning (ICML-2011), pp. 129–136 (2011)
Google Scholar
Socher, R., Perelygin, A., Wu, J.Y., Chuang, J., Manning, C.D., Ng, A.Y., Potts, C.: Recursive deep models for semantic compositionality over a sentiment treebank. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), vol. 1631, p. 1642. Citeseer (2013)
Google Scholar
Streitfeld, D.: For 2 a star, an online retailer gets 5 star product reviews. New York Times, 26 January 2012
Google Scholar
Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T., Qin, B.: Learning sentiment-specific word embedding for twitter sentiment classification. In: Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, vol. 1, pp. 1555–1565 (2014)
Google Scholar
Topping, A.: Historian orlando figes agrees to pay damages for fake reviews. The Guardian, 16 July 2010
Google Scholar
Turian, J., Ratinov, L., Bengio, Y.: Word representations: a simple and general method for semi-supervised learning. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 384–394. Association for Computational Linguistics (2010)
Google Scholar
Wang, G., Xie, S., Liu, B., Yu, P.S.: Review graph based online store review spammer detection. In: 2011 IEEE 11th International Conference on Data Mining (ICDM), pp. 1242–1247. IEEE (2011)
Google Scholar
Wu, B., Davison, B.D.: Identifying link farm spam pages. In: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web, pp. 820–829. ACM (2005)
Google Scholar
Yessenalina, A., Cardie, C.: Compositional matrix-space models for sentiment analysis. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 172–182. Association for Computational Linguistics (2011)
Google Scholar
Yoo, K.H., Gretzel, U.: Comparison of deceptive and truthful travel reviews. In: Höpken, W., Gretzel, U., Law, R. (eds.) Information and Communication Technologies in Tourism, pp. 37–47. Springer, Heidelberg (2009)
Google Scholar

Download references

Acknowledgments

This work was supported by the National High Technology Development 863 Program of China (NSFC) via grant 2015 AA015407, NSFC via grant 61133012 and NSFC via grant 61273321.

Author information

Authors and Affiliations

Research Center for Social Computing and Information Retrieval, Harbin Institute of Technology, Harbin, China
Luyang Li, Wenjing Ren, Bing Qin & Ting Liu

Authors

Luyang Li
View author publications
You can also search for this author in PubMed Google Scholar
Wenjing Ren
View author publications
You can also search for this author in PubMed Google Scholar
Bing Qin
View author publications
You can also search for this author in PubMed Google Scholar
Ting Liu
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Bing Qin .

Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Maosong Sun
Tsinghua University, Beijing, China
Zhiyuan Liu
Soochow University, Suzhou, Jiangsu, China
Min Zhang
Tsinghua University, Beijing, China
Yang Liu

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License (http://creativecommons.org/licenses/by-nc/2.5/), which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Li, L., Ren, W., Qin, B., Liu, T. (2015). Learning Document Representation for Deceptive Opinion Spam Detection. In: Sun, M., Liu, Z., Zhang, M., Liu, Y. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL NLP-NABD 2015 2015. Lecture Notes in Computer Science(), vol 9427. Springer, Cham. https://doi.org/10.1007/978-3-319-25816-4_32

Download citation

DOI: https://doi.org/10.1007/978-3-319-25816-4_32
Published: 08 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25815-7
Online ISBN: 978-3-319-25816-4
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics