The distributed representation for societal risk classification toward BBS posts

Journal of Systems Science and Complexity

Abstract

Classifying the risk expressed in BBS posts is important for evaluating the level of societal risk within a given period. Using posts collected from the Tianya forum as the data source, the authors adopt societal risk indicators from social psychology and conduct document-level, multi-class societal risk classification of BBS posts. To capture both the semantics and the word order of documents, Paragraph Vector, a shallow neural network model, is applied to obtain distributed vector representations of the posts. Based on these document vectors, the authors apply the k-nearest neighbor (KNN) classifier to identify the societal risk category of each post. The experimental results show that, for document-level societal risk classification, Paragraph Vector trains much faster than Bag-of-Words and improves F-measure by at least 10%. Its performance is also superior to that of the edit distance and Lucene-based search methods. The present work is the first attempt to combine a document embedding method with social psychology research results in the area of public opinion analysis.
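The classification step described above (a KNN vote over document vectors) can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for learned Paragraph Vector embeddings, the labels `category_a` and `category_b` are hypothetical placeholders for societal risk categories, and cosine similarity is an assumed choice of metric, since the abstract does not name one.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_predict(query, vectors, labels, k=3):
    """Return the majority label among the k training vectors most similar to query."""
    # Rank training indices from most to least similar to the query vector.
    ranked = sorted(range(len(vectors)),
                    key=lambda i: cosine(query, vectors[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy stand-ins for document vectors; real ones would come from a trained
# Paragraph Vector model. Labels are hypothetical category names.
train_vectors = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.1, 0.9, 0.2],
    [0.0, 0.8, 0.3],
]
train_labels = ["category_a", "category_a", "category_b", "category_b"]

print(knn_predict([0.85, 0.15, 0.05], train_vectors, train_labels, k=3))
```

With an odd `k` and two categories the vote cannot tie; with more categories, a tie-breaking rule (for example, preferring the single nearest neighbor) would need to be chosen.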



Author information

Corresponding author

Correspondence to Jindong Chen.

Additional information

This research is supported by the National Natural Science Foundation of China under Grant Nos. 71171187, 71371107, and 61473284.

This paper was recommended for publication by Editor WANG Shouyang.

Cite this article

Chen, J., Tang, X. The distributed representation for societal risk classification toward BBS posts. J Syst Sci Complex 30, 627–644 (2017). https://doi.org/10.1007/s11424-016-5099-z

  • DOI: https://doi.org/10.1007/s11424-016-5099-z
