Towards Semantic Quality Enhancement of User Generated Content

  • José María González Pinto
  • Niklas Kiehne
  • Wolf-Tilo Balke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11279)


With the increasing amount of user-generated content such as scientific blogs, question-answering archives (Quora or Stack Overflow), and Wikipedia, the challenge of evaluating quality naturally arises. Previous work has shown the potential to automatically evaluate such content at the syntactic and pragmatic levels, covering attributes such as conciseness, organization, and readability. We push these efforts forward and focus on how to develop an intelligent service that eases user engagement with two semantic attributes: factual accuracy, i.e., whether facts are correct, and validity, i.e., whether reliable sources support the content. To do so, we deploy a Deep Learning approach to learn citation categories from Wikipedia. We thus introduce an automatic mechanism that can accurately determine which specific citation category is needed to help users increase the value of their contribution at a semantic level. To that end, we automatically learn linguistic patterns from Wikipedia to support a broad range of fields. We extensively evaluated several machine learning models trained on more than one million annotated sentences, drawn from the massive effort of Wikipedia contributors. We evaluate the performance of the different methods and present an in-depth analysis focusing on the balanced accuracy achieved.
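The evaluation metric named at the end of the abstract can be illustrated with a small sketch. It assumes the standard definition of balanced accuracy as the unweighted mean of per-class recall. The category labels `"web"` and `"book"` and the toy predictions are hypothetical and are not taken from the paper's data or results.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall (standard definition, assumed here)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one labelled example is required")
    # Number of examples per true class.
    support = Counter(y_true)
    # Number of correctly predicted examples per true class.
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[label] / count for label, count in support.items()]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: 8 sentences need a "web" citation,
# 2 need a "book" citation, and the classifier always answers "web".
y_true = ["web"] * 8 + ["book"] * 2
y_pred = ["web"] * 10

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(plain_accuracy)                     # 0.8
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

In this toy case plain accuracy rewards always predicting the majority class, while balanced accuracy weights every class equally. That property is useful when some citation categories are much more frequent than others.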


Automatic quality enhancement; User-generated content; Data curation



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • José María González Pinto (1)
  • Niklas Kiehne (1)
  • Wolf-Tilo Balke (1)
  1. Institut für Informationssysteme, TU Braunschweig, Brunswick, Germany
