Identifying Unclear Questions in Community Question Answering Websites

  • Jan Trienes
  • Krisztian Balog
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11437)


Thousands of complex natural language questions are submitted to community question answering websites every day, making these sites one of the most important information sources available today. However, submitted questions are often unclear and cannot be answered without clarification questions from expert community members. This study is the first to investigate the complex task of classifying a question as clear or unclear, i.e., whether it requires further clarification. We construct a novel dataset and propose a classification approach based on the notion of similar questions. This approach is compared to state-of-the-art text classification baselines. Our main finding is that the similar questions approach is a viable alternative that can serve as a stepping stone towards the development of supportive user interfaces for question formulation.



We would like to thank Dolf Trieschnigg and Djoerd Hiemstra for their insightful comments on this paper. This work was partially funded by the University of Twente Tech4People Datagrant project.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. University of Twente, Enschede, Netherlands
  2. University of Stavanger, Stavanger, Norway
