Predicting closed questions on community question answering sites using convolutional neural network

Roy, Pradeep Kumar; Singh, Jyoti Prakash

doi:10.1007/s00521-019-04592-0

Original Article
Published: 07 November 2019

Neural Computing and Applications, volume 32, pages 10555–10572 (2020)

Abstract

Community question answering sites receive a huge number of questions and answers every day. A number of these questions are marked as closed by the site moderators. Such questions increase the moderators' workload and also cause user dissatisfaction. This paper aims to predict whether a newly posted question will be marked as closed in the future and to give a tentative reason for the closure. Two kinds of models are used: (1) a baseline model based on traditional machine learning techniques and (2) deep learning models, namely a convolutional neural network (CNN) and a long short-term memory (LSTM) network. Each classifies a question into one of five classes: (1) open, (2) off-topic, (3) not a real question, (4) not constructive and (5) too localized. The baseline model requires handcrafted features and hence does not preserve semantics. In contrast, the CNN and LSTM networks are capable of preserving the semantics of a question's words and extracting hidden features from the textual content using multiple hidden layers. The LSTM network performs better than the CNN and the traditional machine learning models. The proposed model can be used as an initial filter to screen closed questions at the time of posting, which reduces the overhead of site moderators. To the best of our knowledge, this is the first work that predicts closed questions along with the reason the question will be closed. This helps the questioner to modify the question before posting. The experimental results on the Stack Overflow dataset demonstrate the effectiveness of the proposed model.
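
The kind of CNN classifier the abstract describes can be illustrated with a minimal forward-pass sketch in plain Python. It is not the authors' architecture: the embedding size, filter count, window width and all function names are hypothetical, and the weights are random and untrained, so the predicted labels are meaningless until the model is fitted to labelled questions. The sketch only shows the data flow: word embeddings, convolution over word windows, ReLU with max-over-time pooling, and a softmax over the five classes.

```python
import math
import random

# The five classes named in the abstract.
CLASSES = ["open", "off-topic", "not a real question",
           "not constructive", "too localized"]

EMBED_DIM = 8    # hypothetical embedding size
NUM_FILTERS = 4  # hypothetical number of convolution filters
WINDOW = 3       # hypothetical filter width, in words

rng = random.Random(0)  # fixed seed so the sketch is repeatable


def rand_matrix(rows, cols):
    """Return a rows x cols matrix of small random (untrained) weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


embeddings = {}  # word -> vector, filled lazily with random vectors
filters = rand_matrix(NUM_FILTERS, WINDOW * EMBED_DIM)
dense = rand_matrix(len(CLASSES), NUM_FILTERS)


def embed(word):
    """Look up (or create) the embedding vector of a word."""
    if word not in embeddings:
        embeddings[word] = [rng.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
    return embeddings[word]


def predict_proba(question):
    """Return a probability for each of the five classes."""
    words = question.lower().split()
    # Pad short questions so that at least one window fits.
    words += ["<pad>"] * max(0, WINDOW - len(words))
    vectors = [embed(w) for w in words]
    pooled = []
    for f in filters:
        best = 0.0  # a ReLU output is never below zero
        for start in range(len(vectors) - WINDOW + 1):
            window = [x for v in vectors[start:start + WINDOW] for x in v]
            activation = sum(w * x for w, x in zip(f, window))
            best = max(best, activation)  # ReLU, then max-over-time pooling
        pooled.append(best)
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in dense]
    peak = max(logits)  # subtract the maximum for a numerically stable softmax
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(CLASSES, exps)}


def predict(question):
    """Return the most probable class label for a question."""
    probs = predict_proba(question)
    return max(probs, key=probs.get)
```

Calling `predict_proba("How do I reverse a list in Python?")` returns a dictionary that maps each of the five class names to a probability. In a trained system, a non-open prediction could serve as the initial filter and tentative closing reason that the abstract mentions.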

Notes

A master question is the question that has similar content to the new question.
The floor of x, that is, the greatest integer that is less than or equal to x.

Author information

Pradeep Kumar Roy
Present address: Department of Information Technology, Vellore Institute of Technology, Vellore, TN, India

Authors and Affiliations

Department of Computer Science and Engineering, National Institute of Technology Patna, Patna, India
Pradeep Kumar Roy & Jyoti Prakash Singh

Corresponding author

Correspondence to Pradeep Kumar Roy.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: List of extracted textual features with their description

See Fig. 24 and Table 21.

Table 21 List of selected features with their description on stack overflow dataset

About this article

Cite this article

Roy, P.K., Singh, J.P. Predicting closed questions on community question answering sites using convolutional neural network. Neural Comput & Applic 32, 10555–10572 (2020). https://doi.org/10.1007/s00521-019-04592-0

Received: 13 June 2018
Accepted: 24 October 2019
Published: 07 November 2019
Issue Date: July 2020
DOI: https://doi.org/10.1007/s00521-019-04592-0
