Multilayer Convolutional Neural Network to Filter Low Quality Content from Quora

Abstract

Question answering (QA) websites now play a crucial role in meeting Internet users’ information needs. Quora is a growing QA platform where users get quick answers to their questions from their peers. Nonetheless, a significant number of questions remain unanswered for a long time. Questions that go unanswered for a long time, that are opinion-based, that need a debate to be answered, or for which no valid answer exists fall into the Insincere question group. It is therefore important to weed out Insincere questions in order to maintain the integrity of the site. Quora has a huge number of such questions, which cannot be filtered manually. To overcome this problem, this paper proposes a multilayer convolutional neural network (CNN) model that helps to minimize Insincere questions on the website. Two embeddings were created from the Quora dataset: (i) using the Skip-gram model, and (ii) using the Continuous Bag-of-Words (CBOW) model. These embeddings and a pre-trained GloVe embedding were used for system development. The proposed model needs only the question text to predict whether a question is Insincere, and is therefore free from manual feature engineering. The experimental results indicate that the proposed multilayer CNN model outperforms earlier works, achieving an F1-score of 0.98 in the best case.
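
Skip-gram and CBOW, the two word2vec training objectives, differ in what they predict: Skip-gram predicts each surrounding word from the centre word, whereas CBOW predicts the centre word from its (averaged) surrounding context. The abstract does not state which toolkit or window size was used to train the Quora embeddings, so the following is only a minimal sketch, with a hypothetical window size and a made-up example question, of how each objective turns a tokenized question into training examples. It does not train any vectors.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Skip-gram: each (centre, context) pair asks the model to predict context from centre."""
    pairs = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((centre, tokens[j]))
    return pairs


def cbow_examples(tokens: List[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """CBOW: each (context words, centre) example asks the model to predict centre from context."""
    examples = []
    for i, centre in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, centre))
    return examples


# Hypothetical example question and window size, for illustration only.
question = "why do people ask insincere questions".split()
print(skipgram_pairs(question, window=1)[:3])
print(cbow_examples(question, window=1)[1])
```

With `window=1`, the first Skip-gram pairs are `('why', 'do')`, `('do', 'why')` and `('do', 'people')`, while the CBOW example for position 1 is `(['why', 'people'], 'do')`: the same text yields many small prediction targets for Skip-gram but one context-to-word target per position for CBOW.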

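The abstract does not give the architecture details (number of convolutional layers, kernel widths, filter counts, pooling, or output layer). The following is a minimal pure-Python sketch of how a multilayer CNN over word embeddings generally maps a question to a single Insincere probability, assuming two convolutional layers with ReLU, max pooling, and a sigmoid output. All sizes are hypothetical, the input vectors stand in for rows of a GloVe, Skip-gram or CBOW embedding table, and the weights are random and untrained, so the printed probability is meaningless; it only shows how shapes flow through the layers. A practical model would be built and trained in a deep-learning framework.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]  # rows are time steps (tokens), columns are features


def conv1d_relu(x: Matrix, kernels: List[Matrix], bias: Vector) -> Matrix:
    """'Valid' 1-D convolution over time followed by ReLU.

    Each kernel has shape (width, in_features); the output has one column per kernel.
    """
    width = len(kernels[0])
    out = []
    for t in range(len(x) - width + 1):
        row = []
        for k, kernel in enumerate(kernels):
            s = bias[k]
            for w in range(width):
                s += sum(a * b for a, b in zip(kernel[w], x[t + w]))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def max_pool(x: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling over time (a trailing partial window is dropped)."""
    return [[max(col) for col in zip(*x[t:t + size])]
            for t in range(0, len(x) - size + 1, size)]


def global_max(x: Matrix) -> Vector:
    """Max over all time steps, giving one value per feature map."""
    return [max(col) for col in zip(*x)]


def random_kernels(n: int, width: int, dim: int, rng: random.Random) -> List[Matrix]:
    """n untrained kernels of shape (width, dim) with small random weights."""
    return [[[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
            for _ in range(n)]


def predict_insincere(embedded: Matrix, rng: random.Random) -> float:
    """conv(3) -> max-pool(2) -> conv(2) -> global max -> sigmoid, with UNTRAINED weights.

    Hypothetical layer sizes; the padded question length must be at least 6 tokens.
    """
    dim = len(embedded[0])
    h = max_pool(conv1d_relu(embedded, random_kernels(4, 3, dim, rng), [0.0] * 4), 2)
    h = conv1d_relu(h, random_kernels(4, 2, 4, rng), [0.0] * 4)
    features = global_max(h)
    w = [rng.uniform(-0.5, 0.5) for _ in features]
    z = sum(a * b for a, b in zip(w, features))
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid -> P(Insincere)


# Hypothetical input: a question padded to 8 tokens, each a 5-dimensional vector
# standing in for rows looked up from a pre-trained or Quora-trained embedding table.
rng = random.Random(0)
dim, max_len = 5, 8
sentence = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(max_len)]
p = predict_insincere(sentence, rng)
print(f"P(Insincere) = {p:.3f}")
```

Stacking the second convolution on top of the pooled first-layer features is what makes the network "multilayer": the second layer sees combinations of local n-gram detectors rather than raw word vectors. Thresholding the output (for example at 0.5) would give the binary Insincere decision.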

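The abstract reports a best-case F1-score of 0.98 but does not state whether this is the binary F1 of the Insincere class or an averaged (macro or weighted) F1. Assuming the binary case with Insincere labelled 1, F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), where P = TP / (TP + FP) and R = TP / (TP + FN). A minimal sketch on hypothetical toy labels, not data from the paper:

```python
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Binary F1 for the `positive` class (assumed here to be Insincere = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical), not data from the paper.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
print(round(f1_score(y_true, y_pred), 3))
```

Here TP = 3, FP = 1 and FN = 1, so precision and recall are both 0.75 and F1 is 0.75. Because Insincere questions are typically a minority class, F1 on that class is more informative than plain accuracy.
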
Fig. 1
Fig. 2

Author information

Correspondence to Pradeep Kumar Roy.

About this article

Cite this article

Roy, P.K. Multilayer Convolutional Neural Network to Filter Low Quality Content from Quora. Neural Process Lett (2020). https://doi.org/10.1007/s11063-020-10284-x

Keywords

  • Quora
  • Deep learning
  • Question–answering
  • Convolutional neural network