Discovering High-Quality Threaded Discussions in Online Forums

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Archives of threaded discussions generated by users in online forums and discussion boards contain valuable knowledge on various topics. However, not all threads are useful, because deliberate abuses such as trolling and flaming are common in online conversations. Because users also differ widely in expertise, it cannot be assumed that every archived discussion thread contains high-quality content. Automatically finding high-quality threads would help both users and search engines sift through large thread archives and use these potentially valuable resources effectively, yet to our knowledge no previous work has studied this task. In this paper, we propose an automatic method for distinguishing high-quality threads from low-quality ones in online discussion sites. We first suggest four artificial measures that induce the overall quality of a thread from the ratings of its posts. We then propose two tasks that involve predicting thread quality without using post rating information, and we adopt a popular machine learning framework to solve both. Experimental results on a real-world forum archive demonstrate that our method can significantly improve prediction performance across all four measures of thread quality on both tasks. We also compare how different types of features derived from various aspects of threads contribute to overall performance, and we investigate the key features that play a crucial role in discovering high-quality threads in online discussion sites.

Author information

Corresponding author

Correspondence to Hae-Chang Rim.

Additional information

This research was partially supported by the Ministry of Knowledge Economy (MKE), Korea, and Microsoft Research through the IT/SW Creative Research Program supervised by the National IT Industry Promotion Agency (NIPA) of Korea under Grant No. NIPA-2012-H0503-12-1012, and the Next-Generation Information Computing Development Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Science, ICT & Future Planning of Korea under Grant No. NRF-2012M3C4A7033344.

Part of this work was done while the first author was an intern at Microsoft Research Asia, Beijing.

Electronic supplementary material

Below is the link to the electronic supplementary material.

ESM 1 (PDF 59 kb)

About this article

Cite this article

Lee, JT., Yang, MC. & Rim, HC. Discovering High-Quality Threaded Discussions in Online Forums. J. Comput. Sci. Technol. 29, 519–531 (2014). https://doi.org/10.1007/s11390-014-1446-5

  • DOI: https://doi.org/10.1007/s11390-014-1446-5
