Mining of Relevant and Informative Posts from Text Forums

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 947)

Abstract

In the modern world, the ability to obtain information quickly and conveniently is a competitive advantage for everyone. Web forums occupy a significant place among sources of information: they are a good place to gain professionally significant knowledge on different topics. However, it is not always easy to identify the parts of a forum that contain useful information matching the user's needs. In this paper, we consider the problem of automatic forum text summarization and describe methods that can help to solve it. We study the difference between relevance-oriented and usefulness-oriented query types. We describe our dataset, which contains over 4000 labeled posts from web forums covering various subject domains. Experts labeled the posts by rating them on a scale from 0 to 5 for the selected query types. The results of our study can provide a basis for creating information retrieval applications that reduce the time users spend searching and improve the quality of search results.

Acknowledgements

This work was financially supported by the Government of the Russian Federation (Grant 08-08).

Author information

Corresponding author

Correspondence to Kseniya Buraya.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Buraya, K., Grozin, V., Trofimov, V., Vinogradov, P., Gusarova, N. (2019). Mining of Relevant and Informative Posts from Text Forums. In: Chugunov, A., Misnikov, Y., Roshchin, E., Trutnev, D. (eds) Electronic Governance and Open Society: Challenges in Eurasia. EGOSE 2018. Communications in Computer and Information Science, vol 947. Springer, Cham. https://doi.org/10.1007/978-3-030-13283-5_12

  • DOI: https://doi.org/10.1007/978-3-030-13283-5_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-13282-8

  • Online ISBN: 978-3-030-13283-5

  • eBook Packages: Computer Science, Computer Science (R0)
