Abstract
In the modern world, the ability to obtain information quickly and conveniently is a competitive advantage for everyone. Web forums occupy a significant place among sources of information and are a good place to gain professionally significant knowledge on a variety of topics. However, it is not always easy to identify the parts of a forum that contain useful information matching the user's needs. In this paper we consider the problem of automatic forum text summarization and describe methods that can help solve it. We study the difference between relevance-oriented and usefulness-oriented query types. We describe our dataset, which contains over 4000 labeled posts from web forums covering various subject domains. The posts were labeled by experts, who rated each one on a scale from 0 to 5 for the selected query types. The results of our study can provide a basis for creating information retrieval applications that reduce the time users spend searching and improve the quality of search results.
Acknowledgements
This work was financially supported by the Government of the Russian Federation (Grant 08-08).
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Buraya, K., Grozin, V., Trofimov, V., Vinogradov, P., Gusarova, N. (2019). Mining of Relevant and Informative Posts from Text Forums. In: Chugunov, A., Misnikov, Y., Roshchin, E., Trutnev, D. (eds) Electronic Governance and Open Society: Challenges in Eurasia. EGOSE 2018. Communications in Computer and Information Science, vol 947. Springer, Cham. https://doi.org/10.1007/978-3-030-13283-5_12
DOI: https://doi.org/10.1007/978-3-030-13283-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-13282-8
Online ISBN: 978-3-030-13283-5
eBook Packages: Computer Science, Computer Science (R0)