A Web-Based Theme-Related Word Set Construction Algorithm

Wu, Yingkai; Li, Yukun; Hao, Gang

doi:10.1007/978-3-030-01298-4_17

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11268)

Included in the following conference series:

Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint International Conference on Web and Big Data

Abstract

Constructing a theme-related word set is a basic task in building theme-oriented information retrieval systems. Most previous studies focus on identifying the representative words of a specific document, and few address the construction of a word set related to a theme. Building on an analysis of existing keyword extraction methods, this paper proposes a method that automatically constructs a theme-related word set from two inputs: a primary theme-related word set given by domain experts, and well-known websites related to the theme. First, the method uses existing information extraction techniques to obtain the documents from the websites and the keyword set of each document. It then calculates the correlation degree between the known theme-related word set and the document keyword set, derives the document's theme-related words based on this document-theme relevance, and merges them into the theme-related word set. In this way, the theme-related word set is supplemented iteratively using the documents obtained from the theme-related websites. Because little research has focused on this problem and no relevant experimental data set exists, the paper applies the proposed method to construct theme-related word sets for two themes, “electricity” and “college entrance examination”, and invites domain experts to evaluate them. The results show that a relatively complete theme-related word set can be obtained with this method, which demonstrates its feasibility.
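
The iterative expansion described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not define the correlation degree or the rule for selecting a document's theme-related words, so the sketch assumes (1) the correlation is the share of a document's keywords already present in the theme set, (2) a document whose correlation reaches a hypothetical `threshold` contributes all of its keywords, and (3) keyword sets have already been extracted. The function names, the threshold value, and the sample data are all illustrative.

```python
from typing import Iterable, List, Set


def correlation(theme_words: Set[str], doc_keywords: Set[str]) -> float:
    """Assumed measure: share of the document's keywords already in the theme set."""
    if not doc_keywords:
        return 0.0
    return len(theme_words & doc_keywords) / len(doc_keywords)


def expand_theme_words(
    seed_words: Iterable[str],
    doc_keyword_sets: List[Set[str]],
    threshold: float = 0.5,   # hypothetical cut-off, not taken from the paper
    max_rounds: int = 10,     # safety bound on the number of passes
) -> Set[str]:
    """Grow the expert seed set by repeatedly merging keywords of relevant documents."""
    theme = set(seed_words)
    for _ in range(max_rounds):
        added = False
        for keywords in doc_keyword_sets:
            # Merge a document's keywords only if it is related enough to the theme.
            if correlation(theme, keywords) >= threshold:
                new_words = keywords - theme
                if new_words:
                    theme |= new_words
                    added = True
        if not added:  # a full pass added nothing, so the set has stabilised
            break
    return theme


# Illustrative data: a seed set and pre-extracted keyword sets of three documents.
seed = {"electricity", "power grid"}
docs = [
    {"transformer", "substation"},                  # becomes relevant in round 2
    {"electricity", "power grid", "transformer"},   # relevant in round 1
    {"football", "league"},                         # never relevant
]
result = expand_theme_words(seed, docs)
```

In this example the first document shares no word with the seed set in the first pass, but becomes relevant once "transformer" has been merged from the second document. That is why the sketch repeats passes over the documents until no new word is added.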

Acknowledgement

This research was supported by the Training Plan of Tianjin University Innovation Team (No. TD13-5025), the Natural Science Foundation of Tianjin (No. 15JCYBJC46500) and the Major Project of Tianjin Smart Manufacturing (No. 15ZXZNCX00050).

Author information

Authors and Affiliations

Tianjin University of Technology, 300384, Tianjin, China
Yingkai Wu, Yukun Li & Gang Hao
Tianjin Key Laboratory of Intelligence Computing and Novel Software Technology, Tianjin, China
Yukun Li
Key Laboratory of Computer Vision and System, Ministry of Education Tianjin, Tianjin, China
Gang Hao

Corresponding author

Correspondence to Yukun Li.

Editor information

Editors and Affiliations

University of Macau, Macao, China
Leong Hou U
Education University of Hong Kong, Hong Kong, China
Haoran Xie

About this paper

Cite this paper

Wu, Y., Li, Y., Hao, G. (2018). A Web-Based Theme-Related Word Set Construction Algorithm. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_17

DOI: https://doi.org/10.1007/978-3-030-01298-4_17
Published: 21 October 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-01297-7
Online ISBN: 978-3-030-01298-4
eBook Packages: Computer Science; Computer Science (R0)
