
A Web-Based Theme-Related Word Set Construction Algorithm

  • Conference paper
  • First Online:
Web and Big Data (APWeb-WAIM 2018)

Abstract

Constructing a theme-related word set is a basic task in building theme-oriented information retrieval systems. Most previous studies focus on identifying the representative words of a specific document, and few pay attention to constructing a word set related to a theme. After analyzing existing keyword extraction methods, this paper proposes a method that automatically constructs a theme-related word set from a primary theme-related word set given by domain experts and from well-known websites related to the theme. First, the method uses existing information extraction techniques to obtain documents from the websites and the keyword set of each document. It then calculates the correlation degree between the known theme-related word set and each document's keyword set, uses this document-theme relevance to select the words of the document that relate to the theme, and merges them into the theme-related word set. In this way, the theme-related word set is supplemented iteratively with the documents obtained from the theme-related websites. Because little research has addressed this problem and no relevant experimental data set exists, this paper applies the proposed method to construct theme-related word sets for two themes, "electricity" and "college entrance examination", and invites domain experts to evaluate the resulting word sets. The results show that the method can produce a relatively complete theme-related word set, which demonstrates its feasibility.
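The iterative expansion described in the abstract can be sketched as below. This is a minimal Python sketch under stated assumptions, not the authors' implementation: the abstract does not specify the correlation measure, how theme-related words are selected from a relevant document, or the stopping rule. The helper `correlation_degree` (here, the fraction of a document's keywords already in the theme set), the `threshold` value, the rule of merging all keywords of a relevant document, and the stop-when-nothing-new condition are all hypothetical. Keyword extraction is assumed to have already been done, so each document is represented only by its keyword set.

```python
from typing import Iterable, List, Set


def correlation_degree(theme_words: Set[str], doc_keywords: Set[str]) -> float:
    """Hypothetical relevance score: the fraction of a document's keywords
    that already belong to the theme-related word set (an assumption; the
    paper's actual measure is not given in the abstract)."""
    if not doc_keywords:
        return 0.0
    return len(theme_words & doc_keywords) / len(doc_keywords)


def expand_theme_words(
    seed_words: Iterable[str],
    doc_keyword_sets: List[Set[str]],
    threshold: float = 0.3,  # hypothetical relevance cut-off
    max_iterations: int = 10,  # safety bound on the number of passes
) -> Set[str]:
    """Iteratively merge keywords of theme-relevant documents into the
    expert-provided seed set until a pass adds no new words."""
    theme_words = set(seed_words)
    for _ in range(max_iterations):
        added: Set[str] = set()
        # Score every document against the word set as it stood at the
        # start of this pass; new words take effect in the next pass.
        for keywords in doc_keyword_sets:
            if correlation_degree(theme_words, keywords) >= threshold:
                # Assumption: every keyword of a relevant document is merged.
                added |= keywords - theme_words
        if not added:
            break  # fixed point: no document contributes new words
        theme_words |= added
    return theme_words
```

For example, with the seed set {"electricity", "power grid"}, a document with keywords {"power grid", "transformer", "voltage"} passes the 0.3 threshold in the first pass (one of its three keywords overlaps). The newly merged "transformer" and "voltage" then let a document with keywords {"transformer", "substation", "voltage", "load"} pass in the second pass, while a document about {"exam", "university", "score"} never does. This is why the method iterates: words merged in one round can make further documents recognizably on-theme.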



Acknowledgement

This research was supported by the Training plan of Tianjin University Innovation Team (No. TD13-5025), the Natural Science Foundation of Tianjin (No. 15JCYBJC46500) and the Major Project of Tianjin Smart Manufacturing (No. 15ZXZNCX00050).

Author information

Corresponding author

Correspondence to Yukun Li.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Wu, Y., Li, Y., Hao, G. (2018). A Web-Based Theme-Related Word Set Construction Algorithm. In: U, L., Xie, H. (eds) Web and Big Data. APWeb-WAIM 2018. Lecture Notes in Computer Science, vol 11268. Springer, Cham. https://doi.org/10.1007/978-3-030-01298-4_17

  • DOI: https://doi.org/10.1007/978-3-030-01298-4_17

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-01297-7

  • Online ISBN: 978-3-030-01298-4

  • eBook Packages: Computer Science, Computer Science (R0)