Rule-Based HierarchicalRank: An Unsupervised Approach to Visible Tag Extraction from Semi-structured Chinese Text

  • Conference paper
PRICAI 2019: Trends in Artificial Intelligence (PRICAI 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11672)

Abstract

The large and growing amount of semi-structured Chinese text presents both challenges and opportunities for text mining and knowledge discovery. One such challenge is to automatically extract from a document a small set of visible tags that accurately reveal the document’s topic and facilitate fast information processing. Unfortunately, a gap remains between existing methods and practical engineering applications.

To narrow this gap, we propose Rule-Based HierarchicalRank (RBH), an unsupervised method that extracts visible tags from semi-structured Chinese text at two levels: the document’s title and its non-title content. A different method is used at each level to extract the candidate visible tags. Experimental results on two distinct datasets show that RBH performs far better than all the baseline methods on the visible tag extraction task. Specifically, on Paper-Dataset, RBH achieves a precision of 18.6% and an F1-score of 14.1% at top K = 5. In addition, on Event-Dataset, the best precision of our method is 7% higher than that of the state-of-the-art method PositionRank at top K = 1. Furthermore, the best recall of RBH reaches 37.7% at top K = 5.
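The abstract reports precision, recall, and F1-score at a cutoff of top K tags. As a rough illustration of how such scores are commonly computed for a single document, here is a minimal sketch. The function name, the example tags, and the assumption that predicted tags are unique and ranked best-first are ours, and the paper's exact evaluation protocol (for example, how scores are averaged across documents) may differ.

```python
def precision_recall_f1_at_k(predicted, gold, k):
    """Score the top-k predicted tags of one document against its gold tags.

    `predicted` is assumed to be a ranked list of unique tags (best first);
    `gold` is the collection of reference tags. Both names are illustrative.
    """
    top_k = predicted[:k]
    gold_set = set(gold)
    # A hit is a predicted tag that exactly matches a gold tag.
    hits = sum(1 for tag in top_k if tag in gold_set)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical ranked output and gold tags for one document.
ranked_tags = ["tag_a", "tag_b", "tag_c", "tag_d", "tag_e"]
gold_tags = ["tag_a", "tag_c", "tag_x"]

p5, r5, f5 = precision_recall_f1_at_k(ranked_tags, gold_tags, k=5)
p1, r1, f1_at_1 = precision_recall_f1_at_k(ranked_tags, gold_tags, k=1)
```

A smaller K tends to favor precision, since only the highest-ranked tags are scored, while a larger K gives recall more room to grow. That is consistent with the abstract quoting its best precision at K = 1 and its best recall at K = 5.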

Notes

  1. https://pypi.org/project/jieba/

  2. http://www.jos.org.cn/jos/ch/index.aspx

  3. http://ef.zhiweidata.com/#!/down

Acknowledgment

This work was supported by the National Natural Science Foundation of China (NSFC) under grant No. 61872446 and by the Natural Science Foundation of Hunan Province, China, under grants No. 2018JJ2475 and No. 2018JJ2476.

Author information

Corresponding author

Correspondence to Chunhui He.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Lei, J., Yu, J., He, C., Zhang, C., Ge, B., Bao, Y. (2019). Rule-Based HierarchicalRank: An Unsupervised Approach to Visible Tag Extraction from Semi-structured Chinese Text. In: Nayak, A., Sharma, A. (eds) PRICAI 2019: Trends in Artificial Intelligence. PRICAI 2019. Lecture Notes in Computer Science, vol 11672. Springer, Cham. https://doi.org/10.1007/978-3-030-29894-4_15

  • DOI: https://doi.org/10.1007/978-3-030-29894-4_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-29893-7

  • Online ISBN: 978-3-030-29894-4

  • eBook Packages: Computer Science, Computer Science (R0)
