Online Chinese-Vietnamese Bilingual Topic Detection Based on RCRP Algorithm with Event Elements

  • Conference paper
Natural Language Processing and Chinese Computing (NLPCC 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 496)

Abstract

To address the characteristics of online Chinese-Vietnamese topic detection, we propose a Chinese-Vietnamese bilingual topic model based on the Recurrent Chinese Restaurant Process (RCRP) and integrated with event elements. First, the event elements (the people, the place and the time) are extracted from newly arriving bilingual news texts. Next, word pairs are tagged and aligned across the bilingual news and comments. Both the event elements and the aligned words are then integrated into the RCRP algorithm to construct the proposed bilingual topic detection model. Finally, the model determines whether each new document is assigned to an existing category or starts a new one, and thereby detects topics. In comparative experiments, the proposed model performs well on topic detection.
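The final step of the abstract, deciding between an existing category and a new one, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the standard RCRP prior of Ahmed and Xing, in which an existing topic is weighted by its document count in the previous epoch plus its count so far in the current epoch, and a new topic is weighted by a concentration parameter `alpha`. The likelihood term is a stand-in: a Jaccard overlap between a document's features (event elements and aligned words) and a topic's features. The names `rcrp_prior`, `overlap`, `assign` and `new_score`, and the sample features, are hypothetical.

```python
NEW_TOPIC = "<new>"  # sentinel label for "open a new topic" (hypothetical name)


def rcrp_prior(prev_counts, curr_counts, alpha):
    """RCRP prior over existing topics plus a new topic.

    prev_counts: topic -> number of documents in the previous epoch
    curr_counts: topic -> number of documents seen so far in this epoch
    alpha: concentration parameter, the weight given to a new topic
    """
    topics = set(prev_counts) | set(curr_counts)
    weights = {k: prev_counts.get(k, 0) + curr_counts.get(k, 0) for k in topics}
    weights[NEW_TOPIC] = alpha
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def overlap(doc_features, topic_features):
    """Jaccard overlap, used here as an assumed stand-in for the likelihood."""
    union = doc_features | topic_features
    return len(doc_features & topic_features) / len(union) if union else 0.0


def assign(doc_features, topic_features, prev_counts, curr_counts,
           alpha=1.0, new_score=0.1):
    """Return the best-scoring topic label, or NEW_TOPIC.

    new_score is an assumed constant likelihood for a brand-new topic.
    """
    prior = rcrp_prior(prev_counts, curr_counts, alpha)
    scores = {}
    for topic, p in prior.items():
        if topic == NEW_TOPIC:
            scores[topic] = p * new_score
        else:
            scores[topic] = p * overlap(doc_features, topic_features.get(topic, set()))
    return max(scores, key=scores.get)


# Illustrative data only: one known topic carried over from the previous epoch.
topic_features = {"quake": {"yunnan", "earthquake", "2014-08"}}
prev_counts = {"quake": 3}
curr_counts = {"quake": 1}

related = assign({"yunnan", "earthquake", "rescue"}, topic_features,
                 prev_counts, curr_counts)
unrelated = assign({"hanoi", "flood", "2014-05"}, topic_features,
                   prev_counts, curr_counts)
```

With these inputs, the document that shares event elements with the known topic joins it, while the unrelated document opens a new topic. A real system would replace `overlap` with a proper probabilistic likelihood and sample the assignment rather than take the maximum.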

Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Long, Wx., Gao, Jx., Yu, Zt., Gao, Sx., Hong, Xd. (2014). Online Chinese-Vietnamese Bilingual Topic Detection Based on RCRP Algorithm with Event Elements. In: Zong, C., Nie, JY., Zhao, D., Feng, Y. (eds) Natural Language Processing and Chinese Computing. NLPCC 2014. Communications in Computer and Information Science, vol 496. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-45924-9_38

  • DOI: https://doi.org/10.1007/978-3-662-45924-9_38

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-45923-2

  • Online ISBN: 978-3-662-45924-9

  • eBook Packages: Computer Science, Computer Science (R0)
