Finding Division Points for Time-Series Corpus Based on Topic Changes

  • Hiroshi Kobayashi
  • Ryosuke Saga
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8521)


This paper describes a method for finding suitable points at which to divide a corpus with time-series information, in order to extract local and frequent keywords. Local and frequent keywords characterize such a corpus and are useful for comprehending it. To extract these keywords, previous work proposed a corpus-separation method. However, that method divides the corpus at equal intervals and therefore cannot take changes of topic into account. To divide the corpus according to topic changes, we use the idea of topic models and the topics extracted by Latent Dirichlet Allocation (LDA). In an experiment using newspaper articles covering five years, the LDA output confirms that the topics of the documents change over time, and that points at which the corpus can be divided according to these topic changes are clearly observable.
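The idea of dividing a corpus where its topics change can be illustrated with a minimal sketch. This is not the authors' procedure, which the abstract does not specify in detail. It assumes that each document already has a topic distribution (for example, from an LDA run) and a time period label, and it uses the Jensen-Shannon divergence between the mean topic distributions of adjacent periods as a hypothetical change measure. The function names and the threshold value are illustrative assumptions.

```python
import math
from collections import defaultdict


def mean_topic_distribution(rows):
    """Average a list of per-document topic distributions."""
    n = len(rows)
    k = len(rows[0])
    return [sum(r[i] for r in rows) / n for i in range(k)]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def find_division_points(docs, threshold=0.1):
    """Return the periods at which the topic mix shifts noticeably.

    docs: iterable of (period, topic_distribution) pairs, where the
          distributions are assumed to come from a topic model such as LDA.
    threshold: assumed cut-off on the divergence between adjacent periods.
    """
    by_period = defaultdict(list)
    for period, dist in docs:
        by_period[period].append(dist)

    periods = sorted(by_period)
    means = [mean_topic_distribution(by_period[p]) for p in periods]

    points = []
    for prev, cur, period in zip(means, means[1:], periods[1:]):
        if js_divergence(prev, cur) > threshold:
            points.append(period)  # the corpus is split just before this period
    return points


# Toy data: three topics, with a clear shift between 2010 and 2011.
docs = [
    (2009, [0.8, 0.1, 0.1]),
    (2009, [0.7, 0.2, 0.1]),
    (2010, [0.75, 0.15, 0.1]),
    (2011, [0.1, 0.1, 0.8]),
    (2012, [0.1, 0.2, 0.7]),
]
print(find_division_points(docs))  # -> [2011]
```

In practice the threshold and the period granularity (year, month, week) would have to be chosen for the corpus at hand. A fixed threshold is only one option, and ranking the adjacent-period divergences to take the largest is another.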


Keywords: keyword extraction, time series information, LDA





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Hiroshi Kobayashi (1)
  • Ryosuke Saga (1)
  1. Osaka Prefecture University, Sakai-shi, Japan
