Finding Division Points for Time-Series Corpus Based on Topic Changes

Kobayashi, Hiroshi; Saga, Ryosuke

doi:10.1007/978-3-319-07731-4_37

Hiroshi Kobayashi & Ryosuke Saga

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8521)

Included in the following conference series:

International Conference on Human Interface and the Management of Information

Abstract

This paper describes a method for finding proper points at which to divide a corpus with time-series information, in order to extract local and frequent keywords. Local and frequent keywords characterize a time-series corpus and are useful for comprehending it. To extract such keywords, previous work proposed a corpus separation method. However, that method divides the corpus at equal intervals, so it cannot take changes of topic into account. To consider topic changes and divide the corpus according to them, we use the idea of a topic model, namely the topics extracted by Latent Dirichlet Allocation (LDA). In an experiment using five years of newspaper articles, we confirm from the LDA output that the topics of the documents change over time, and that points at which the corpus can be divided according to notable topic changes are observable.
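The abstract contrasts equal-interval separation with division at topic changes but does not state how a topic change is scored. The sketch below is an illustration under stated assumptions, not the authors' procedure: it assumes each document already has an LDA topic-proportion vector (for example, exported from a toolkit such as MALLET) and a discrete time-period label, averages the vectors per period, scores each boundary between consecutive periods by Jensen-Shannon divergence, and keeps the k largest. All function names are hypothetical.

```python
import math
from collections import defaultdict


def mean_topic_distribution_per_period(docs):
    """Average per-document topic proportions within each time period.

    docs: iterable of (period, proportions) pairs; proportions is assumed to be
    a list of topic weights summing to 1 (e.g. one row of LDA doc-topic output).
    Returns a list of (period, mean_proportions) sorted by period.
    """
    sums = {}
    counts = defaultdict(int)
    for period, props in docs:
        if period not in sums:
            sums[period] = [0.0] * len(props)
        for i, p in enumerate(props):
            sums[period][i] += p
        counts[period] += 1
    return [(t, [s / counts[t] for s in sums[t]]) for t in sorted(sums)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (bounded by 1) between two distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 here.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_change_division_points(docs, k):
    """Return the k boundaries with the largest topic shift (assumed scoring).

    A returned value t means "split between the previous period and period t".
    """
    series = mean_topic_distribution_per_period(docs)
    scores = [
        (js_divergence(prev, cur), t)
        for (_, prev), (t, cur) in zip(series, series[1:])
    ]
    scores.sort(reverse=True)
    return sorted(t for _, t in scores[:k])


def equal_interval_division_points(periods, k):
    """Baseline: split the sorted periods into k + 1 groups of (nearly) equal length."""
    periods = sorted(set(periods))
    n = len(periods)
    return [periods[(i * n) // (k + 1)] for i in range(1, k + 1)]


# Toy data (invented for illustration): two topics, dominant topic flips at period 3.
docs = [
    (0, [0.9, 0.1]), (1, [0.85, 0.15]), (2, [0.9, 0.1]),
    (3, [0.2, 0.8]), (4, [0.15, 0.85]), (5, [0.2, 0.8]),
]
print(topic_change_division_points(docs, 1))
print(equal_interval_division_points([t for t, _ in docs], 2))
```

On this toy input the topic-based split falls at period 3, where the dominant topic flips, while the equal-interval baseline with two cuts splits at periods 2 and 4 regardless of content, which is the limitation the abstract points out. Jensen-Shannon divergence is chosen here only because it is symmetric and bounded; any distance between topic distributions could be substituted.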

References

Liu, V., Curran, R.: Web Text Corpus for Natural Language Processing. In: Proceedings of the 11th Conference of the European Chapter of the Association for Computational Linguistics, Trento, Italy, pp. 233–240 (2006)
Liu, F., Liu, F., Liu, Y.: Automatic Keyword Extraction for the Meeting Corpus Using Supervised Approach and Bigram Expansion. In: IEEE Workshop on Spoken Language Technology, pp. 181–184 (2008)
Dredze, M., Wallach, H., Puller, D., Pereira, F.: Generating Summary Keywords for Emails Using Topics. In: Proceedings of the 2008 International Conference on Intelligent User Interfaces, pp. 199–206 (2008)
Litvak, M., Last, M.: Graph-based Keyword Extraction for Single-Document Summarization. In: Proceedings of the Workshop on Multi-source Multilingual Information Extraction and Summarization, pp. 17–24. Association for Computational Linguistics (2008)
Saga, R., Tsuji, H.: Improved Keyword Extraction by Separation into Multiple Document Sets According to Time Series. In: HCII, CCIS 374, pp. 450–453 (2013)
Church, K., Gale, W.: Inverse Document Frequency (IDF): A Measure of Deviations from Poisson. In: Proceedings of the Third Workshop on Very Large Corpora, pp. 121–130 (1995)
Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3, 993–1022 (2003)
McCallum, K.A.: MALLET, http://mallet.cs.umass.edu

Author information

Authors and Affiliations

Osaka Prefecture University, 1-1, Gakuen-cho, Naka-ku, Sakai-shi, Osaka, Japan
Hiroshi Kobayashi & Ryosuke Saga

Editor information

Editors and Affiliations

Department of Management Science, Tokyo University of Science, Kagurazaka, 162-8601, Shinjuku-ku, Tokyo, Japan
Sakae Yamamoto

About this paper

Cite this paper

Kobayashi, H., Saga, R. (2014). Finding Division Points for Time-Series Corpus Based on Topic Changes. In: Yamamoto, S. (eds) Human Interface and the Management of Information. Information and Knowledge Design and Evaluation. HIMI 2014. Lecture Notes in Computer Science, vol 8521. Springer, Cham. https://doi.org/10.1007/978-3-319-07731-4_37

DOI: https://doi.org/10.1007/978-3-319-07731-4_37
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07730-7
Online ISBN: 978-3-319-07731-4
eBook Packages: Computer Science, Computer Science (R0)
