Abstract
In this paper, we propose a technique for identifying the topics of documents based on event sequences built from co-occurrence words. We treat each document as an event sequence, where each event consists of a verb and the words correlated with that verb. We then classify documents by topic using a Markov stochastic model, and we report experimental results that examine the method.
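The idea of scoring a document's event sequence under one Markov model per topic can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a plain first-order Markov chain with add-one smoothing (the paper may use a different Markov model), it encodes each event as a hypothetical "verb/word" string, and the function names, the `<unk>` token, and the toy training data are all invented for illustration.

```python
import math
from collections import defaultdict

UNK = "<unk>"  # hypothetical placeholder for events unseen in training


def train_chain(sequences, vocab, alpha=1.0):
    """Estimate smoothed start and transition log-probabilities for one topic."""
    start = defaultdict(float)
    trans = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        if not seq:
            continue
        start[seq[0]] += 1
        for prev, cur in zip(seq, seq[1:]):
            trans[prev][cur] += 1

    size = len(vocab)
    n_start = sum(start.values())
    log_start = {
        e: math.log((start[e] + alpha) / (n_start + alpha * size)) for e in vocab
    }
    log_trans = {}
    for prev in vocab:
        total = sum(trans[prev].values())
        log_trans[prev] = {
            cur: math.log((trans[prev][cur] + alpha) / (total + alpha * size))
            for cur in vocab
        }
    return log_start, log_trans


def log_likelihood(seq, model):
    """Log-probability of an event sequence under one topic's chain."""
    log_start, log_trans = model
    seq = [e if e in log_start else UNK for e in seq]
    if not seq:
        return 0.0
    score = log_start[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        score += log_trans[prev][cur]
    return score


def classify(seq, models):
    """Return the topic whose chain gives the sequence the highest likelihood."""
    return max(models, key=lambda topic: log_likelihood(seq, models[topic]))


# Toy training data (invented): each document is a list of "verb/word" events.
training = {
    "accident": [
        ["collide/car", "injure/driver", "investigate/police"],
        ["collide/car", "injure/driver"],
    ],
    "election": [
        ["announce/candidate", "vote/citizen", "win/candidate"],
        ["vote/citizen", "win/candidate"],
    ],
}
vocab = {e for docs in training.values() for doc in docs for e in doc} | {UNK}
models = {topic: train_chain(docs, vocab) for topic, docs in training.items()}

print(classify(["collide/car", "injure/driver"], models))
```

Training one chain per topic and choosing the highest-likelihood topic keeps the sketch short; smoothing is needed so that an event or transition unseen for a topic does not force a zero probability.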
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wakabayashi, K., Miura, T. (2008). Topics Identification Based on Event Sequence Using Co-occurrence Words. In: Kapetanios, E., Sugumaran, V., Spiliopoulou, M. (eds) Natural Language and Information Systems. NLDB 2008. Lecture Notes in Computer Science, vol 5039. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-69858-6_22
DOI: https://doi.org/10.1007/978-3-540-69858-6_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-69857-9
Online ISBN: 978-3-540-69858-6
eBook Packages: Computer Science, Computer Science (R0)