Towards Automatic Detection and Tracking of Topic Change

Holz, Florian; Teresniak, Sven

doi:10.1007/978-3-642-12116-6_27

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6008)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

We present an approach for the automatic detection of topic change. Our approach is based on the analysis of statistical features of topics in time-sliced corpora and their dynamics over time. Processing large amounts of time-annotated news text, we identify new facets in a stream of topics that consists of the latest news of public interest. The approach can be adapted as an addition to the well-known task of topic detection and tracking; with it, we aim to boil a daily news stream down to its novelty. To this end, we examine the contextual shift of concepts across time slices. To quantify the amount of change, we adopt the volatility measure from econometrics and propose a new algorithm for the frequency-independent detection of topic drift and change of meaning. The proposed measure relies not on plain word frequency but on the mixture of a word's co-occurrences. The analysis is therefore largely independent of absolute word frequencies and works over the whole frequency spectrum, including low-frequency words. Aggregating the computed time-related data of the terms makes it possible to build overview illustrations of the most strongly evolving terms for a whole time span.
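
The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way to read "volatility of a term's co-occurrence context across time slices". Everything in it is an assumption made for illustration, not the authors' method: the function names `cooccurrence_ranks` and `context_volatility`, the use of the sentence as the co-occurrence window, the comparison of ranks rather than raw counts, and the coefficient of variation as the measure of fluctuation.

```python
from collections import Counter
from statistics import mean, pstdev


def cooccurrence_ranks(sentences, term):
    """Rank the words sharing a sentence with `term` (1 = most frequent).

    Assumption: co-occurrence is counted per sentence; `sentences` is a
    list of token lists belonging to one time slice.
    """
    counts = Counter()
    for tokens in sentences:
        if term in tokens:
            counts.update(w for w in set(tokens) if w != term)
    # Sort by descending count, then alphabetically so ties are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {w: rank for rank, w in enumerate(ordered, start=1)}


def context_volatility(slices, term):
    """Mean coefficient of variation of co-occurrence ranks over time slices.

    Assumption: ranks, not counts, are compared, so scaling the amount of
    text in a slice does not change the result.
    """
    ranks_per_slice = [cooccurrence_ranks(s, term) for s in slices]
    vocabulary = set().union(*ranks_per_slice)
    if not vocabulary:
        return 0.0
    # Words absent from a slice all receive one shared "worst" rank.
    worst = len(vocabulary) + 1
    scores = []
    for w in vocabulary:
        series = [r.get(w, worst) for r in ranks_per_slice]
        scores.append(pstdev(series) / mean(series))
    return mean(scores)


# Hypothetical toy data: two time slices each, one sentence per slice.
stable = [[["iraq", "war", "troops"]], [["iraq", "war", "troops"]]]
shifting = [[["iraq", "war", "troops"]], [["iraq", "election", "vote"]]]

print(context_volatility(stable, "iraq"))
print(context_volatility(shifting, "iraq"))
```

In this toy setup, a term whose context stays the same scores zero, while a term whose co-occurring words are replaced between slices scores higher. Because only ranks enter the computation, repeating every sentence in a slice leaves the score unchanged, which mirrors the frequency independence the abstract claims.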

Author information

Authors and Affiliations

NLP Group, Department of Computer Science, University of Leipzig
Florian Holz & Sven Teresniak

Editor information

Editors and Affiliations

Center for Computing Research, National Polytechnic Institute, 07738, Mexico City, Mexico
Alexander Gelbukh

Cite this paper

Holz, F., Teresniak, S. (2010). Towards Automatic Detection and Tracking of Topic Change. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2010. Lecture Notes in Computer Science, vol 6008. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12116-6_27

DOI: https://doi.org/10.1007/978-3-642-12116-6_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12115-9
Online ISBN: 978-3-642-12116-6
eBook Packages: Computer Science, Computer Science (R0)
