Abstract
With the rapid development of the Internet and social media technology, both the scale of information and the speed at which it is shared and disseminated are increasing. While people enjoy the convenience of the rich information available on the Internet, they also face the challenges of the information explosion. First, it is difficult to extract the information one needs quickly and accurately from the vast amount available online. Second, the information related to a given topic is often scattered across different times and places, which makes it harder to gain a comprehensive grasp of that topic. This massive, multisource, and diverse information creates an urgent need for a technology that can effectively organize and aggregate information by topic or event and that can efficiently detect and track the topics users are interested in.
Topic detection and tracking (TDT) technology emerged and developed against this background, and its fundamental purpose is to help people cope with the information explosion. It can automatically identify new topics or track known topics in news and social media data streams, helping users understand how a topic develops and evolves. Specifically, TDT collects and organizes information scattered across the Internet and determines the relationships among the various factors related to a topic. This helps users obtain the full details of a topic and see how it relates to other topics.
In this chapter, we first review the history of the field, along with its terminology and task definitions. We then introduce topic detection and tracking methods for traditional newswire and for social media, respectively. Finally, we focus on a special type of topic detection task called bursty topic detection.
© 2021 Tsinghua University Press
About this chapter
Cite this chapter
Zong, C., Xia, R., Zhang, J. (2021). Topic Detection and Tracking. In: Text Data Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-0100-2_9
DOI: https://doi.org/10.1007/978-981-16-0100-2_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-0099-9
Online ISBN: 978-981-16-0100-2
eBook Packages: Computer Science (R0)