Topic Detection and Tracking

Chapter in: Text Data Mining

Abstract

With the rapid development of the Internet and social media technology, both the scale of information and the speed at which it is shared and disseminated are increasing. While people enjoy the convenience brought by the rich information on the Internet, they also suffer from the challenges brought by the information explosion. First, it is difficult for people to extract the information they need quickly and accurately from the vast amount of information on the Internet. Second, the information related to a given topic is often scattered across different periods of time and locations, which makes it more difficult for people to gain a comprehensive grasp of a topic. Faced with this massive, multisource, and diverse information, there is an urgent need for a technology that can effectively organize and aggregate information based on topics or events and that can also efficiently detect and track topics that users are interested in.

Topic detection and tracking (TDT) technology emerged and developed against this background; its fundamental purpose is to help people cope with the information explosion. It can automatically identify new topics or track known topics in news and social media data streams, helping users fully understand how a topic develops and evolves. Specifically, TDT collects and organizes information scattered across the Internet and determines the relationships among the various factors related to a topic. This helps users obtain the full details of a topic and see how it relates to other topics.

In this chapter, we first review the history of the field, along with its terminology and task definitions. We then introduce topic detection and tracking methods for traditional newswire and for social media, respectively. Finally, we focus on a special type of topic detection task called bursty topic detection.

Copyright information

© 2021 Tsinghua University Press

About this chapter

Cite this chapter

Zong, C., Xia, R., Zhang, J. (2021). Topic Detection and Tracking. In: Text Data Mining. Springer, Singapore. https://doi.org/10.1007/978-981-16-0100-2_9

  • DOI: https://doi.org/10.1007/978-981-16-0100-2_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-0099-9

  • Online ISBN: 978-981-16-0100-2

  • eBook Packages: Computer Science (R0)
