Explorations Within Topic Tracking and Detection

Chapter in: Topic Detection and Tracking

Part of the book series: The Information Retrieval Series (INRE, volume 12)

Abstract

This chapter presents the system used by the Center for Intelligent Information Retrieval (CIIR) at the University of Massachusetts for its participation in four of the five TDT tasks: tracking, detection, first story detection, and story link detection. For each task, we discuss the parameter setting approach that we used and the results of our system on the test data.

For the task of link detection, we look more carefully at score normalization across different languages and media types. We find that normalizing scores differently depending on the source language improves results noticeably, though not substantially. We also consider smoothing the vocabulary in stories using a “query expansion” technique from Information Retrieval to add words from the corpus to each story. This smoothing results in substantial improvements.
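
As a rough illustration of source-dependent score normalization, the sketch below standardizes link detection scores separately within each source-language group. The abstract does not state which normalization the chapter uses, so the z-score transform, the function name `normalize_by_language`, and the group labels such as "eng-eng" are all assumptions made for this example.

```python
from collections import defaultdict
from statistics import mean, pstdev


def normalize_by_language(scored_pairs):
    """Z-score each raw score within its own source-language group.

    scored_pairs: list of (group_label, raw_score) tuples, where the label
    identifies the source languages of a story pair (hypothetical labels).
    Returns the normalized scores in the same order as the input.
    """
    # Collect the raw scores belonging to each group.
    groups = defaultdict(list)
    for label, score in scored_pairs:
        groups[label].append(score)

    # Per-group mean and population standard deviation.
    # A zero deviation is replaced by 1.0 to avoid division by zero.
    stats = {}
    for label, scores in groups.items():
        sigma = pstdev(scores)
        stats[label] = (mean(scores), sigma if sigma > 0 else 1.0)

    return [(score - stats[label][0]) / stats[label][1]
            for label, score in scored_pairs]


if __name__ == "__main__":
    # Illustrative numbers only, not results from the chapter.
    pairs = [("eng-eng", 0.8), ("eng-eng", 0.4),
             ("man-eng", 0.3), ("man-eng", 0.1)]
    print(normalize_by_language(pairs))
```

After this transform, a single decision threshold can be applied to all pairs, even when raw similarity scores for cross-language pairs run systematically lower than those for same-language pairs.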

In addition, we use TDT evaluation approaches to show that the tracking performance that sites are achieving is what would be expected from Information Retrieval technology. We further show that any first story detection system based on a tracking approach is unlikely to be sufficiently accurate for most purposes. Finally, we present an overview of an automatic timeline generation system that we developed using TDT data.

Russell Swan was the primary investigator and author for the automatic timeline construction work discussed in this chapter. He passed away unexpectedly after completing the work but before this chapter was published.

Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Allan, J., Lavrenko, V., Swan, R. (2002). Explorations Within Topic Tracking and Detection. In: Allan, J. (eds) Topic Detection and Tracking. The Information Retrieval Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0933-2_10

  • DOI: https://doi.org/10.1007/978-1-4615-0933-2_10

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5311-9

  • Online ISBN: 978-1-4615-0933-2

  • eBook Packages: Springer Book Archive
