
Multi-strategy Learning for Topic Detection and Tracking

A joint report of CMU approaches to multilingual TDT

Chapter in: Topic Detection and Tracking

Part of the book series: The Information Retrieval Series (INRE, volume 12)

Abstract

This chapter reports on CMU’s work in all five TDT-1999 tasks: segmentation (story boundary identification), topic tracking, topic detection, first story detection, and story link detection. We addressed these tasks as supervised or unsupervised classification problems and applied a variety of statistical learning algorithms to each problem for comparison. For segmentation we used exponential language models and decision trees; for topic tracking we used primarily k-nearest-neighbors classification, along with language models, decision trees, and a variant of the Rocchio approach; for topic detection we used a combination of incremental clustering and agglomerative hierarchical clustering; and for first story detection and story link detection we used a cosine-similarity-based measure. We also studied the effect of combining the outputs of alternative methods to produce joint classification decisions in topic tracking, and found that combining multiple methods typically improved the classification of new topics compared with using any single method. We evaluated our approaches on multilingual corpora, including stories in English, Mandarin, and Spanish, and on multimedia corpora consisting of newswire texts and automatic speech recognition output for broadcast news sources. The methods worked reasonably well under all of these conditions.
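
To illustrate the kind of cosine-similarity measure mentioned for first story detection, the following is a minimal sketch and not the chapter's actual implementation. It assumes raw term-frequency vectors, whitespace tokenization, and an arbitrary decision threshold of 0.3; the function names `term_vector`, `cosine`, and `first_story_flags` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter


def term_vector(text):
    """Bag-of-words term-frequency vector (lowercased, whitespace-tokenized)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def first_story_flags(stories, threshold=0.3):
    """Flag a story as a first story if no earlier story is similar enough.

    `threshold` is an illustrative value, not one taken from the chapter.
    """
    seen = []   # vectors of all stories processed so far
    flags = []
    for text in stories:
        vec = term_vector(text)
        # Highest similarity to any earlier story (0.0 if there is none).
        best = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(best < threshold)
        seen.append(vec)
    return flags


stories = [
    "earthquake strikes coastal city",
    "coastal city earthquake damage grows",
    "election results announced tonight",
]
print(first_story_flags(stories))  # prints [True, False, True]
```

A story is flagged as new when its best match among earlier stories falls below the threshold. A real system would likely use weighted terms (for example TF-IDF) and a tuned threshold, which this sketch leaves out.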




Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Yang, Y., Carbonell, J., Brown, R., Lafferty, J., Pierce, T., Ault, T. (2002). Multi-strategy Learning for Topic Detection and Tracking. In: Allan, J. (ed.) Topic Detection and Tracking. The Information Retrieval Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0933-2_5


  • DOI: https://doi.org/10.1007/978-1-4615-0933-2_5

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5311-9

  • Online ISBN: 978-1-4615-0933-2

  • eBook Packages: Springer Book Archive
