Abstract
This chapter reports on CMU’s work on all five TDT-1999 tasks: segmentation (story boundary identification), topic tracking, topic detection, first story detection, and story link detection. We treated these tasks as supervised or unsupervised classification problems and applied a variety of statistical learning algorithms to each one for comparison. For segmentation we used exponential language models and decision trees; for topic tracking we used primarily k-nearest-neighbor (kNN) classification, along with language models, decision trees, and a variant of the Rocchio approach; for topic detection we used a combination of incremental clustering and agglomerative hierarchical clustering; and for first story detection and story link detection we used a cosine-similarity-based measure. We also studied the effect of combining the outputs of alternative methods to produce joint classification decisions in topic tracking, and found that combining multiple methods typically improved the classification of new topics compared with using any single method. We evaluated our approaches on multilingual corpora, including stories in English, Mandarin, and Spanish, and on multimedia corpora consisting of newswire text and automatic speech recognition (ASR) output for broadcast news sources. The methods performed reasonably well under all of these conditions.
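To make the cosine-similarity measure for first story detection and story link detection more concrete, here is a minimal Python sketch. It is not the chapter's implementation: it assumes raw term-frequency vectors over lowercased whitespace tokens and a single hypothetical similarity threshold of 0.2. The function names (`tf_vector`, `cosine`, `is_linked`, `first_story_flags`) and the threshold value are illustrative only, since the actual term weighting and thresholds are not given here.

```python
import math
from collections import Counter


def tf_vector(text: str) -> Counter:
    """Bag-of-words vector of raw term frequencies.

    Assumption: lowercased whitespace tokenization and raw counts; the
    actual system's tokenization and term weighting are not specified here.
    """
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    # Counter returns 0 for missing terms, so only shared terms contribute.
    dot = sum(weight * b[term] for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_linked(story_a: str, story_b: str, threshold: float = 0.2) -> bool:
    """Story link detection: same topic if similarity reaches the threshold.

    The threshold of 0.2 is a hypothetical value chosen for illustration.
    """
    return cosine(tf_vector(story_a), tf_vector(story_b)) >= threshold


def first_story_flags(stream: list[str], threshold: float = 0.2) -> list[bool]:
    """First story detection over a stream of stories, in arrival order.

    A story is flagged as a first story when its maximum similarity to
    every earlier story is below the (hypothetical) threshold.
    """
    seen: list[Counter] = []
    flags: list[bool] = []
    for text in stream:
        vec = tf_vector(text)
        max_sim = max((cosine(vec, old) for old in seen), default=0.0)
        flags.append(max_sim < threshold)
        seen.append(vec)
    return flags


if __name__ == "__main__":
    stream = [
        "earthquake hits city center",
        "rescue teams search after earthquake hits city",
        "stock markets rally on tech earnings",
    ]
    print(first_story_flags(stream))  # [True, False, True]
    print(is_linked(stream[0], stream[1]))  # True
```

In this toy stream the second story shares "earthquake", "hits", and "city" with the first (similarity about 0.57), so it is not flagged as new, while the third shares no terms with either earlier story. Comparing each incoming story against every earlier one grows quadratically with the stream; a practical system would likely restrict those comparisons, which this sketch leaves out.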
Copyright information
© 2002 Springer Science+Business Media New York
About this chapter
Cite this chapter
Yang, Y., Carbonell, J., Brown, R., Lafferty, J., Pierce, T., Ault, T. (2002). Multi-strategy Learning for Topic Detection and Tracking. In: Allan, J. (ed.) Topic Detection and Tracking. The Information Retrieval Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0933-2_5
DOI: https://doi.org/10.1007/978-1-4615-0933-2_5
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-5311-9
Online ISBN: 978-1-4615-0933-2
eBook Packages: Springer Book Archive