Abstract
This paper explores two linguistically motivated restrictions on the set of words used for topic tracking on newspaper articles: named entities and headline words. We assume that named entities are one of the linguistic features useful for topic tracking, since both topics and events are tied to a specific place and time in a story. The basic idea behind using headline words for the tracking task is that a headline is a compact representation of the original story, which helps people quickly understand the most important information the story contains. Headline words are generated automatically using a headline generation technique. The method was tested on Japanese articles from the Mainichi Shimbun newspaper, and the topic tracking results show that the system works well even with a small amount of positive training data.
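To make the vocabulary restriction concrete, here is a minimal sketch of how a story could be represented using only its named entities and headline words. It assumes that both word lists are already available for each story. In practice they would come from a morphological analyzer, a named-entity tagger, and a headline generator, none of which is implemented here. The abstract does not name the tracking classifier, so the sketch substitutes a simple centroid-and-cosine-similarity decision rule. The functions `restricted_vector`, `cosine`, `centroid`, and `track` and the `threshold` value are hypothetical illustrations, not the authors' method. Romanized English tokens stand in for Japanese text.

```python
from collections import Counter
from math import sqrt


def restricted_vector(tokens, named_entities, headline_words):
    """Count only tokens that are named entities or headline words.

    This mirrors the two restrictions described in the abstract;
    every other token in the story is discarded.
    """
    allowed = set(named_entities) | set(headline_words)
    return Counter(t for t in tokens if t in allowed)


def cosine(a, b):
    """Cosine similarity of two sparse term-count mappings (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Average several sparse vectors (e.g. the few positive training stories)."""
    total = Counter()
    for v in vectors:
        total.update(v)  # adds counts key by key
    n = len(vectors)
    return {k: c / n for k, c in total.items()}


def track(test_vector, positive_vectors, threshold=0.2):
    """Hypothetical decision rule: a story is on-topic if its restricted
    vector is close enough to the centroid of the positive examples."""
    return cosine(test_vector, centroid(positive_vectors)) >= threshold


# Usage with toy stories (tokens, named entities, headline words).
pos = restricted_vector(["Kobe", "earthquake", "damage", "the", "city"],
                        ["Kobe"], ["earthquake", "damage"])
on_topic = restricted_vector(["Kobe", "earthquake", "rescue", "a"],
                             ["Kobe"], ["earthquake", "rescue"])
off_topic = restricted_vector(["Diet", "election", "vote"],
                              ["Diet"], ["election"])

print(track(on_topic, [pos]))   # True: shares "Kobe" and "earthquake"
print(track(off_topic, [pos]))  # False: no overlapping restricted words
```

Because function words such as "the" and "a" never enter the vector, similarity is driven entirely by the place names and headline terms that the abstract argues carry the topic. The threshold would have to be tuned on held-out data.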
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Fukumoto, F., Yamaji, Y. (2005). Topic Tracking Based on Linguistic Features. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_2
DOI: https://doi.org/10.1007/11562214_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29172-5
Online ISBN: 978-3-540-31724-1
eBook Packages: Computer Science, Computer Science (R0)