Topic Tracking Based on Linguistic Features

  • Conference paper
Natural Language Processing – IJCNLP 2005 (IJCNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3651)

Abstract

This paper explores two linguistically motivated restrictions on the set of words used for topic tracking in newspaper articles: named entities and headline words. We assume that named entities are a useful linguistic feature for topic tracking, since both topics and events are tied to a specific place and time in a story. The rationale for using headline words is that a headline is a compact representation of the original story, which helps readers quickly grasp the most important information the story contains. Headline words are generated automatically using a headline generation technique. The method was tested on Japanese articles from the Mainichi Shimbun newspaper, and the topic tracking results show that the system works well even with a small number of positive training examples.
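
The idea of restricting the tracking vocabulary can be illustrated with a minimal sketch. It is not the authors' system: the cosine-similarity profile, the threshold value, the function names, and the toy English tokens are all assumptions made for illustration. The set of allowed words (named entities plus headline words) is assumed to be given, since the abstract does not describe how it is extracted.

```python
from collections import Counter
from math import sqrt


def restrict(tokens, allowed):
    """Count only the tokens that belong to the allowed vocabulary
    (assumed here to be named entities plus headline words)."""
    return Counter(t for t in tokens if t in allowed)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def track(positive_stories, incoming_story, allowed, threshold=0.3):
    """Judge a story on-topic if it is similar enough to the topic profile
    built from the positive training stories. The threshold is hypothetical."""
    profile = Counter()
    for story in positive_stories:
        profile.update(restrict(story, allowed))
    return cosine(profile, restrict(incoming_story, allowed)) >= threshold


# Toy data (hypothetical): one positive training story and two incoming stories.
allowed = {"Kobe", "earthquake", "1995", "rescue"}
positive = [["the", "Kobe", "earthquake", "struck", "in", "1995"]]
on_topic = ["rescue", "teams", "reach", "Kobe", "after", "earthquake"]
off_topic = ["the", "election", "results", "were", "announced"]

print(track(positive, on_topic, allowed))
print(track(positive, off_topic, allowed))
```

With a single positive story, the restricted vocabulary keeps the profile focused on place, time, and event words, which is the intuition the abstract gives for why the approach can work with little training data.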

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Fukumoto, F., Yamaji, Y. (2005). Topic Tracking Based on Linguistic Features. In: Dale, R., Wong, KF., Su, J., Kwong, O.Y. (eds) Natural Language Processing – IJCNLP 2005. IJCNLP 2005. Lecture Notes in Computer Science, vol 3651. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11562214_2

  • DOI: https://doi.org/10.1007/11562214_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-29172-5

  • Online ISBN: 978-3-540-31724-1

  • eBook Packages: Computer Science, Computer Science (R0)
