An NLP & IR Approach to Topic Detection

  • Chapter
Topic Detection and Tracking

Part of the book series: The Information Retrieval Series ((INRE,volume 12))

Abstract

This paper presents algorithms for Chinese and English-Chinese topic detection. Named entities, other nouns, and verbs serve as cue patterns that relate news stories describing the same event. Lexical translation and name transliteration resolve lexical differences between English and Chinese. A two-threshold scheme determines whether a news story is relevant or irrelevant to a topic cluster. Lookahead information handles ambiguous cases in clustering. The least-recently-used removal strategy models the time factor so that older, unimportant terms have no effect on clustering. Experimental results show that using nouns and verbs together with the least-recently-used removal strategy outperforms the other models. The named-entity-only approach performs slightly worse, but it avoids the overhead of the nouns-and-verbs approach with the least-recently-used removal strategy.
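
The abstract names the two-threshold scheme and the least-recently-used removal strategy without giving their details, so the following is only a minimal sketch of how such mechanisms might look, not the chapter's implementation. The names `decide` and `LRUTermCentroid`, the threshold values, the `capacity` limit, and the additive weight update are all hypothetical. Treating scores between the two thresholds as the "ambiguous" cases left for lookahead is likewise an assumption, and the lookahead step itself is not modelled.

```python
from collections import OrderedDict


def decide(similarity, low, high):
    """Two-threshold decision (hypothetical thresholds `low` < `high`).

    Scores at or above `high` count as relevant, scores below `low` as
    irrelevant, and anything in between is reported as ambiguous so that
    a later step (e.g. lookahead) can resolve it.
    """
    if similarity >= high:
        return "relevant"
    if similarity < low:
        return "irrelevant"
    return "ambiguous"


class LRUTermCentroid:
    """Cluster term weights with least-recently-used removal.

    Assumption: the cluster keeps at most `capacity` terms, and a term is
    "used" whenever a newly added story contains it. Term importance is
    ignored here; only recency decides which terms are dropped.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        # Maps term -> weight, ordered from least to most recently used.
        self.terms = OrderedDict()

    def update(self, story_terms):
        """Merge a story's term weights, then evict the stalest terms."""
        for term, weight in story_terms.items():
            self.terms[term] = self.terms.get(term, 0.0) + weight
            self.terms.move_to_end(term)  # mark as most recently used
        while len(self.terms) > self.capacity:
            self.terms.popitem(last=False)  # drop least recently used


if __name__ == "__main__":
    centroid = LRUTermCentroid(capacity=2)
    centroid.update({"earthquake": 1.0, "taipei": 1.0})
    centroid.update({"rescue": 1.0})
    print(list(centroid.terms))
    print(decide(0.3, low=0.2, high=0.5))
```

With a capacity of two, adding the second story evicts `earthquake`, the term that has gone longest without appearing in a story, so it no longer influences later similarity scores for that cluster.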


Copyright information

© 2002 Springer Science+Business Media New York

About this chapter

Cite this chapter

Chen, HH., Ku, LW. (2002). An NLP & IR Approach to Topic Detection. In: Allan, J. (ed.) Topic Detection and Tracking. The Information Retrieval Series, vol 12. Springer, Boston, MA. https://doi.org/10.1007/978-1-4615-0933-2_12

  • DOI: https://doi.org/10.1007/978-1-4615-0933-2_12

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-5311-9

  • Online ISBN: 978-1-4615-0933-2

  • eBook Packages: Springer Book Archive