Semantics-Based Event-Driven Web News Classification

  • Conference paper
Frontiers of High Performance Computing and Networking ISPA 2007 Workshops (ISPA 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4743)

Abstract

Web news classification is an unsupervised learning task that is often accomplished by clustering methods. In traditional approaches, documents are first represented using the vector space model, in which each vector generally consists of the keywords or phrases important to the document. The vectors are then clustered according to some (dis)similarity measure. Such methods often take little or no semantic information into account. In this paper, we present a semantics-based, event-driven approach. Each event is represented by a 3-tuple, and each document is associated with a set of candidate events. These event sets are classified according to semantic dissimilarity. A preliminary experiment on Chinese web news classification shows that the proposed approach is promising.
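
The abstract does not say what the three components of an event tuple are, nor how semantic dissimilarity is computed, so the following is only a minimal sketch under stated assumptions. It assumes hypothetical tuple fields (`subject`, `action`, `obj`), uses a slot-mismatch count as a stand-in for a real semantic measure, and groups documents with a simple single-link threshold rule. None of these choices are taken from the paper; they only illustrate the general shape of clustering documents by their candidate event sets.

```python
from itertools import combinations
from typing import NamedTuple


class Event(NamedTuple):
    # Hypothetical fields: the abstract only says "3-tuple".
    subject: str
    action: str
    obj: str


def event_dissimilarity(a: Event, b: Event) -> float:
    """Fraction of tuple slots that differ (0.0 = identical, 1.0 = no slot shared).

    Placeholder for a semantic measure; exact string matching is an assumption.
    """
    return sum(x != y for x, y in zip(a, b)) / 3


def set_dissimilarity(events_a: set, events_b: set) -> float:
    """Dissimilarity of the closest pair of candidate events from two documents.

    Assumes each document has at least one candidate event.
    """
    return min(event_dissimilarity(a, b) for a in events_a for b in events_b)


def cluster(docs: dict, threshold: float) -> list:
    """Single-link grouping: link documents whose dissimilarity is <= threshold."""
    parent = {name: name for name in docs}

    def find(x):
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(docs, 2):
        if set_dissimilarity(docs[a], docs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in docs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy documents with made-up candidate events (illustrative data only).
docs = {
    "d1": {Event("team_a", "beats", "team_b")},
    "d2": {Event("team_a", "defeats", "team_b"), Event("coach", "praises", "team_a")},
    "d3": {Event("player_x", "joins", "club_y")},
}
clusters = cluster(docs, threshold=0.5)
```

With the toy data, `d1` and `d2` share two of three slots in their closest event pair and so fall under the threshold, while `d3` shares nothing with either. A semantics-based measure would presumably also treat "beats" and "defeats" as close, which exact string matching cannot do.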

Editor information

Parimala Thulasiraman, Xubin He, Tony Li Xu, Mieso K. Denko, Ruppa K. Thulasiram, Laurence T. Yang

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hu, W., Sheng, Hy. (2007). Semantics-Based Event-Driven Web News Classification. In: Thulasiraman, P., He, X., Xu, T.L., Denko, M.K., Thulasiram, R.K., Yang, L.T. (eds) Frontiers of High Performance Computing and Networking ISPA 2007 Workshops. ISPA 2007. Lecture Notes in Computer Science, vol 4743. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74767-3_15

  • DOI: https://doi.org/10.1007/978-3-540-74767-3_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74766-6

  • Online ISBN: 978-3-540-74767-3

  • eBook Packages: Computer Science, Computer Science (R0)
