Abstract
Web news classification is an unsupervised learning task that is often accomplished with clustering methods. In traditional approaches, documents are first represented using the vector space model, in which each vector generally consists of the keywords or phrases important to the document. The vectors are then clustered according to some (dis)similarity measure. Such methods take little or no semantic information into account. In this paper, we present a semantics-based, event-driven approach. Each event is represented as a 3-tuple, and each document is associated with a set of candidate events. These event sets are then classified according to their semantic dissimilarity. A preliminary experiment on Chinese web news classification shows that the proposed approach is promising.
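The abstract does not say what the three slots of an event tuple hold, which semantic resource measures dissimilarity, or which clustering algorithm groups the event sets. The sketch below is therefore only a toy illustration of the general idea, not the authors' method. Several pieces are hypothetical and stand in for whatever the paper actually uses: the `Event` fields (`subject`, `action`, `obj`), the `SYNONYM_GROUPS` lexicon, the best-match averaging in `set_dissim`, and the average-link clustering with a `threshold` stop rule.

```python
from itertools import combinations
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical 3-tuple event; the field names are assumptions."""
    subject: str
    action: str
    obj: str


# Hypothetical toy lexicon standing in for a real semantic resource:
# words in the same group are treated as synonyms.
SYNONYM_GROUPS = [
    {"beat", "defeat", "win"},
    {"lose", "fall"},
]


def word_dissim(a: str, b: str) -> float:
    """0.0 for identical or synonymous words, 1.0 otherwise (a crude stand-in)."""
    if a == b:
        return 0.0
    for group in SYNONYM_GROUPS:
        if a in group and b in group:
            return 0.0
    return 1.0


def event_dissim(e1: Event, e2: Event) -> float:
    """Mean of the slot-by-slot word dissimilarities of two event tuples."""
    return sum(word_dissim(a, b) for a, b in zip(e1, e2)) / 3


def set_dissim(s1: list[Event], s2: list[Event]) -> float:
    """Symmetric best-match dissimilarity between two candidate-event sets.

    Each event is matched to its closest event in the other set; the two
    directional averages are then averaged.
    """
    if not s1 or not s2:
        return 1.0
    d12 = sum(min(event_dissim(a, b) for b in s2) for a in s1) / len(s1)
    d21 = sum(min(event_dissim(b, a) for a in s1) for b in s2) / len(s2)
    return (d12 + d21) / 2


def cluster(docs: dict[str, list[Event]], threshold: float) -> list[set[str]]:
    """Average-link agglomerative clustering of documents by their event sets.

    Repeatedly merges the closest pair of clusters until the smallest
    inter-cluster dissimilarity reaches `threshold`.
    """
    clusters = [{name} for name in docs]

    def link(c1: set[str], c2: set[str]) -> float:
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(set_dissim(docs[a], docs[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        (i, j), d = min(
            (((i, j), link(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if d >= threshold:
            break
        clusters[i] |= clusters[j]  # i < j, so deleting j leaves i valid
        del clusters[j]
    return clusters


# Toy usage with made-up documents.
docs = {
    "d1": [Event("china", "beat", "japan")],
    "d2": [Event("china", "defeat", "japan")],
    "d3": [Event("brazil", "lose", "italy")],
}
print(cluster(docs, threshold=0.5))
```

With these made-up documents, `d1` and `d2` describe the same event with synonymous verbs, so they should end up in one cluster while `d3` stays on its own. A keyword vector model would instead treat "beat" and "defeat" as unrelated terms. Comparing sets of events, rather than single vectors, lets a document that mentions several candidate events still match another document on its best-aligned events.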
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Hu, W., Sheng, Hy. (2007). Semantics-Based Event-Driven Web News Classification. In: Thulasiraman, P., He, X., Xu, T.L., Denko, M.K., Thulasiram, R.K., Yang, L.T. (eds) Frontiers of High Performance Computing and Networking ISPA 2007 Workshops. ISPA 2007. Lecture Notes in Computer Science, vol 4743. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74767-3_15
DOI: https://doi.org/10.1007/978-3-540-74767-3_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74766-6
Online ISBN: 978-3-540-74767-3
eBook Packages: Computer Science, Computer Science (R0)