Abstract
For centuries, classification has been used to provide context and direction in every area of human knowledge. Standard machine learning techniques such as support vector machines and related large-margin methods have been applied successfully to this task. Unfortunately, automatic classifiers often misclassify items. Folksonomy, a newer manual classification scheme in which users tag items with freely chosen keywords, can effectively address this problem. In a folksonomy, users attach tags to an item for their own classification purposes, so the tags reflect many individual viewpoints. Because the tags are drawn from the users' own vocabulary, the classification results are easy for ordinary users to understand. Although folksonomy scales much better than other manual classification schemes, it cannot cope with a tremendous number of items, such as all the weblog articles on the Internet. To solve this problem, we propose a new classification method, FSVMC (folksonomy and support vector machine classifier). FSVMC uses a support vector machine as a Tag-agent, a program that determines whether a particular tag should be attached to a weblog page, while folksonomy is used to categorize the weblog articles. In addition, we propose a method for creating a candidate tag database, a list of tags that may be attached to weblog pages. Experimental results indicate that our method is more flexible and effective than traditional methods.
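The Tag-agent idea can be illustrated with a minimal sketch: one binary linear SVM per candidate tag, each deciding whether its tag should be attached to a page. Everything below is an assumption made for illustration, not the paper's implementation: the names `TagAgent`, `build_candidate_tags`, and `suggest_tags` are hypothetical, the candidate tag database is built with a simple frequency threshold, the SVM is a bias-free linear model trained by subgradient descent on the hinge loss, and whitespace tokenization on English toy posts stands in for the Chinese word segmentation a real system would need.

```python
from collections import Counter


def build_candidate_tags(tagged_posts, min_count=2):
    """Candidate tag database: tags users attached to at least `min_count` posts.
    The frequency threshold is an assumed rule, not the paper's method."""
    counts = Counter(tag for _, tags in tagged_posts for tag in tags)
    return sorted(tag for tag, n in counts.items() if n >= min_count)


def vectorize(text, vocabulary):
    """Bag-of-words counts over a fixed vocabulary (whitespace tokens, toy only)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocabulary]


class TagAgent:
    """Hypothetical Tag-agent: a bias-free linear SVM for a single tag."""

    def __init__(self, tag, dim, lam=0.01, lr=0.1, epochs=20):
        self.tag = tag
        self.lam = lam        # L2 regularization strength
        self.lr = lr          # learning rate
        self.epochs = epochs
        self.w = [0.0] * dim

    def score(self, x):
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def fit(self, X, y):
        """Subgradient descent on lam/2 * |w|^2 + hinge loss; labels are +1 or -1."""
        for _ in range(self.epochs):
            for x, label in zip(X, y):
                violated = label * self.score(x) < 1.0
                # Step along the gradient of the regularizer (shrinks w).
                self.w = [wi * (1.0 - self.lr * self.lam) for wi in self.w]
                if violated:
                    # Step along the hinge-loss subgradient for this example.
                    self.w = [wi + self.lr * label * xi
                              for wi, xi in zip(self.w, x)]
        return self

    def should_attach(self, x):
        return self.score(x) > 0.0


def train_agents(tagged_posts, min_count=2):
    """Train one TagAgent per candidate tag from user-tagged posts."""
    vocabulary = sorted({w for text, _ in tagged_posts
                         for w in text.lower().split()})
    X = [vectorize(text, vocabulary) for text, _ in tagged_posts]
    agents = []
    for tag in build_candidate_tags(tagged_posts, min_count):
        y = [1.0 if tag in tags else -1.0 for _, tags in tagged_posts]
        agents.append(TagAgent(tag, len(vocabulary)).fit(X, y))
    return vocabulary, agents


def suggest_tags(text, vocabulary, agents):
    """Ask every agent whether its tag should be attached to the text."""
    x = vectorize(text, vocabulary)
    return {agent.tag for agent in agents if agent.should_attach(x)}


# Toy folksonomy data: (post text, tags chosen by users). Invented for illustration.
TAGGED_POSTS = [
    ("svm kernel margin classifier training", {"ml"}),
    ("kernel margin training data model", {"ml", "misc"}),
    ("recipe noodle soup cooking dinner", {"food"}),
    ("cooking soup recipe spicy noodle", {"food"}),
]

if __name__ == "__main__":
    vocabulary, agents = train_agents(TAGGED_POSTS)
    print(build_candidate_tags(TAGGED_POSTS))
    print(suggest_tags("margin kernel classifier", vocabulary, agents))
```

In this sketch the tag `misc` is used only once, so it never enters the candidate list and no agent is trained for it. Training one independent binary classifier per tag lets a page receive several tags, or none, which matches the way users tag items in a folksonomy.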
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Wang, X., Bai, R., Liao, J. (2007). Chinese Weblog Pages Classification Based on Folksonomy and Support Vector Machines. In: Gorodetsky, V., Zhang, C., Skormin, V.A., Cao, L. (eds) Autonomous Intelligent Systems: Multi-Agents and Data Mining. AIS-ADM 2007. Lecture Notes in Computer Science, vol 4476. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72839-9_27
DOI: https://doi.org/10.1007/978-3-540-72839-9_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-72838-2
Online ISBN: 978-3-540-72839-9
eBook Packages: Computer Science, Computer Science (R0)