Abstract
This paper studies the problem of building Web page classifiers from positive and unlabeled examples and proposes a more principled technique for solving it, based on the tolerance rough set model and Support Vector Machines (SVM). The method uses tolerance classes to approximate the concepts that occur in Web pages and to enrich the representation of the pages, and then draws an initial approximation of the negative examples. It then iteratively runs an SVM, which builds a maximum-margin classifier, to progressively improve the approximation of the negative examples. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. Experimental results show that the proposed method significantly outperforms existing methods.
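The abstract only outlines the method, so the following is a minimal Python sketch of the pipeline under several assumptions that are not taken from the paper. Pages are modelled as sets of terms. The tolerance class of a term t is assumed to be I_θ(t) = {t} ∪ {u : t and u co-occur in at least θ pages}, and the enriched page is its upper approximation U(d) = {t : I_θ(t) ∩ d ≠ ∅}. The initial negatives come from a PEBL-style heuristic (hypothetical here, since the abstract does not say how the first approximation is drawn). A small Pegasos-style linear SVM without a bias term stands in for a full SVM library. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Assumption: every page is a set of terms; all names below are hypothetical.


def tolerance_classes(docs, theta):
    """I_theta(t) = {t} plus every term co-occurring with t in >= theta docs."""
    co = Counter()
    vocab = set()
    for d in docs:
        terms = sorted(set(d))
        vocab.update(terms)
        for a, b in combinations(terms, 2):
            co[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), n in co.items():
        if n >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """U(d) = {t : I_theta(t) intersects d}; used as the enriched page."""
    d = set(doc)
    return {t for t, cls in classes.items() if cls & d}


def initial_negatives(pos, unlabeled):
    """Hypothetical PEBL-style seed: unlabeled pages with no 'positive feature'."""
    df_p = Counter(t for d in pos for t in d)
    df_u = Counter(t for d in unlabeled for t in d)
    pos_feats = {t for t in df_p if df_p[t] / len(pos) > df_u[t] / len(unlabeled)}
    return {i for i, d in enumerate(unlabeled) if not (d & pos_feats)}


def train_linear_svm(pos, neg, lam=0.01, epochs=50):
    """Pegasos-style subgradient descent on lam/2*||w||^2 + hinge loss (no bias)."""
    data = [(x, 1) for x in pos] + [(x, -1) for x in neg]
    w = defaultdict(float)
    step = 0
    for _ in range(epochs):
        for x, y in data:  # fixed order keeps the sketch deterministic
            step += 1
            eta = 1.0 / (lam * step)
            margin = y * sum(w[t] for t in x)
            for t in list(w):  # regularisation: shrink every weight
                w[t] *= 1.0 - eta * lam
            if margin < 1:  # hinge loss is active: move w towards y * x
                for t in x:
                    w[t] += eta * y
    return w


def score(w, x):
    """Signed decision value of page x (positive side if > 0)."""
    return sum(w.get(t, 0.0) for t in x)


def classify_without_negatives(pos, unlabeled, theta=1, max_iter=10):
    """Enrich pages, seed negatives, then grow the negative set until it is stable."""
    classes = tolerance_classes(list(pos) + list(unlabeled), theta)
    pos_r = [upper_approximation(d, classes) for d in pos]
    unl_r = [upper_approximation(d, classes) for d in unlabeled]
    neg_idx = initial_negatives(pos_r, unl_r)
    w = None
    for _ in range(max_iter):
        w = train_linear_svm(pos_r, [unl_r[i] for i in sorted(neg_idx)])
        new = {i for i, x in enumerate(unl_r)
               if i not in neg_idx and score(w, x) < 0}
        if not new:  # converged: no unlabeled page is newly judged negative
            break
        neg_idx |= new
    return w, neg_idx


if __name__ == "__main__":
    pos = [{"football", "goal"}, {"football", "team"}, {"goal", "team"}]
    unlabeled = [{"football", "match"}, {"stock", "price"}, {"stock", "market"},
                 {"price", "market"}, {"match", "team"}]
    classes = tolerance_classes(pos + unlabeled, 1)
    print(sorted(upper_approximation({"match"}, classes)))  # ['football', 'match', 'team']
    w, neg = classify_without_negatives(pos, unlabeled, theta=1)
    print(sorted(neg))  # [1, 2, 3]
```

On this toy data the page {"match"} is enriched with "football" and "team", the three finance pages form the initial negative set, and the SVM keeps both sports pages on the positive side, so the loop stops after one round. On real Web pages the negative set may grow over several rounds before it stabilises; swapping in a library SVM with a bias term would be the natural next step.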
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Duan, Q., Miao, D., Jin, K. (2007). A Rough Set Approach to Classifying Web Page Without Negative Examples. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_49
DOI: https://doi.org/10.1007/978-3-540-71701-0_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)