A Rough Set Approach to Classifying Web Page Without Negative Examples

Duan, Qiguo; Miao, Duoqian; Jin, Kaimin

doi:10.1007/978-3-540-71701-0_49

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

This paper studies the problem of building Web page classifiers from positive and unlabeled examples, and proposes a more principled technique for solving it based on tolerance rough sets and the Support Vector Machine (SVM). The method uses tolerance classes to approximate the concepts that exist in Web pages and to enrich the representation of those pages, and from this draws an initial approximation of the negative examples. It then runs SVM iteratively to build a classifier that maximizes margins, progressively improving the approximation of the negative examples. Thus, the class boundary eventually converges to the true boundary of the positive class in the feature space. Experimental results show that the method significantly outperforms existing methods.
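
The abstract does not give the construction in detail, so the following is only a minimal sketch of the first stage under stated assumptions. It assumes the co-occurrence-based tolerance relation commonly used in tolerance rough set models of documents: two terms are tolerant of each other if they appear together in at least `theta` documents. It also assumes a simple rule for the initial negatives (unlabeled pages whose enriched term set shares nothing with any enriched positive page), which may differ from the rule the paper actually uses. The function names, the threshold `theta`, and the toy pages are all hypothetical. The iterative SVM stage is not shown.

```python
from collections import defaultdict
from itertools import combinations


def tolerance_classes(docs, theta):
    """Map each term to its tolerance class: the term itself plus every
    term that co-occurs with it in at least `theta` documents (assumed rule)."""
    cooccur = defaultdict(int)
    vocab = set()
    for doc in docs:
        terms = set(doc)
        vocab |= terms
        # Count each unordered pair of distinct terms once per document.
        for a, b in combinations(sorted(terms), 2):
            cooccur[(a, b)] += 1
    classes = {t: {t} for t in vocab}
    for (a, b), count in cooccur.items():
        if count >= theta:
            classes[a].add(b)
            classes[b].add(a)
    return classes


def upper_approximation(doc, classes):
    """Enriched representation of a page: every vocabulary term whose
    tolerance class intersects the page's own terms."""
    terms = set(doc)
    return {t for t, cls in classes.items() if cls & terms}


def initial_negatives(positive, unlabeled, classes):
    """Indices of unlabeled pages whose enriched term set shares no term
    with any enriched positive page (an assumed selection rule)."""
    pos_terms = set()
    for doc in positive:
        pos_terms |= upper_approximation(doc, classes)
    return [i for i, doc in enumerate(unlabeled)
            if not (upper_approximation(doc, classes) & pos_terms)]


# Toy pages, each already reduced to a list of index terms (hypothetical data).
positive = [["rough", "set", "svm"], ["rough", "set", "tolerance"]]
unlabeled = [["set", "svm", "margin"],
             ["football", "score"],
             ["football", "league", "score"]]

classes = tolerance_classes(positive + unlabeled, theta=2)
enriched = upper_approximation(["svm"], classes)
negatives = initial_negatives(positive, unlabeled, classes)
print(sorted(enriched))   # ['set', 'svm']
print(negatives)          # [1, 2]
```

In this toy run the page containing only "svm" is enriched with "set" because the two terms co-occur twice, and the two sports pages are picked as initial negatives because even their enriched representations share no term with the positive pages. In the approach the abstract describes, such an initial set would then be refined by repeatedly training a margin-maximizing SVM.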

Author information

Authors and Affiliations

Department of Computer Science and Technology, Tongji University, Shanghai, 201804, China; The Key Laboratory of "Embedded System and Service Computing", Ministry of Education, Shanghai, 201804, China
Qiguo Duan, Duoqian Miao & Kaimin Jin

Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang

About this paper

Cite this paper

Duan, Q., Miao, D., Jin, K. (2007). A Rough Set Approach to Classifying Web Page Without Negative Examples. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_49

DOI: https://doi.org/10.1007/978-3-540-71701-0_49
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71700-3
Online ISBN: 978-3-540-71701-0
eBook Packages: Computer Science, Computer Science (R0)