A Biased Support Vector Machine Approach to Web Filtering

Du, A-Ning; Fang, Bin-Xing; Li, Bin

doi:10.1007/11551188_39

A-Ning Du, Bin-Xing Fang & Bin Li

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3686)

Included in the following conference series:

International Conference on Pattern Recognition and Image Analysis


Abstract

Web filtering is an inductive process that automatically builds a filter by learning a description of user interest from a set of pre-assigned web pages, and then uses the filter to classify unprocessed web pages. Content similarity analysis is the core problem in web filtering; the automatic-learning and relatedness-analysis abilities of machine learning (ML) algorithms help solve this problem and make ML useful for web filtering. In practical applications, however, different filtering tasks imply different user interests and therefore call for different filtering results. This work studies how to adjust web filtering results so that they better fit the user interest. According to the user interest, filtering results are divided into three categories: relative pages, similar pages and homologous pages. A Biased Support Vector Machine (BSVM) algorithm is introduced to adjust the filtering result to best fit the user interest. BSVM imports a stimulant function that uses the training-example distribution $n_+/n_-$ and a user-adaptable parameter $k$ to treat the different classes of pre-assigned pages asymmetrically. Experiments show that BSVM can greatly improve web filtering performance.
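The abstract does not give the form of the stimulant function, so the sketch below only illustrates the general idea of biasing an SVM by class distribution and a user parameter. It swaps in a standard cost-sensitive (asymmetric-penalty) linear SVM; the weighting $C_+ = kC\,n_-/n_+$, $C_- = C$ and the names `class_costs`, `train_biased_svm` and `predict` are hypothetical assumptions, not the authors' BSVM.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def class_costs(n_pos: int, n_neg: int, c: float, k: float) -> Tuple[float, float]:
    """Hypothetical biased penalties (an assumption, NOT the paper's stimulant function).

    The positive (user-interest) class gets penalty c * k * n_neg / n_pos and the
    negative class gets c. k = 1 simply rebalances by class ratio; k > 1 biases
    the filter further toward accepting pages.
    """
    return c * k * n_neg / n_pos, c


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def train_biased_svm(
    xs: List[Vector], ys: List[int], c_pos: float, c_neg: float,
    lam: float = 0.01, lr: float = 0.1, epochs: int = 500,
) -> Tuple[List[float], float]:
    """Linear SVM with per-class hinge-loss weights, trained by full-batch
    subgradient descent on  lam/2 * ||w||^2 + (1/n) * sum_i C_{y_i} * hinge_i."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]   # gradient of the L2 regulariser
        grad_b = 0.0                       # the bias term is not regularised
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1.0:  # inside the margin: hinge loss is active
                cost = c_pos if y > 0 else c_neg
                for j in range(dim):
                    grad_w[j] -= cost * y * x[j] / n
                grad_b -= cost * y / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Vector, b: float, x: Vector) -> int:
    """+1 = keep the page (matches user interest), -1 = filter it out."""
    return 1 if dot(w, x) + b >= 0.0 else -1


# Toy 2-D "page feature" vectors: 2 relevant pages, 4 irrelevant ones.
xs = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0], [-3.0, -2.0]]
ys = [1, 1, -1, -1, -1, -1]
c_pos, c_neg = class_costs(n_pos=2, n_neg=4, c=1.0, k=1.0)  # -> (2.0, 1.0)
w, b = train_biased_svm(xs, ys, c_pos, c_neg)
print([predict(w, b, x) for x in xs])  # -> [1, 1, -1, -1, -1, -1]
```

Full-batch subgradient descent keeps the sketch deterministic and dependency-free; a practical filter would more likely pass per-class weights to an existing SVM implementation (for example LIBSVM's `-wi` option or scikit-learn's `class_weight`) and tune $k$ on held-out pages.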




Author information

Authors and Affiliations

Research Center of Computer Network and Information Security Technology, Harbin Institute of Technology, Harbin, 150001, People’s Republic of China
A-Ning Du, Bin-Xing Fang & Bin Li


Editor information

Editors and Affiliations

Research School of Informatics, Loughborough, UK
Sameer Singh
ATR Lab, Research School of Informatics, University of Loughborough, Loughborough, UK
Maneesha Singh
IBM Corporation, 1133 Westchester Avenue, White Plains, 10604, New York, United States
Chid Apte
Institute of Computer Vision and Applied Computer Sciences, IBaI, Germany
Petra Perner


About this paper

Cite this paper

Du, AN., Fang, BX., Li, B. (2005). A Biased Support Vector Machine Approach to Web Filtering. In: Singh, S., Singh, M., Apte, C., Perner, P. (eds) Pattern Recognition and Data Mining. ICAPR 2005. Lecture Notes in Computer Science, vol 3686. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11551188_39


DOI: https://doi.org/10.1007/11551188_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28757-5
Online ISBN: 978-3-540-28758-2
eBook Packages: Computer Science, Computer Science (R0)
