Hierarchical Training of Multiple SVMs for Personalized Web Filtering

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7458)

Abstract

The abundance of information published on the Internet makes filtering of hazardous Web pages a difficult yet important task. Supervised learning methods such as Support Vector Machines (SVMs) can be used to identify hazardous Web content. However, scalability is a major challenge, especially when multiple classifiers have to be trained because different policies exist on what kind of information is hazardous. We therefore propose a transfer learning approach called Hierarchical Training for Multiple SVMs (HTMSVM). HTMSVM identifies common data among similar training sets and trains on the common data first in order to obtain initial solutions. These initial solutions then reduce the time needed to train on the individual training sets without affecting classification accuracy. In an experiment in which we trained five Web content filters with 80% common and 20% inconsistently labeled training examples, HTMSVM predicted hazardous Web pages with the same classification accuracy as LibSVM (more than 91%) while requiring only 26% to 41% of its training time.
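The two steps named in the abstract (find the data shared by similar training sets, then reuse the solution trained on it as a starting point) can be illustrated with a minimal sketch. It is not the authors' implementation: the paper compares against LibSVM, whereas this sketch swaps in a plain subgradient solver for a linear SVM so that it stays self-contained, passes the initial solution as a weight vector, and uses a single level of sharing (examples that carry the same label in every set). All function names, parameters, and the toy data are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# One training example: (feature vector, label in {-1, +1}).
# A constant 1.0 feature stands in for the bias term (an assumption of this sketch).
Example = Tuple[Tuple[float, ...], int]


def split_common(
    training_sets: Dict[str, List[Example]],
) -> Tuple[List[Example], Dict[str, List[Example]]]:
    """Separate examples that appear with the same label in every set from the rest."""
    if not training_sets:
        return [], {}
    common = set.intersection(*(set(ts) for ts in training_sets.values()))
    rest = {
        name: [ex for ex in ts if ex not in common]
        for name, ts in training_sets.items()
    }
    return sorted(common), rest


def train_linear_svm(
    examples: Sequence[Example],
    w_init: Optional[Sequence[float]] = None,
    epochs: int = 50,
    lr: float = 0.1,
    lam: float = 0.01,
) -> List[float]:
    """Subgradient descent on the L2-regularised hinge loss.

    Passing w_init warm-starts the solver from an earlier solution.
    """
    if not examples:
        return list(w_init) if w_init is not None else []
    dim = len(examples[0][0])
    w = list(w_init) if w_init is not None else [0.0] * dim
    for _ in range(epochs):
        for x, y in examples:
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            for i in range(dim):
                # The hinge term contributes only while the margin is violated.
                grad = lam * w[i] - (y * x[i] if margin < 1 else 0.0)
                w[i] -= lr * grad
    return w


def train_hierarchically(
    training_sets: Dict[str, List[Example]],
) -> Dict[str, List[float]]:
    """Train once on the shared data, then warm-start one model per training set."""
    common, rest = split_common(training_sets)
    w_common = train_linear_svm(common) if common else None
    return {
        name: train_linear_svm(common + rest[name], w_init=w_common)
        for name in training_sets
    }


# Toy data: two filtering policies that disagree on a single page.
policy_a = [((2.0, 1.0), 1), ((-2.0, 1.0), -1), ((0.5, 1.0), 1)]
policy_b = [((2.0, 1.0), 1), ((-2.0, 1.0), -1), ((0.5, 1.0), -1)]
models = train_hierarchically({"a": policy_a, "b": policy_b})
```

In this sketch the shared examples are processed once from a cold start, and each policy-specific model begins from that shared solution rather than from zero, which is the source of the training-time savings the abstract reports for the actual method.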

Keywords

  • Hazardous Web content
  • Hierarchical training
  • Transfer learning
  • SVM
  • Machine learning

Chapter length: 13 pages

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Erdmann, M., Nguyen, D.D., Takeyoshi, T., Hattori, G., Matsumoto, K., Ono, C. (2012). Hierarchical Training of Multiple SVMs for Personalized Web Filtering. In: Anthony, P., Ishizuka, M., Lukose, D. (eds) PRICAI 2012: Trends in Artificial Intelligence. PRICAI 2012. Lecture Notes in Computer Science, vol 7458. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32695-0_5

  • DOI: https://doi.org/10.1007/978-3-642-32695-0_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32694-3

  • Online ISBN: 978-3-642-32695-0

  • eBook Packages: Computer Science, Computer Science (R0)