Benchmarking the Performance of Support Vector Machines in Classifying Web Pages

  • Conference paper
Knowledge Technology (KTW 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 295)

Abstract

In this paper, we benchmark support vector machines (SVMs) against two other popular classification algorithms, decision tree and Naïve Bayes, in terms of classification accuracy and classification speed. We conduct the study on the 4-University data set using 4-fold cross validation. The empirical results indicate that SVMs and Naïve Bayes achieve comparable average precision and recall, while the decision tree algorithm ID3 outperforms the rest in average accuracy. Nevertheless, ID3 takes the longest time both to generate the classification model and to classify the web pages.
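
The evaluation protocol described in the abstract (k-fold cross validation, average precision, recall and accuracy, plus model-building and classification time) can be sketched roughly as follows. This is a minimal illustration, not the authors' setup: the function names (`k_fold_indices`, `macro_precision_recall`, `benchmark`, `majority_class_fit`) are hypothetical, folds are contiguous with no shuffling or stratification, precision and recall are macro-averaged over classes (the abstract does not say which averaging was used), and a trivial majority-class baseline stands in for SVM, ID3 and Naïve Bayes.

```python
import time
from collections import Counter


def k_fold_indices(n_samples, k=4):
    """Yield (train, test) index lists for k contiguous folds (no shuffling)."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def macro_precision_recall(y_true, y_pred):
    """Average per-class precision and recall over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        actual = sum(1 for t in y_true if t == label)
        precisions.append(tp / predicted if predicted else 0.0)
        recalls.append(tp / actual if actual else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


def benchmark(fit, X, y, k=4):
    """Run k-fold cross validation and return per-fold averages.

    `fit(X_train, y_train)` is assumed to return a `predict(X_test)` callable,
    so any classifier can be plugged in behind the same interface.
    """
    totals = {"precision": 0.0, "recall": 0.0, "accuracy": 0.0,
              "train_seconds": 0.0, "classify_seconds": 0.0}
    for train, test in k_fold_indices(len(X), k):
        X_train, y_train = [X[i] for i in train], [y[i] for i in train]
        X_test, y_test = [X[i] for i in test], [y[i] for i in test]

        t0 = time.perf_counter()
        predict = fit(X_train, y_train)   # time to generate the model
        t1 = time.perf_counter()
        y_pred = predict(X_test)          # time to classify the held-out pages
        t2 = time.perf_counter()

        precision, recall = macro_precision_recall(y_test, y_pred)
        correct = sum(1 for t, p in zip(y_test, y_pred) if t == p)
        totals["precision"] += precision
        totals["recall"] += recall
        totals["accuracy"] += correct / len(y_test)
        totals["train_seconds"] += t1 - t0
        totals["classify_seconds"] += t2 - t1
    return {name: total / k for name, total in totals.items()}


def majority_class_fit(X_train, y_train):
    """Placeholder classifier: always predicts the most frequent training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return lambda X_test: [most_common] * len(X_test)


if __name__ == "__main__":
    # Toy data standing in for web-page feature vectors and class labels.
    X = list(range(8))
    y = ["course"] * 6 + ["faculty"] * 2
    print(benchmark(majority_class_fit, X, y, k=4))
```

Timing the model-building and classification steps separately mirrors the two speed measurements the abstract reports; swapping `majority_class_fit` for a real learner leaves the rest of the harness unchanged.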

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wong, WP., Chan, KX., Soon, LK. (2012). Benchmarking the Performance of Support Vector Machines in Classifying Web Pages. In: Lukose, D., Ahmad, A.R., Suliman, A. (eds) Knowledge Technology. KTW 2011. Communications in Computer and Information Science, vol 295. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-32826-8_42

  • DOI: https://doi.org/10.1007/978-3-642-32826-8_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-32825-1

  • Online ISBN: 978-3-642-32826-8

  • eBook Packages: Computer Science, Computer Science (R0)
