Automatic Web Page Classification Using Visual Content for Subjective and Functional Variables

  • Conference paper
Web Information Systems and Technologies (WEBIST 2014)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 226)

Abstract

Automatic classification of webpages has several applications in industry, including digital marketing, search engines and content filtering. Traditionally, this classification has relied only on the textual information of webpages, which includes the HTML code, tags, title and, more recently, the URL. The aim of this paper is to show that, for some subjective variables that matter greatly to these applications, the visual information of webpages as rendered by the browser is extremely rich content for the classification task. The variables studied are aesthetic value (whether pages are beautiful or ugly) and design recency (whether pages look old-fashioned or modern). We show that automatic classifiers relying only on the visual look and feel can achieve very high accuracy. By using several low-level and mid-level features and studying several criteria for selection and classification, our classifiers improve on the state of the art. Finally, we apply this framework to classify webpages by topic (content aware) and to classify whether or not a page is a blog (functional aware).
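
To make the idea of classifying a page from its rendered appearance concrete, the following is a minimal sketch and not the authors' pipeline. It assumes a screenshot is already available as a list of RGB tuples, uses a coarse colour histogram as a single low-level feature, and swaps in a nearest-centroid rule as the classifier. The function names, the bin count and the toy pixel data are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import dist


def color_histogram(pixels, bins=4):
    """Normalised joint RGB histogram of (r, g, b) tuples with values in 0..255."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return [counts.get((i, j, k), 0) / total
            for i in range(bins) for j in range(bins) for k in range(bins)]


def centroid(vectors):
    """Component-wise mean of equally long feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labelled):
    """Map each label to the centroid of its training feature vectors."""
    return {label: centroid(vectors) for label, vectors in labelled.items()}


def classify(model, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(model, key=lambda label: dist(model[label], features))


# Toy "screenshots" (hypothetical data, not taken from the paper):
# light, low-saturation pages versus pages dominated by saturated colours.
white = [(250, 250, 250)] * 90 + [(30, 30, 30)] * 10
light = [(240, 240, 245)] * 80 + [(60, 60, 60)] * 20
lime = [(0, 255, 0)] * 70 + [(255, 0, 255)] * 30
yellow = [(255, 255, 0)] * 60 + [(0, 0, 255)] * 40

model = train({
    "modern": [color_histogram(white), color_histogram(light)],
    "old": [color_histogram(lime), color_histogram(yellow)],
})

query = [(245, 245, 245)] * 85 + [(20, 20, 20)] * 15
print(classify(model, color_histogram(query)))
```

On this toy data the query page is assigned the label "modern". A real system would extract richer features from actual browser screenshots and would need labelled examples for each variable of interest.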


Notes

  1. https://addons.mozilla.org/pt-pt/firefox/addon/fireshot/
  2. http://archive.org/web/web.php
  3. http://www.alexa.com

Author information

Corresponding author

Correspondence to Nuno Goncalves.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Goncalves, N., Videira, A. (2015). Automatic Web Page Classification Using Visual Content for Subjective and Functional Variables. In: Monfort, V., Krempels, KH. (eds) Web Information Systems and Technologies. WEBIST 2014. Lecture Notes in Business Information Processing, vol 226. Springer, Cham. https://doi.org/10.1007/978-3-319-27030-2_18

  • DOI: https://doi.org/10.1007/978-3-319-27030-2_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-27029-6

  • Online ISBN: 978-3-319-27030-2

  • eBook Packages: Computer Science, Computer Science (R0)
