Web Scraping of Online Newspapers via Image Matching

  • Conference paper
Progress in Industrial Mathematics at ECMI 2014 (ECMI 2014)

Part of the book series: Mathematics in Industry (TECMI, volume 22)

Abstract

Reading is an activity which takes place widely on the web: almost all newspapers have their own digital edition online, and many magazines are published only on the web. In this scenario, Computer Vision offers a useful set of tools that can help web editors improve the quality of the service they provide. One such tool is presented here: given the webpage of a newspaper or journal, the proposed framework localizes the news items that users click remotely, returning the bounding box of each article's content within the corresponding homepage. The tool can therefore track an article in the page that contains it at any time during the day. This information is very useful for web editors, who can use it to understand the trend of the published items and to rearrange the contents of the homepage accordingly.
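
The abstract does not spell out the matching procedure, so the following is only a minimal sketch of the general idea of locating an article inside a homepage image and reporting its bounding box. It assumes grayscale images stored as nested lists and uses exhaustive template matching with a sum of absolute differences, which is a stand-in technique chosen for illustration and not necessarily the method used in the paper. The names `locate_patch`, `homepage`, and `article` are hypothetical.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def locate_patch(page: Image, patch: Image) -> Tuple[int, int, int, int]:
    """Return (x, y, width, height) of the best match of patch inside page.

    Assumption: plain sum-of-absolute-differences template matching,
    used here only as an illustrative stand-in for image matching.
    """
    page_h, page_w = len(page), len(page[0])
    patch_h, patch_w = len(patch), len(patch[0])
    if patch_h > page_h or patch_w > page_w:
        raise ValueError("patch must fit inside page")

    best_cost = None
    best_xy = (0, 0)
    # Slide the patch over every position where it fits entirely in the page.
    for y in range(page_h - patch_h + 1):
        for x in range(page_w - patch_w + 1):
            cost = sum(
                abs(page[y + dy][x + dx] - patch[dy][dx])
                for dy in range(patch_h)
                for dx in range(patch_w)
            )
            # Keep the first position with the lowest cost seen so far.
            if best_cost is None or cost < best_cost:
                best_cost, best_xy = cost, (x, y)
    return best_xy[0], best_xy[1], patch_w, patch_h


# Toy "homepage" with a distinctive 2x2 "article" block at x=3, y=1.
homepage = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 9, 7, 0],
    [0, 0, 0, 5, 8, 0],
    [0, 0, 0, 0, 0, 0],
]
article = [[9, 7], [5, 8]]
bbox = locate_patch(homepage, article)
print(bbox)  # (3, 1, 2, 2)
```

Running the same search on screenshots taken at different times of day would give one bounding box per capture, which is the kind of positional trace the abstract describes as useful to editors. Exhaustive search is quadratic in image size, so a real system would likely rely on a faster matching strategy.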

Notes

  1. Hedley, J.: Jsoup java html parser. http://jsoup.org.

  2. Bradski, G.: The OpenCV Library. Dr. Dobb’s Journal of Software Tools (2000).

Corresponding author

Correspondence to D. Moltisanti.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Moltisanti, D., Farinella, G.M., Battiato, S., Giuffrida, G. (2016). Web Scraping of Online Newspapers via Image Matching. In: Russo, G., Capasso, V., Nicosia, G., Romano, V. (eds) Progress in Industrial Mathematics at ECMI 2014. ECMI 2014. Mathematics in Industry, vol 22. Springer, Cham. https://doi.org/10.1007/978-3-319-23413-7_4
