Abstract
Reading is a widespread activity on the web: almost all newspapers have their own digital edition, and many magazines are published only online. In such a scenario, Computer Vision offers a useful set of tools that can help web editors improve the quality of the service they provide. One such tool is presented here: given the webpage of a newspaper or journal, the proposed framework localizes the news items clicked remotely by users, returning the bounding box of each article's content within the corresponding homepage. The tool can hence track an article in the page that contains it at any time of the day. Such information is very useful for web editors, who can use it to understand the trend of the published items and to rearrange the contents of the homepage accordingly.
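To make the localization step concrete, the following is a minimal sketch of finding an article's bounding box inside a homepage image. The abstract does not state which matching method the framework uses, so this sketch substitutes a brute-force template match based on the sum of absolute differences (SAD). The function name `locate_article`, the list-of-lists grayscale representation, and the `(x, y, width, height)` box layout are all assumptions made for illustration only.

```python
from typing import List, Tuple

Image = List[List[int]]          # grayscale pixels, row-major (assumed representation)
Box = Tuple[int, int, int, int]  # (x, y, width, height), an assumed layout


def locate_article(homepage: Image, snippet: Image) -> Box:
    """Return the box where `snippet` best matches inside `homepage`.

    Brute-force sum of absolute differences (SAD): an illustrative
    stand-in, not the matching method described in the paper.
    """
    page_h, page_w = len(homepage), len(homepage[0])
    snip_h, snip_w = len(snippet), len(snippet[0])
    if snip_h > page_h or snip_w > page_w:
        raise ValueError("snippet must fit inside the homepage image")

    best_score, best_xy = None, (0, 0)
    # Slide the snippet over every valid position and keep the lowest SAD.
    for y in range(page_h - snip_h + 1):
        for x in range(page_w - snip_w + 1):
            score = sum(
                abs(homepage[y + dy][x + dx] - snippet[dy][dx])
                for dy in range(snip_h)
                for dx in range(snip_w)
            )
            if best_score is None or score < best_score:
                best_score, best_xy = score, (x, y)
    return (best_xy[0], best_xy[1], snip_w, snip_h)


if __name__ == "__main__":
    # Toy 4x5 "homepage" with a 2x2 "article" placed at column 2, row 1.
    homepage = [
        [0, 0, 0, 0, 0],
        [0, 0, 9, 7, 0],
        [0, 0, 5, 3, 0],
        [0, 0, 0, 0, 0],
    ]
    snippet = [[9, 7], [5, 3]]
    print(locate_article(homepage, snippet))  # (2, 1, 2, 2)
```

An exhaustive pixel comparison like this is only practical for small images and exact renderings. Matching on real homepage screenshots would more plausibly rely on a method that tolerates scale and layout changes.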
Notes
1. Hedley, J.: Jsoup java html parser. http://jsoup.org.
2. Bradski, G.: The OpenCV Library. Dr. Dobb’s Journal of Software Tools (2000).
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Moltisanti, D., Farinella, G.M., Battiato, S., Giuffrida, G. (2016). Web Scraping of Online Newspapers via Image Matching. In: Russo, G., Capasso, V., Nicosia, G., Romano, V. (eds) Progress in Industrial Mathematics at ECMI 2014. ECMI 2014. Mathematics in Industry, vol 22. Springer, Cham. https://doi.org/10.1007/978-3-319-23413-7_4
DOI: https://doi.org/10.1007/978-3-319-23413-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23412-0
Online ISBN: 978-3-319-23413-7
eBook Packages: Mathematics and Statistics (R0)