Abstract
Methods for monitoring the content of Web pages are of increasing interest to law enforcement services, which search for Web pages containing symptoms of criminal activity. Such information can be hidden from indexing systems by embedding it in multimedia materials, and finding these materials is a major challenge of contemporary criminal analysis. This paper describes a concept for integrating a large-scale Web crawling system with multimedia material analysis algorithms. The Web crawling system, which processes a few hundred pages per second, provides a mechanism for including plugins. A plugin can analyze processed resources and detect references to multimedia materials. The references are passed to a component that implements an algorithm for image or video analysis. Several approaches to the integration are described, and some exemplary implementation assumptions are presented.
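The plugin-based integration described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `CrawlerPlugin`, `MultimediaReferencePlugin`, `MediaReference`, `classify` and `dispatch` are hypothetical, the crawler is assumed to call each plugin once per processed page with its URL and HTML content, and multimedia references are classified by file extension only (a real system would more likely inspect the HTTP `Content-Type`). The analysis components are stand-in callables where an image or video analysis algorithm would plug in.

```python
import posixpath
from dataclasses import dataclass
from html.parser import HTMLParser
from queue import Queue
from urllib.parse import urljoin, urlparse

# Assumed extension lists; purely illustrative.
IMAGE_EXT = {".jpg", ".jpeg", ".png", ".gif", ".bmp", ".webp"}
VIDEO_EXT = {".mp4", ".avi", ".flv", ".webm", ".mov", ".mpg", ".mpeg"}


@dataclass(frozen=True)
class MediaReference:
    page_url: str   # page on which the reference was found
    media_url: str  # absolute URL of the multimedia resource
    kind: str       # "image" or "video"


def classify(url):
    """Guess the media kind from the URL path extension; None if not media."""
    ext = posixpath.splitext(urlparse(url).path)[1].lower()
    if ext in IMAGE_EXT:
        return "image"
    if ext in VIDEO_EXT:
        return "video"
    return None


class _MediaLinkParser(HTMLParser):
    # Hypothetical mapping: tag -> attribute that holds the resource URL.
    ATTRS = {"img": "src", "video": "src", "source": "src",
             "embed": "src", "object": "data", "a": "href"}

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = self.ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                self.links.append(value)


class CrawlerPlugin:
    """Hypothetical plugin interface: invoked once per processed resource."""

    def process(self, url: str, content: str) -> None:
        raise NotImplementedError


class MultimediaReferencePlugin(CrawlerPlugin):
    """Detects multimedia references and hands them to the analysis side."""

    def __init__(self, out: Queue):
        self.out = out  # queue decouples crawling from (slower) analysis

    def process(self, url, content):
        parser = _MediaLinkParser()
        parser.feed(content)
        parser.close()
        seen = set()
        for link in parser.links:
            absolute = urljoin(url, link)  # resolve relative references
            kind = classify(absolute)
            if kind and absolute not in seen:  # skip non-media and duplicates
                seen.add(absolute)
                self.out.put(MediaReference(url, absolute, kind))


def dispatch(queue, analyzers):
    """Drain the queue, routing each reference to the analyzer for its kind."""
    results = []
    while not queue.empty():
        ref = queue.get()
        results.append(analyzers[ref.kind](ref))
    return results


if __name__ == "__main__":
    q = Queue()
    plugin = MultimediaReferencePlugin(q)
    html = ('<img src="/a.JPG"><video src="clip.mp4"></video>'
            '<a href="doc.pdf">x</a><img src="/a.JPG">')
    plugin.process("http://example.com/dir/page.html", html)
    # Stand-in analyzers; real ones would run image or video analysis.
    analyzers = {"image": lambda r: ("image", r.media_url),
                 "video": lambda r: ("video", r.media_url)}
    print(dispatch(q, analyzers))
    # [('image', 'http://example.com/a.JPG'), ('video', 'http://example.com/dir/clip.mp4')]
```

Putting a queue between the plugin and the analyzers reflects one possible integration approach: the crawler, which must keep up with a few hundred pages per second, only enqueues references, while the comparatively expensive image or video analysis can consume them at its own pace or on separate machines.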
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Turek, W., Opalinski, A., Kisiel-Dorohinicki, M. (2011). Extensible Web Crawler – Towards Multimedia Material Analysis. In: Dziech, A., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2011. Communications in Computer and Information Science, vol 149. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21512-4_22
DOI: https://doi.org/10.1007/978-3-642-21512-4_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21511-7
Online ISBN: 978-3-642-21512-4
eBook Packages: Computer Science, Computer Science (R0)