Abstract
Over the years, the internet has become the major source of security threats to computer systems. With the number of people browsing the internet increasing exponentially in the last couple of years, browser-based attacks have become the preferred means of infecting a computer system. These browser-based attacks, known as ‘Drive-by Download’ attacks, inject malicious JavaScript from the server hosting the malicious web application into the browser. Since the number of malicious websites launching such attacks has increased in the past few years, it has become critical to detect them. Typically, the search for malicious web pages involves three steps: crawling URLs on the internet, using fast analysis filters to reject benign pages, and then running complex but slow detailed analysis (using honeyclients) on the filtered list. While effective, these techniques consume substantial time and computing resources. This limitation can be overcome by designing a crawler that seeks out more malicious sites than benign ones, thus increasing the “toxicity” (the proportion of malicious URLs) of the URLs collected in the first step. In this paper, we propose a focused web crawler, named “MalCrawler”, designed to seek and crawl malicious websites efficiently. Compared with a generic crawler, this crawler not only seeks more malicious sites than benign sites but also handles cloaking, entanglement and AJAX content in malicious sites. MalCrawler, which was designed, developed and tested as part of this work, proved to be more efficient than generic crawlers.
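The central idea above, a crawler whose frontier favours likely-malicious URLs so that the collected set has a higher toxicity than a generic crawl, can be illustrated with a best-first crawl. The sketch below is not MalCrawler's algorithm: the keyword-based `keyword_score` heuristic, the `is_malicious` oracle (a stand-in for a slow honeyclient verdict), the toy `LINKS` graph and the crawl budget are all hypothetical, and it ignores cloaking, entanglement and AJAX handling entirely. It only compares the toxicity of URLs collected by a best-first crawler against a generic breadth-first crawler under the same budget.

```python
import heapq
from collections import deque
from typing import Callable, Dict, List

# Hypothetical lexical hints; a real focused crawler would use richer features.
SUSPICIOUS_TOKENS = ("exploit", "payload", "iframe", "eval")

# Hypothetical toy web graph: page URL -> outgoing links.
LINKS: Dict[str, List[str]] = {
    "http://a.example/": [
        "http://a.example/about",
        "http://a.example/news",
        "http://evil.example/exploit",
    ],
    "http://a.example/about": ["http://a.example/team"],
    "http://a.example/news": ["http://a.example/sports"],
    "http://evil.example/exploit": [
        "http://evil.example/iframe-payload",
        "http://evil.example/eval-kit",
    ],
}


def keyword_score(url: str) -> int:
    """Hypothetical priority: number of suspicious tokens in the URL."""
    return sum(url.count(tok) for tok in SUSPICIOUS_TOKENS)


def is_malicious(url: str) -> bool:
    """Hypothetical stand-in for a slow, accurate verdict (e.g. a honeyclient)."""
    return "evil.example" in url


def toxicity(urls: List[str], oracle: Callable[[str], bool]) -> float:
    """Fraction of collected URLs that the oracle labels malicious."""
    if not urls:
        return 0.0
    return sum(1 for u in urls if oracle(u)) / len(urls)


def focused_crawl(seeds: List[str], links: Dict[str, List[str]],
                  score: Callable[[str], int], budget: int) -> List[str]:
    """Best-first crawl: always visit the highest-scoring frontier URL next."""
    frontier = []  # heapq is a min-heap, so scores are negated
    seen = set(seeds)
    for url in dict.fromkeys(seeds):  # de-duplicate seeds, keep order
        heapq.heappush(frontier, (-score(url), url))
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                heapq.heappush(frontier, (-score(nxt), nxt))
    return visited


def generic_crawl(seeds: List[str], links: Dict[str, List[str]],
                  budget: int) -> List[str]:
    """Generic breadth-first crawl with no notion of maliciousness."""
    queue = deque(dict.fromkeys(seeds))
    seen = set(seeds)
    visited = []
    while queue and len(visited) < budget:
        url = queue.popleft()
        visited.append(url)
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return visited


if __name__ == "__main__":
    seeds = ["http://a.example/"]
    focused = focused_crawl(seeds, LINKS, keyword_score, budget=4)
    generic = generic_crawl(seeds, LINKS, budget=4)
    print("focused toxicity:", toxicity(focused, is_malicious))  # 0.75
    print("generic toxicity:", toxicity(generic, is_malicious))  # 0.25
```

On this toy graph the best-first crawler spends three of its four visits on `evil.example` pages, while the breadth-first crawler reaches only one, which is the kind of toxicity gain a focused crawler aims for before the expensive analysis step.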
Notes
1. Seed refers to the initial set of URLs from which the crawl starts.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Singh, A.K., Goyal, N. (2017). MalCrawler: A Crawler for Seeking and Crawling Malicious Websites. In: Krishnan, P., Radha Krishna, P., Parida, L. (eds) Distributed Computing and Internet Technology. ICDCIT 2017. Lecture Notes in Computer Science, vol 10109. Springer, Cham. https://doi.org/10.1007/978-3-319-50472-8_17
DOI: https://doi.org/10.1007/978-3-319-50472-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-50471-1
Online ISBN: 978-3-319-50472-8
eBook Packages: Computer Science, Computer Science (R0)