Lexical Mining of Malicious URLs for Classifying Android Malware

  • Shanshan Wang
  • Qiben Yan
  • Zhenxiang Chen (Email author)
  • Lin Wang
  • Riccardo Spolaor
  • Bo Yang
  • Mauro Conti
Conference paper
Part of the Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering book series (LNICST, volume 254)


Mobile malware is a growing problem given the tight integration of mobile systems with daily life. Most malware programs use URLs inside network traffic to forward commands that launch malicious activities, so detecting malicious URLs can be essential in deterring such activities. Traditional methods build blacklists of verified URLs to identify malicious ones, but blacklists are ineffective against malicious URLs that are not yet known. Recently, machine learning-based methods have been proposed for malware detection with improved performance. In this paper, we propose a novel URL detection method based on the Floating Centroids Method (FCM), which integrates supervised classification and unsupervised clustering in a coherent manner. The proposed method uses the lexical features of a URL to identify malicious URLs while grouping similar URLs into the same cluster. Our experimental results show that a URL cluster exhibits unique behavioral patterns that can be used for malware detection with high accuracy. Moreover, the proposed behavioral clustering method facilitates the identification of malicious URL categories and unseen malware variants.
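To make the idea of "lexical features of a URL" concrete, the following is a minimal sketch of how a URL string might be split into tokens and summarized numerically. The specific features, the function names `lexical_tokens` and `lexical_features`, and the example URL are illustrative assumptions; the abstract does not list the features actually used in the paper.

```python
import re
from urllib.parse import urlsplit


def _split(url):
    # Assumption: URLs captured from traffic may lack a scheme, so add one
    # to let urlsplit() separate host, path, and query correctly.
    return urlsplit(url if "://" in url else "http://" + url)


def lexical_tokens(url):
    """Split host, path, and query into lowercase alphanumeric tokens."""
    parts = _split(url)
    text = " ".join([parts.netloc, parts.path, parts.query])
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", text) if t]


def lexical_features(url):
    """Return a few simple string-level features (hypothetical feature set)."""
    parts = _split(url)
    tokens = lexical_tokens(url)
    return {
        "url_length": len(url),
        "host_length": len(parts.netloc),
        "path_depth": parts.path.count("/"),
        "num_query_params": len([p for p in parts.query.split("&") if p]),
        "num_digits": sum(c.isdigit() for c in url),
        "num_tokens": len(tokens),
        "longest_token": max((len(t) for t in tokens), default=0),
    }


if __name__ == "__main__":
    # Hypothetical URL on a reserved example domain.
    sample = "http://example.com/api/get.php?id=42&cmd=run"
    print(lexical_tokens(sample))
    print(lexical_features(sample))
```

Feature vectors of this kind could then be passed to a classifier or clustering step; this sketch covers only the feature extraction and does not implement FCM itself.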



This work was supported by the National Natural Science Foundation of China under Grants No. 61672262, No. 61573166 and No. 61572230, the Shandong Provincial Key R&D Program under Grants No. 2016GGX101001 and No. 2018CXGC0706, and the CERNET Next Generation Internet Technology Innovation Project under Grant No. NGII20160404. This work was also supported in part by NSF grant CNS-1566388.



Copyright information

© ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2018

Authors and Affiliations

  • Shanshan Wang (1)
  • Qiben Yan (2)
  • Zhenxiang Chen (1, Email author)
  • Lin Wang (1)
  • Riccardo Spolaor (3)
  • Bo Yang (1)
  • Mauro Conti (3)
  1. Shandong Provincial Key Laboratory of Network Based Intelligent Computing, University of Jinan, Jinan, China
  2. Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, USA
  3. Department of Mathematics, University of Padova, Padua, Italy
