A Visualization-Based Analysis on Classifying Android Malware

  • Rory Coulter
  • Lei Pan
  • Jun Zhang
  • Yang Xiang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11806)

Abstract

Since the introduction of the Android mobile platform, mobile malware has evolved in both attack sophistication and its ability to evade detection. With the right combination of elements, malicious applications can hide among those that pose no threat, yet the threats across malware types still reveal distinguishable attack characteristics. This paper investigates the characteristics of both benign and malicious apps. By plotting complex features as dendrograms, we propose a novel approach to visually distinguish Android apps. We visualize the complicated relationships and evaluate the effect of different text mining methods. Specifically, we employ machine learning techniques, including feature reduction using Principal Component Analysis and the Random Forest classifier, to compare eight different models. Using the Drebin dataset, we achieved an average accuracy of 95.83%.

Keywords

Artificial intelligence, Cyber security, Data driven cyber security, Machine learning, Malware detection

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. School of Software and Electrical Engineering, Swinburne University of Technology, Hawthorn, Australia
  2. School of Information Technology, Deakin University, Geelong, Australia
