Skip to main content

Multi-class Malware Detection via Deep Graph Convolutional Networks Using TF-IDF-Based Attributed Call Graphs

  • Conference paper
  • First Online:
Information Security Applications (WISA 2023)

Abstract

The proliferation of malware in the Android ecosystem poses significant security risks and financial losses for enterprises and developers. Malware constantly evolves, exhibiting dynamic behavior and complexity, thus making it challenging to develop robust defense mechanisms. Traditional methods, such as signature-based and battery-monitoring approaches, struggle to detect emerging malware variants effectively. Recent advancements in deep learning have shown promising results in Android malware detection. However, most existing approaches focus on binary classification and need more insights into the model’s generality across different types of malware. This study presents a novel approach to address Android malware detection by integrating TF-IDF (Term Frequency-Inverse Document Frequency) features into the call graph structure. By attributing each node in the call graph with TF-IDF-based feature vectors extracted from the opcode sequences of each method using an opcode list, we present a more thorough representation that encapsulates the complex traits of the malware samples. We employ state-of-the-art graph-based deep learning models to classify malware families, including Graph Convolutional Networks (GCN), SAGEConv, Graph Attention Networks (GAT), and Graph Isomorphism Networks (GIN). By incorporating high-level structural information from the call graphs and TF-IDF-based raw features, our approach aims to enhance the accuracy and generality of the malware detection models. We identify an optimal model for the Android malware family classification task through extensive evaluation and comparison of the above-mentioned models. The findings of this study contribute to advancing the field of Android malware detection and provide insights into the effectiveness of graph-based deep learning models for combating evolving malware threats.

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1I1A3043889) and the Ministry of Science and ICT (No.2021R1A5A1021944).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 59.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 74.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. “Smartphones-statistics and facts.” https://www.statista.com/topics/840/smartphones/

  2. “Mobile malware evolution report.” https://securelist.com/mobile-malware-evolution-2019/96280/

  3. Qiu, J., et al.: Data-driven android malware intelligence: a survey. In: Chen, X., Huang, X., Zhang, J. (eds.) ML4CS 2019. LNCS, vol. 11806, pp. 183–202. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-30619-9_14

    Chapter  Google Scholar 

  4. Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K., Siemens, C.: DREBIN: effective and explainable detection of android malware in your pocket. In: NDSS, vol. 14, pp. 23–26 (2014)

    Google Scholar 

  5. Zhang, M., Duan, Y., Yin, H., Zhao, Z.: Semantics-aware android malware classification using weighted contextual API dependency graphs. In: Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pp. 1105–1116 (2014)

    Google Scholar 

  6. Yuan, Z., Lu, Y., Wang, Z., Xue, Y.: Droid-sec: deep learning in android malware detection. In: Proceedings of the 2014 ACM Conference on SIGCOMM, pp. 371–372 (2014)

    Google Scholar 

  7. Narayanan, A., Meng, G., Yang, L., Liu, J., Chen, L.: Contextual Weisfeiler-Lehman graph kernel for malware detection. In: 2016 International Joint Conference on Neural Networks (IJCNN), pp. 4701–4708. IEEE (2016)

    Google Scholar 

  8. Hassen, M., Chan, P.K.: Scalable function call graph-based malware classification. In: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, pp. 239–248 (2017)

    Google Scholar 

  9. Xu, K., Li, Y., Deng, R.H., Chen, K.: DeepRefiner: multi-layer android malware detection system applying deep neural networks. In: 2018 IEEE European Symposium on Security and Privacy (EuroS &P), pp. 473–487. IEEE (2018)

    Google Scholar 

  10. Androguard. https://androguard.readthedocs.io/en/latest/

  11. Tam, K., Fattori, A., Khan, S., Cavallaro, L.: Copperdroid: automatic reconstruction of android malware behaviors. In: NDSS Symposium 2015, pp. 1–15 (2015)

    Google Scholar 

  12. Gandotra, E., Bansal, D., Sofat, S.: Malware analysis and classification: a survey. J. Inf. Secur. 2014 (2014)

    Google Scholar 

  13. Li, J., Sun, L., Yan, Q., Li, Z., Srisa-An, W., Ye, H.: Significant permission identification for machine-learning-based android malware detection. IEEE Trans. Ind. Inf. 14(7), 3216–3225 (2018)

    Article  Google Scholar 

  14. Liu, Y., Zhang, L., Huang, X.: Using G features to improve the efficiency of function call graph based android malware detection. Wireless Pers. Commun. 103(4), 2947–2955 (2018)

    Article  Google Scholar 

  15. McLaughlin, N., et al.: Deep android malware detection. In: Proceedings of the Seventh ACM on Conference on Data and Application Security and Privacy, pp. 301–308 (2017)

    Google Scholar 

  16. Gao, H., Cheng, S., Zhang, W.: GDroid: android malware detection and classification with graph convolutional network. Comput. Secur. 106, 102264 (2021)

    Article  Google Scholar 

  17. Xu, K., Hu, W., Leskovec, J., Jegelka, S.: How powerful are graph neural networks?. arXiv preprint arXiv:1810.00826 (2018)

  18. Jing, L.P., Huang, H.K., Shi, H.B.: Improved feature selection approach TFIDF in text mining. In: Proceedings International Conference on Machine Learning and Cybernetics, vol. 2, pp. 944–946. IEEE (2002)

    Google Scholar 

  19. Ozogur, G., Erturk, M.A., Gurkas Aydin, Z., Aydin, M.A.: Android malware detection in bytecode level using TF-IDF and XGBoost. Comput. J. bxac198 (2023)

    Google Scholar 

  20. Kipf, T.N., Welling, M.: Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907 (2016)

  21. Hamilton, W., Ying, Z., Leskovec, J.: Inductive representation learning on large graphs. In: Advances in Neural Information Processing Systems, vol. 30 (2017)

    Google Scholar 

  22. Veličković, P., Cucurull, G., Casanova, A., Romero, A., Lio, P., Bengio, Y.: Graph attention networks. arXiv preprint arXiv:1710.10903 (2017)

  23. Hu, W., et al.: Strategies for pre-training graph neural networks. arXiv preprint arXiv:1905.12265 (2019)

  24. Mahdavifar, S., Kadir, A.F.A., Fatemi, R., Alhadidi, D., Ghorbani, A.: Dynamic android malware category classification using semi-supervised deep learning. In: 2020 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), pp. 515–522. IEEE (2020)

    Google Scholar 

  25. Kotsiantis, S., Kanellopoulos, D., Pintelas, P., et al.: Handling imbalanced datasets: a review. GESTS Int. Trans. Comput. Sci. Eng. 30(1), 25–36 (2006)

    Google Scholar 

  26. Goutte, C., Gaussier, E.: A probabilistic interpretation of precision, recall and F-score, with implication for evaluation. In: Losada, D.E., Fernández-Luna, J.M. (eds.) ECIR 2005. LNCS, vol. 3408, pp. 345–359. Springer, Heidelberg (2005). https://doi.org/10.1007/978-3-540-31865-1_25

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Young-Woo Kwon .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Khan, I., Kwon, YW. (2024). Multi-class Malware Detection via Deep Graph Convolutional Networks Using TF-IDF-Based Attributed Call Graphs. In: Kim, H., Youn, J. (eds) Information Security Applications. WISA 2023. Lecture Notes in Computer Science, vol 14402. Springer, Singapore. https://doi.org/10.1007/978-981-99-8024-6_15

Download citation

  • DOI: https://doi.org/10.1007/978-981-99-8024-6_15

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8023-9

  • Online ISBN: 978-981-99-8024-6

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics