Multi-class Malware Detection via Deep Graph Convolutional Networks Using TF-IDF-Based Attributed Call Graphs

Khan, Irshad; Kwon, Young-Woo

doi:10.1007/978-981-99-8024-6_15

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14402)

Included in the following conference series:

International Conference on Information Security Applications

Abstract

The proliferation of malware in the Android ecosystem poses significant security risks and causes financial losses for enterprises and developers. Malware evolves constantly and exhibits dynamic, complex behavior, which makes robust defense mechanisms difficult to develop. Traditional methods, such as signature-based and battery-monitoring approaches, struggle to detect emerging malware variants effectively. Recent advances in deep learning have shown promising results in Android malware detection. However, most existing approaches focus on binary classification and offer little insight into how well a model generalizes across different types of malware. This study presents a novel approach to Android malware detection that integrates TF-IDF (Term Frequency-Inverse Document Frequency) features into the call graph structure. Each node in the call graph is attributed with a TF-IDF-based feature vector extracted from the opcode sequence of the corresponding method using an opcode list, yielding a more thorough representation that captures the complex traits of the malware samples. We employ state-of-the-art graph-based deep learning models to classify malware families, including Graph Convolutional Networks (GCN), SAGEConv, Graph Attention Networks (GAT), and Graph Isomorphism Networks (GIN). By combining high-level structural information from the call graphs with TF-IDF-based raw features, our approach aims to enhance the accuracy and generality of malware detection models. Through extensive evaluation and comparison of these models, we identify an optimal model for the Android malware family classification task. The findings of this study contribute to advancing the field of Android malware detection and provide insights into the effectiveness of graph-based deep learning models for combating evolving malware threats.
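
The node-attribution idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the opcode list, the method names, the smoothed IDF variant, and the treatment of the call graph as undirected are all assumptions made for illustration. The aggregation step is a parameter-free neighbor average, a simplified stand-in for a trained GCN layer, which would additionally apply learned weights and a nonlinearity.

```python
import math
from collections import Counter

# Hypothetical opcode list; the list used in the paper is not given here.
OPCODES = ["invoke-virtual", "move-result", "const-string", "if-eqz", "return-void"]


def tfidf_node_features(method_opcodes):
    """Map each method (call-graph node) to a TF-IDF vector over OPCODES.

    method_opcodes: dict of method name -> list of opcode strings.
    Each method is treated as one "document".
    """
    n_docs = len(method_opcodes)
    # Document frequency: number of methods in which each opcode occurs.
    df = Counter()
    for ops in method_opcodes.values():
        df.update(set(ops))

    features = {}
    for name, ops in method_opcodes.items():
        counts = Counter(ops)
        total = len(ops) or 1  # avoid division by zero for empty methods
        vec = []
        for op in OPCODES:
            tf = counts[op] / total
            # Smoothed IDF (one common variant, assumed here).
            idf = math.log((1 + n_docs) / (1 + df[op])) + 1.0
            vec.append(tf * idf)
        features[name] = vec
    return features


def mean_aggregate(features, edges):
    """Average each node's vector with its neighbors' vectors (self included).

    edges: list of (caller, callee) pairs, treated as undirected (assumption).
    """
    neighbours = {name: {name} for name in features}
    for caller, callee in edges:
        neighbours[caller].add(callee)
        neighbours[callee].add(caller)

    aggregated = {}
    for name, nbrs in neighbours.items():
        dim = len(features[name])
        aggregated[name] = [
            sum(features[m][i] for m in nbrs) / len(nbrs) for i in range(dim)
        ]
    return aggregated


# Toy call graph with three hypothetical methods.
methods = {
    "onCreate": ["invoke-virtual", "move-result", "invoke-virtual", "return-void"],
    "sendSms": ["const-string", "invoke-virtual", "return-void"],
    "check": ["if-eqz", "return-void"],
}
call_edges = [("onCreate", "sendSms"), ("onCreate", "check")]

x = tfidf_node_features(methods)    # raw TF-IDF node attributes
h = mean_aggregate(x, call_edges)   # attributes after one propagation step
```

After one propagation step, `onCreate` carries a nonzero weight for `if-eqz` even though it never uses that opcode itself, because it calls `check`. This is the sense in which structural information from the call graph is mixed into the per-method opcode features.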

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2021R1I1A3043889) and the Ministry of Science and ICT (No.2021R1A5A1021944).

Author information

Authors and Affiliations

School of Computer Science and Engineering, Kyungpook National University, Daegu, South Korea
Irshad Khan & Young-Woo Kwon

Corresponding author

Correspondence to Young-Woo Kwon.

Editor information

Editors and Affiliations

Pusan National University, Busan, Korea (Republic of)
Howon Kim
Yeungnam University, Gyeongbuk, Korea (Republic of)
Jonghee Youn

About this paper

Cite this paper

Khan, I., Kwon, YW. (2024). Multi-class Malware Detection via Deep Graph Convolutional Networks Using TF-IDF-Based Attributed Call Graphs. In: Kim, H., Youn, J. (eds) Information Security Applications. WISA 2023. Lecture Notes in Computer Science, vol 14402. Springer, Singapore. https://doi.org/10.1007/978-981-99-8024-6_15

DOI: https://doi.org/10.1007/978-981-99-8024-6_15
Published: 11 January 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-8023-9
Online ISBN: 978-981-99-8024-6
eBook Packages: Computer Science, Computer Science (R0)
