Abstract
As Android-based mobile devices become increasingly popular, malware detection on Android has become crucial. In this paper, a novel deep-learning-based detection method is proposed to distinguish malware from trusted applications. Since system call sequences, like natural language, carry semantic information, we treat each system call sequence as a sentence in a language and construct a classifier based on the Long Short-Term Memory (LSTM) language model. In the classifier, two LSTM models are first trained, one on the system call sequences from malware and the other on those from benign applications. Then, using these two models, two similarity scores are computed for the application under analysis. Finally, the classifier labels the application as malicious or trusted according to which score is greater. Experiments show that our approach achieves high efficiency and a recall of 96.6% with a low false positive rate of 9.3%, outperforming the other methods compared.
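The decision rule described above (one language model per class, then pick the class whose model scores the trace higher) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation: the LSTM language models are replaced by add-one-smoothed bigram models (a swapped-in, much simpler technique), the "similarity score" is assumed to be the average log-probability per system call, ties are assumed to go to the benign class, and the system call traces, the `<s>` start token, and the names `BigramLM`, `classify`, and `recall_and_fpr` are all hypothetical. Recall and false positive rate use their standard definitions, TP/(TP+FN) and FP/(FP+TN).

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

BOS = "<s>"  # hypothetical start-of-sequence token


class BigramLM:
    """Add-one-smoothed bigram language model over system call names.

    Simplified stand-in for an LSTM language model: it illustrates the
    "one model per class, compare scores" rule, not the paper's network.
    """

    def __init__(self, sequences: Iterable[Sequence[str]]) -> None:
        self.counts: Dict[str, Counter] = defaultdict(Counter)
        self.vocab = {BOS}
        for seq in sequences:
            prev = BOS
            for call in seq:
                self.counts[prev][call] += 1  # count transition prev -> call
                self.vocab.add(call)
                prev = call

    def score(self, seq: Sequence[str], vocab_size: int) -> float:
        """Average log-probability per call; higher means more similar."""
        total, prev = 0.0, BOS
        for call in seq:
            row = self.counts.get(prev, Counter())
            prob = (row[call] + 1) / (sum(row.values()) + vocab_size)
            total += math.log(prob)
            prev = call
        return total / max(len(seq), 1)


def classify(seq: Sequence[str], malware_lm: BigramLM, benign_lm: BigramLM) -> str:
    """Label a trace by whichever class model scores it higher (ties -> benign)."""
    # Shared vocabulary size so both models smooth over the same event space.
    vocab_size = len(malware_lm.vocab | benign_lm.vocab | set(seq))
    mal_score = malware_lm.score(seq, vocab_size)
    ben_score = benign_lm.score(seq, vocab_size)
    return "malicious" if mal_score > ben_score else "benign"


def recall_and_fpr(y_true: List[str], y_pred: List[str]) -> Tuple[float, float]:
    """Recall = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == "malicious" and p == "malicious" for t, p in pairs)
    fn = sum(t == "malicious" and p == "benign" for t, p in pairs)
    fp = sum(t == "benign" and p == "malicious" for t, p in pairs)
    tn = sum(t == "benign" and p == "benign" for t, p in pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


# Hypothetical training traces, for illustration only.
malware_lm = BigramLM([
    ["open", "read", "sendto", "sendto"],
    ["socket", "connect", "sendto", "sendto"],
])
benign_lm = BigramLM([
    ["open", "read", "write", "close"],
    ["open", "read", "read", "close"],
])

test_traces = [["socket", "connect", "sendto"], ["open", "read", "write", "close"]]
labels = ["malicious", "benign"]
predictions = [classify(t, malware_lm, benign_lm) for t in test_traces]
print(predictions)                          # ['malicious', 'benign']
print(recall_and_fpr(labels, predictions))  # (1.0, 0.0)
```

Because each class is modelled independently, replacing `BigramLM` with a per-class LSTM language model would leave `classify` unchanged; a new trace only needs to be scored against both models.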
Acknowledgements
This work is supported by the NSFC projects (61375054, 61402255, 61202358), the National High-tech R&D Program of China (2015AA016102), Guangdong Natural Science Foundation (2015A030310492, 2014A030313745), the R&D Program of Shenzhen (JCYJ20150630170146831, JCYJ20160301152145171, JCYJ20160531174259309, JSGG20150512162853495, Shenfagai [2015] 986), and the Cross Fund of the Graduate School at Shenzhen, Tsinghua University (JC20140001).
About this article
Cite this article
Xiao, X., Zhang, S., Mercaldo, F. et al. Android malware detection based on system call sequences and LSTM. Multimed Tools Appl 78, 3979–3999 (2019). https://doi.org/10.1007/s11042-017-5104-0