Abstract
Feature-based learning plays a crucial role in building and sustaining security. Determining from its extracted features whether a piece of software is benign or malicious, and in particular classifying it into the correct malware family, improves the security of the operating system and protects critical user information. In this paper, we present a novel hybrid feature-based classification system for Android malware samples. Static features, such as the permissions requested by mobile applications and hidden payloads, and dynamic features, such as API calls, installed services, and network connections, are extracted for classification. We apply machine learning and evaluate the classification accuracy of different classifiers on features extracted from a fairly large set of 3339 Android malware samples belonging to 20 malware families. The evaluation was scaled across 5 guest machines and took 8 days of processing. The testing accuracy reaches 92%.
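As an illustration only, the idea of combining static and dynamic features into one input for a classifier can be sketched as below. This is a minimal sketch under stated assumptions, not the system described in the paper: the binary presence encoding, the function names (`build_vocabulary`, `to_vector`, `accuracy`), and the toy samples are all hypothetical, and the abstract does not specify how features are actually encoded.

```python
from typing import Dict, Iterable, List, Sequence, Set

KINDS = ("static", "dynamic")  # assumed grouping, following the abstract's two feature types


def build_vocabulary(samples: Iterable[Dict[str, Set[str]]]) -> List[str]:
    """Collect every 'kind:name' feature seen across all samples, sorted for a stable order."""
    vocab = set()
    for sample in samples:
        for kind in KINDS:
            vocab.update(f"{kind}:{name}" for name in sample.get(kind, ()))
    return sorted(vocab)


def to_vector(sample: Dict[str, Set[str]], vocab: Sequence[str]) -> List[int]:
    """Encode one sample as a binary vector: 1 if the feature is present, else 0."""
    present = {f"{kind}:{name}" for kind in KINDS for name in sample.get(kind, ())}
    return [1 if feature in present else 0 for feature in vocab]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted family matches the true family."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy samples; real feature names would come from the extraction step.
samples = [
    {"static": {"SEND_SMS", "READ_CONTACTS"}, "dynamic": {"api:sendTextMessage"}},
    {"static": {"INTERNET"}, "dynamic": {"net:connect", "api:sendTextMessage"}},
]
vocab = build_vocabulary(samples)
vectors = [to_vector(s, vocab) for s in samples]
```

Vectors of this shape could then be passed to any multi-class classifier, with the malware family as the label and `accuracy` computed on held-out samples. Prefixing each feature with its kind keeps a static and a dynamic feature with the same name from colliding.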
Acknowledgements
The authors gratefully acknowledge the support of Galatasaray University, scientific research support program under grant #16.401.004.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Pektaş, A., Acarman, T. (2018). Ensemble Machine Learning Approach for Android Malware Classification Using Hybrid Features. In: Kurzynski, M., Wozniak, M., Burduk, R. (eds) Proceedings of the 10th International Conference on Computer Recognition Systems CORES 2017. CORES 2017. Advances in Intelligent Systems and Computing, vol 578. Springer, Cham. https://doi.org/10.1007/978-3-319-59162-9_20
DOI: https://doi.org/10.1007/978-3-319-59162-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59161-2
Online ISBN: 978-3-319-59162-9
eBook Packages: Engineering, Engineering (R0)