Abstract
Static and dynamic program analysis are the key concepts researchers apply to uncover security-critical implementation weaknesses in Android applications. As it is often not obvious in which context problematic statements occur, it is challenging to assess their practical impact. While some flaws may turn out to be bad practice without undermining the overall security level, others could have a serious impact. Distinguishing between them requires knowledge of the app's intended purpose.
In this paper, we introduce a machine learning-based system that is capable of generating natural language text describing the purpose and core functionality of Android apps based on their actual code. We design a dense neural network that captures the semantic relationships of resource identifiers, string constants, and API calls contained in apps to derive a high-level picture of implemented program behavior. For arbitrary applications, our system can predict precise, human-readable keywords and short phrases that indicate the main use cases an app is designed for.
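To make the idea of mapping code features to keywords concrete, the following is a minimal sketch of a dense (fully connected) network used as a multi-label keyword predictor. It is not the authors' implementation: the feature names, keyword vocabulary, layer sizes, hand-picked weights, and the helper names `encode`, `dense`, and `predict_keywords` are all illustrative assumptions, and a real model would learn its weights from training data.

```python
import math

# Hypothetical feature and keyword vocabularies (assumptions for illustration).
FEATURES = [
    "android.hardware.Camera",           # API call
    "R.string.take_photo",               # resource identifier
    "android.location.LocationManager",  # API call
    "R.id.map_view",                     # resource identifier
]
KEYWORDS = ["camera", "navigation"]


def encode(app_tokens):
    """Binary bag-of-features vector: 1.0 if the feature occurs in the app."""
    present = set(app_tokens)
    return [1.0 if feature in present else 0.0 for feature in FEATURES]


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, bias)
    ]


# Toy weights chosen by hand, not learned (assumption).
W1 = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
B1 = [0.0, 0.0]
W2 = [[3.0, -1.0], [-1.0, 3.0]]
B2 = [-2.0, -2.0]


def predict_keywords(app_tokens, threshold=0.5):
    """Return every keyword whose sigmoid score reaches the threshold."""
    hidden = dense(encode(app_tokens), W1, B1, relu)
    scores = dense(hidden, W2, B2, sigmoid)
    return [kw for kw, score in zip(KEYWORDS, scores) if score >= threshold]


print(predict_keywords(["android.hardware.Camera", "R.string.take_photo"]))
```

Each output unit uses its own sigmoid, so several keywords can be predicted for the same app, which matches the multi-keyword output described above.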
We evaluate our solution on 67,040 real-world apps and find that it identifies keywords that also occur in the developer-provided description in Google Play with a precision between 69% and 84%. To avoid incomprehensible black-box predictions, we apply a model-explaining algorithm and demonstrate that our technique can substantially augment inspections of Android apps by contributing contextual information.
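The precision figure can be read as the fraction of predicted keywords that also appear in the app's store description. The sketch below illustrates that reading under simplifying assumptions: exact lowercase word matching, no stemming or phrase handling, and hypothetical function names (`description_terms`, `keyword_precision`). The paper's actual matching procedure may differ.

```python
import re


def description_terms(description):
    """Lowercased set of alphabetic words found in a store description."""
    return set(re.findall(r"[a-z]+", description.lower()))


def keyword_precision(predicted, description):
    """Share of predicted keywords that also occur in the description."""
    if not predicted:
        return 0.0  # convention chosen here: no predictions means precision 0
    terms = description_terms(description)
    hits = sum(1 for keyword in predicted if keyword.lower() in terms)
    return hits / len(predicted)


predicted = ["camera", "photo", "weather", "filter"]
description = "Take a photo with the camera and apply a filter."
print(keyword_precision(predicted, description))  # 3 of 4 keywords match: 0.75
```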
Notes
1. Our implementation is available at: https://github.com/sg10/apk-verbalizer.
Copyright information
© 2020 IFIP International Federation for Information Processing
About this paper
Cite this paper
Feichtner, J., Gruber, S. (2020). Code Between the Lines: Semantic Analysis of Android Applications. In: Hölbl, M., Rannenberg, K., Welzer, T. (eds) ICT Systems Security and Privacy Protection. SEC 2020. IFIP Advances in Information and Communication Technology, vol 580. Springer, Cham. https://doi.org/10.1007/978-3-030-58201-2_12
DOI: https://doi.org/10.1007/978-3-030-58201-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58200-5
Online ISBN: 978-3-030-58201-2
eBook Packages: Computer Science, Computer Science (R0)