MobileFindr: Function Similarity Identification for Reversing Mobile Binaries

  • Yibin Liao
  • Ruoyan Cai
  • Guodong Zhu
  • Yue Yin
  • Kang Li
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11098)


Identifying binary code at the function level supports a broad range of software security and reverse engineering tasks, including patch analysis, vulnerability assessment, code plagiarism detection, and malware analysis. However, the anti-reverse-engineering techniques that mobile apps employ (e.g., obfuscation and anti-emulation) make existing approaches ineffective at function identification. In this paper, we propose MobileFindr, an on-device, trace-based function similarity identification framework for the mobile platform. MobileFindr runs on real mobile devices and mitigates many prevalent anti-reversing techniques: it extracts function execution behaviors via dynamic instrumentation, characterizes each function by its collected behaviors, and matches functions via distance calculation. We have evaluated MobileFindr on real-world, top-ranked mobile frameworks and applications. The experimental results show that MobileFindr outperforms existing state-of-the-art tools in both obfuscation resilience and accuracy.
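The matching step described in the abstract (characterize each function by its collected behaviors, then compare by distance) can be illustrated with a minimal sketch. The abstract does not specify the features or the distance metric MobileFindr uses, so everything here is an assumption made for illustration: the five-element behavior vector, the function names, the sample values, and the choice of Euclidean distance are all hypothetical.

```python
import math

# Hypothetical behavior vector per function (assumed, not taken from the paper):
# (instructions executed, memory reads, memory writes, library calls, system calls)


def distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_functions(targets, references):
    """Map each target function to its nearest reference function and the distance."""
    matches = {}
    for name, vec in targets.items():
        best = min(references, key=lambda ref: distance(vec, references[ref]))
        matches[name] = (best, distance(vec, references[best]))
    return matches


# Illustrative data only: known functions and unnamed functions from a stripped binary.
reference_traces = {
    "aes_encrypt": (120, 40, 32, 0, 0),
    "http_get": (300, 80, 20, 6, 4),
}
target_traces = {
    "sub_1000": (124, 41, 33, 0, 0),
    "sub_2000": (290, 78, 22, 6, 4),
}

result = match_functions(target_traces, reference_traces)
for target, (ref, dist) in result.items():
    print(f"{target} -> {ref} (distance {dist:.2f})")
```

A linear scan like this is enough for a handful of functions. For large function sets, an approximate nearest-neighbor index would be the usual substitute for the exhaustive comparison.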


Reverse engineering · Similarity identification · Dynamic instrumentation



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Georgia, Athens, USA
