
Lens on the Endpoint: Hunting for Malicious Software Through Endpoint Data Analysis

  • Conference paper
Research in Attacks, Intrusions, and Defenses (RAID 2017)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 10453)

Abstract

Organizations are facing an increasing number of criminal threats, ranging from opportunistic malware to more advanced targeted attacks. While various security technologies are available to protect organizations’ perimeters, many breaches still lead to undesired consequences such as loss of proprietary information, financial burden, and reputational damage. Recently, endpoint monitoring agents that inspect system-level activities on user machines have started to gain traction and to be deployed in industry as an additional defense layer. In most cases, though, they are used only for forensic investigation, to determine the root cause of an incident.

In this paper, we demonstrate how endpoint monitoring can be used proactively to detect and prioritize suspicious software modules overlooked by other defenses. Compared to other environments in which host-based detection has proved successful, our setting of a large enterprise introduces unique challenges, including a heterogeneous environment (users install software of their choice), limited ground truth (a small number of malicious samples available for training), and coarse-grained data collection (strict requirements are imposed on agents’ performance overhead). By applying clustering and outlier detection algorithms, we develop techniques to identify modules with known malicious behavior, as well as modules impersonating popular benign applications. Our algorithms leverage a large number of static, behavioral, and contextual features, together with new feature weighting methods that are resilient to missing attributes. The large majority of our findings are confirmed as malicious by anti-virus tools and by manual investigation by experienced security analysts.
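The abstract mentions feature weighting that is resilient to missing attributes but does not give the formula on this page. As an illustration only, the sketch below shows one simple way to make a weighted distance tolerate missing values: attributes absent from either module are skipped, and the weights of the remaining attributes are renormalized. The feature names, the weights, and the assumption that numeric features are already scaled to [0, 1] are hypothetical; this is not the paper's weighting method.

```python
import math
from typing import Dict, Optional

# Hypothetical module representation: feature name -> value scaled to [0, 1],
# or None when the endpoint agent did not capture that attribute.
Features = Dict[str, Optional[float]]


def weighted_distance(a: Features, b: Features, weights: Dict[str, float]) -> float:
    """Weighted Euclidean distance that ignores attributes missing in either module.

    Weights of the attributes actually compared are renormalized to sum to 1,
    so a pair with missing attributes is still scored on the same [0, 1] scale
    as a fully observed pair.
    """
    shared = [f for f in weights if a.get(f) is not None and b.get(f) is not None]
    total = sum(weights[f] for f in shared)
    if total == 0:
        # No attribute can be compared; treat the pair as maximally distant.
        return math.inf
    return math.sqrt(sum(weights[f] / total * (a[f] - b[f]) ** 2 for f in shared))


if __name__ == "__main__":
    w = {"size": 0.5, "num_imports": 0.3, "entropy": 0.2}  # hypothetical weights
    x = {"size": 1.0, "num_imports": 0.4, "entropy": None}  # entropy not captured
    y = {"size": 0.5, "num_imports": 0.4, "entropy": 0.9}
    print(round(weighted_distance(x, y, w), 4))
```

Here only `size` and `num_imports` are compared, so their weights become 0.625 and 0.375 and the script prints 0.3953. Returning infinity when nothing is comparable is a design choice; a real system might instead fall back to a default distance.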


References

  1. Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F., Nazario, J.: Automated classification and analysis of internet malware. In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 178–197. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74320-0_10

  2. Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: Proceedings of Network and Distributed System Security Symposium, NDSS, vol. 9, pp. 8–11 (2009)

  3. Bianchi, A., Shoshitaishvili, Y., Kruegel, C., Vigna, G.: Blacksheep: detecting compromised hosts in homogeneous crowds. In: Proceedings of ACM Conference on Computer and Communications Security, CCS, pp. 341–352. ACM (2012)

  4. Bowers, K.D., Hart, C., Juels, A., Triandopoulos, N.: PillarBox: combating next-generation malware with fast forward-secure logging. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 46–67. Springer, Cham (2014). doi:10.1007/978-3-319-11379-1_3

  5. Canali, D., Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., Kirda, E.: A quantitative study of accuracy in system call-based malware detection. In: Proceedings of International Symposium on Software Testing and Analysis, pp. 122–132. ACM (2012)

  6. Chau, D.H., Nachenberg, C., Wilhelm, J., Wright, A., Faloutsos, C.: Polonium: tera-scale graph mining and inference for malware detection. In: Proceedings of SIAM International Conference on Data Mining, SDM, SIAM (2011)

  7. Damballa: First Zeus, now SpyEye. Look at the source code now! (2011). https://www.damballa.com/first-zeus-now-spyeye-look-the-source-code-now/

  8. Dash, M., Choi, K., Scheuermann, P., Liu, H.: Feature selection for clustering - a filter solution. In: Proceedings of International Conference on Data Mining, ICDM, pp. 115–122. IEEE (2002)

  9. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of 2nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, pp. 226–231. ACM (1996)

  10. Feng, H.H., Kolesnikov, O.M., Fogla, P., Lee, W., Gong, W.: Anomaly detection using call stack information. In: Proceedings of IEEE Symposium on Security and Privacy, S&P, pp. 62–75. IEEE (2003)

  11. Gao, D., Reiter, M.K., Song, D.: Gray-box extraction of execution graphs for anomaly detection. In: Proceedings of ACM Conference on Computer and Communications Security, CCS, pp. 318–329. ACM (2004)

  12. Gu, G., Porras, P., Yegneswaran, V., Fong, M., Lee, W.: BotHunter: detecting malware infection through IDS-driven dialog correlation. In: Proceedings of USENIX Security Symposium, SECURITY, pp. 12:1–12:16. USENIX Association (2007)

  13. Gu, Z., Pei, K., Wang, Q., Si, L., Zhang, X., Xu, D.: LEAPS: detecting camouflaged attacks with statistical learning guided by program analysis. In: Proceedings of International Conference on Dependable Systems and Networks, DSN, pp. 57–68. IEEE/IFIP (2015)

  14. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009). doi:10.1007/978-0-387-84858-7

  15. He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Proceedings of Advances in Neural Information Processing Systems, NIPS, pp. 507–514 (2005)

  16. Hofmeyr, S.A., Forrest, S., Somayaji, A.: Intrusion detection using sequences of system calls. J. Comput. Secur. 6(3), 151–180 (1998)

  17. Hu, X., Shin, K.G.: DUET: integration of dynamic and static analyses for malware clustering with cluster ensembles. In: Proceedings of 29th Annual Computer Security Applications Conference, ACSAC, pp. 79–88 (2013)

  18. Hu, X., Shin, K.G., Bhatkar, S., Griffin, K.: MutantX-S: scalable malware clustering based on static features. In: Proceedings of USENIX Annual Technical Conference, ATC, pp. 187–198. USENIX Association (2013)

  19. Kolbitsch, C., Comparetti, P.M., Kruegel, C., Kirda, E., Zhou, X., Wang, X.: Effective and efficient malware detection at the end host. In: Proceedings of USENIX Security Symposium, SECURITY, pp. 351–366. USENIX Association (2009)

  20. Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., Kirda, E.: AccessMiner: using system-centric models for malware protection. In: Proceedings of ACM Conference on Computer and Communications Security, CCS, pp. 399–412. ACM (2010)

  21. Lee, W., Stolfo, S.J.: Data mining approaches for intrusion detection. In: Proceedings of USENIX Security Symposium, SECURITY. USENIX Association (1998)

  22. Lee, W., Stolfo, S.J., Chan, P.K.: Learning patterns from UNIX process execution traces for intrusion detection. In: Proceedings of AAAI Workshop on AI Approaches to Fraud Detection and Risk Management, pp. 50–56. AAAI (1997)

  23. MANDIANT: APT1: Exposing one of China’s cyber espionage units. Report (2013). www.mandiant.com

  24. Mandiant Consulting: M-TRENDS 2016 (2016). https://www2.fireeye.com/rs/848-DID-242/images/Mtrends2016.pdf

  25. McAfee Labs: Diary of a “RAT” (Remote Access Tool) (2011). https://kc.mcafee.com/resources/sites/MCAFEE/content/live/PRODUCT_DOCUMENTATION/23000/PD23258/en_US/Diary_of_a_RAT_datasheet.pdf

  26. McAfee Labs: ZeroAccess Rootkit (2013). https://kc.mcafee.com/resources/sites/MCAFEE/content/live/PRODUCT_DOCUMENTATION/23000/PD23412/en_US/McAfee

  27. Neugschwandtner, M., Comparetti, P.M., Jacob, G., Kruegel, C.: Forecast: skimming off the malware cream. In: Proceedings of 27th Annual Computer Security Applications Conference, ACSAC, pp. 11–20 (2011)

  28. Oprea, A., Li, Z., Yen, T., Chin, S.H., Alrwais, S.A.: Detection of early-stage enterprise infection by mining large-scale log data. In: Proceedings of 45th Annual International Conference on Dependable Systems and Networks, DSN, pp. 45–56. IEEE/IFIP (2015)

  29. Perdisci, R., Lee, W., Feamster, N.: Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. In: Proceedings of Symposium on Networked Systems Design and Implementation, NSDI, pp. 391–404. USENIX Association (2010)

  30. Rahbarinia, B., Balduzzi, M., Perdisci, R.: Real-time detection of malware downloads via large-scale URL → file → machine graph mining. In: Proceedings of ACM Asia Conference on Computer and Communications Security, AsiaCCS, pp. 1117–1130. ACM (2016)

  31. Rieck, K., Trinius, P., Willems, C., Holz, T.: Automatic analysis of malware behavior using machine learning. J. Comput. Secur. 19(4), 639–668 (2011)

  32. Sekar, R., Bendre, M., Dhurjati, D., Bollineni, P.: A fast automaton-based method for detecting anomalous program behaviors. In: Proceedings of IEEE Symposium on Security and Privacy, S&P, pp. 144–155. IEEE (2001)

  33. Shin, S., Xu, Z., Gu, G.: EFFORT: a new host-network cooperated framework for efficient and effective bot malware detection. Comput. Networks (Elsevier) 57(13), 2628–2642 (2013)

  34. Symantec: The Rebirth Of Endpoint Security. http://www.darkreading.com/endpoint/the-rebirth-of-endpoint-security/d/d-id/1322775

  35. Tamersoy, A., Roundy, K., Chau, D.H.: Guilt by association: large scale malware detection by mining file-relation graphs. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, pp. 1524–1533. ACM (2014)

  36. Verizon: 2015 data breach investigations report (2015). http://www.verizonenterprise.com/DBIR/2015/

  37. Wicherski, G.: peHash: a novel approach to fast malware clustering. In: 2nd Workshop on Large-Scale Exploits and Emergent Threats. LEET, USENIX Association (2009)

  38. Yen, T.F., Heorhiadi, V., Oprea, A., Reiter, M.K., Juels, A.: An epidemiological study of malware encounters in a large enterprise. In: Proceedings of ACM Conference on Computer and Communications Security, CCS, pp. 1117–1130. ACM (2014)

  39. Yen, T.F., Oprea, A., Onarlioglu, K., Leetham, T., Robertson, W., Juels, A., Kirda, E.: Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks. In: Proceedings of 29th Annual Computer Security Applications Conference, ACSAC, pp. 199–208 (2013)

  40. Zeng, Y., Hu, X., Shin, K.G.: Detection of botnets using combined host- and network-level information. In: Proceedings of International Conference on Dependable Systems and Networks, DSN, pp. 291–300. IEEE/IFIP (2010)


Acknowledgement

We are grateful to the enterprise that permitted us to access its endpoint data for our analysis. We would like to thank Justin Lamarre, Robin Norris, Todd Leetham, and Christopher Harrington for their help with system design and evaluation of our findings, as well as Kevin Bowers and Martin Rosa for comments and suggestions on our paper. We thank our shepherd Alfonso Valdes and the anonymous reviewers for their feedback on drafts of this paper. This work was supported by the National Science Foundation (NSF) under grant CNS-1409738 and by Secure Business Austria.

Author information

Corresponding author: Ahmet Salih Buyukkayhan.


Electronic supplementary material

Supplementary material 1 (txt 1 KB)

Appendices

A Feature Set

Our feature set includes features of different types, such as string, set, binary, and numerical attributes. Table 7 displays the full set of features used in our analysis, along with their category and type.
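Because the features mix several types, any similarity computation needs a comparison rule per type. This appendix does not list those rules, so the sketch below uses common textbook choices as assumptions: a `difflib` character-match ratio for strings, the Jaccard index for sets (for example imported DLLs or section names), equality for binary attributes, and one minus the relative difference for numbers. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import AbstractSet


def string_sim(a: str, b: str) -> float:
    # Character-match ratio in [0, 1]; 1.0 for identical strings.
    return SequenceMatcher(None, a, b).ratio()


def set_sim(a: AbstractSet[str], b: AbstractSet[str]) -> float:
    # Jaccard index: size of the intersection over size of the union.
    # Two empty sets are treated as identical.
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def binary_sim(a: bool, b: bool) -> float:
    return 1.0 if a == b else 0.0


def numeric_sim(a: float, b: float) -> float:
    # 1 minus the relative difference, clipped at 0 (e.g. for opposite signs).
    denom = max(abs(a), abs(b))
    if denom == 0:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / denom)


if __name__ == "__main__":
    common = {".text", ".rdata", ".data", ".rsrc", ".reloc"}
    extended = common | {".itext", ".bss", ".edata", ".tls"}
    print(set_sim(common, extended))  # 5 shared out of 9 distinct section names
    print(numeric_sim(1024, 417))     # hypothetical module sizes in KB
```

A per-module similarity could then combine these per-feature scores with feature weights; how the paper weights them is not shown here.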

Table 7. Final list of features. Note that all contextual features and numerical behavior features are computed by averaging the corresponding values across all hosts that contain the module.
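The caption states that contextual features and numerical behavior features are averaged across all hosts containing the module. A minimal sketch of that aggregation step follows. The input layout (one record per module and host) and the feature names are hypothetical, and a feature that a host did not report is simply averaged over the hosts that did report it; the caption does not specify that case, so this is an assumption.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# Hypothetical per-host record: (module_id, host_id, {feature_name: value}).
Observation = Tuple[str, str, Dict[str, float]]


def average_over_hosts(observations: Iterable[Observation]) -> Dict[str, Dict[str, float]]:
    """Return one value per (module, feature): the mean over the hosts reporting it."""
    values: Dict[str, Dict[str, List[float]]] = defaultdict(lambda: defaultdict(list))
    for module_id, _host_id, feats in observations:
        for name, value in feats.items():
            values[module_id][name].append(value)
    return {
        module_id: {name: mean(vals) for name, vals in per_feature.items()}
        for module_id, per_feature in values.items()
    }


if __name__ == "__main__":
    obs = [
        ("m1", "host-a", {"hidden": 1.0, "files_written": 4.0}),
        ("m1", "host-b", {"hidden": 0.0, "files_written": 2.0}),
        ("m2", "host-a", {"hidden": 0.0}),
    ]
    print(average_over_hosts(obs))
```

Note that averaging a binary attribute such as the hypothetical `hidden` flag yields the fraction of hosts on which it holds (0.5 for `m1` above).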

B Case Studies

In this section, we present several detailed case studies of our findings. First, we detail two clusters of similar modules we identified, one with executable modules and another with DLLs, and we highlight the features that our new findings share with blacklisted modules. Second, we give more details on some of the detected outliers and emphasize the difference from the legitimate whitelisted modules they impersonate.

B.1 Similarity

We found 12 unknown modules, all with different file names, that are similar to the blacklisted module house of cards s03e01~.exe. These modules impersonate popular movie or application names, such as Fifty Shades of Grey~.exe and VCE Exam Simulator~.exe, to deceive users. They all imported a single DLL (KERNEL32.dll) and used the same very common section names (.text, .rdata, .data, .rsrc, .reloc). One of them is even signed with a rogue certificate. Interestingly, these modules could not be grouped together based on their static features alone, since those features are common among other modules as well. However, when we consider the behavioral and contextual features, they are similar in some unusual ways. For instance, these modules write executables to a temp directory under AppData and create processes from that location. Moreover, they used the same autostart method (AutoLogon) to persist on the system, and they reside in the same path under the ProgramData folder.
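This case study describes modules that only fall into the same group once behavioral and contextual features are considered. The reference list includes DBSCAN [9], a density-based clustering algorithm. The sketch below is a textbook DBSCAN over an arbitrary pairwise distance, included as an assumption about how such groups can be formed; it is not claimed to be the paper's exact clustering procedure, and `eps` and `min_pts` are hypothetical parameters.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")
NOISE = -1


def dbscan(points: Sequence[T], dist: Callable[[T, T], float],
           eps: float, min_pts: int) -> List[int]:
    """Textbook DBSCAN: returns a cluster id per point, or NOISE (-1)."""
    labels: List[Optional[int]] = [None] * len(points)  # None = not yet visited

    def neighbors(i: int) -> List[int]:
        # Brute-force eps-neighborhood (includes the point itself).
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_seeds = neighbors(j)
            if len(j_seeds) >= min_pts:  # core point: keep expanding
                queue.extend(j_seeds)
    return [NOISE if label is None else label for label in labels]


if __name__ == "__main__":
    pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0]
    print(dbscan(pts, lambda a, b: abs(a - b), eps=0.15, min_pts=2))
```

On this toy one-dimensional input the result is `[0, 0, 0, 1, 1, 1, -1]`: two dense groups and one isolated point. For modules, `dist` would be a distance over their feature vectors, and points left as noise are natural candidates for closer inspection.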

Another cluster, containing 15 unknown DLLs and 1 blacklisted DLL, is intriguing because its modules have randomized 14-character file names (e.g., oXFV2lbFU7dgHY.x64.dll). The modules are almost identical in their features, except for slightly different entropy values and creation dates. VirusTotal flagged 10 of them, although different modules were detected by different numbers of AVs. One of them was not detected initially, but when we queried VirusTotal later, the module was detected by 29 AVs. Eight months later, the remaining 5 modules had still not been detected by any AV in VirusTotal, but they were confirmed manually by the security analysts.
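Entropy is one of the features on which the near-identical DLLs in this cluster differ slightly. The exact entropy definition used for the feature is not given on this page; a common choice, assumed in the sketch below, is the Shannon entropy of the file's bytes in bits per byte. It ranges from 0 for constant content up to 8 for uniformly distributed bytes, and values near 8 are typical of compressed, packed, or encrypted content. The `file_entropy` helper is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    # Sum of p * log2(1/p) over the observed byte values, with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in Counter(data).values())


def file_entropy(path: str) -> float:
    # Reads the whole file; fine for a sketch, not for very large binaries.
    with open(path, "rb") as fh:
        return shannon_entropy(fh.read())


if __name__ == "__main__":
    print(shannon_entropy(b"aaaa"))            # constant content
    print(shannon_entropy(bytes(range(256))))  # every byte value exactly once
```

The two calls print 0.0 and 8.0, the extremes of the scale. Two builds of the same DLL that differ only in embedded data would typically land close together on it, consistent with the "slightly different" values described above.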

B.2 Outlier Detection

Our system identified 2 blacklisted and 3 unknown modules of services.exe as outliers. We found that one of them was infected by ZeroAccess [26], a Trojan horse that steals personal information, replaces search results, and downloads and executes additional files. VirusTotal confirmed this module one week after our detection. For the remaining two, we performed manual analysis. One of the modules has a description in Korean, with no company name or signature. Compared to the legitimate services.exe, it has the additional sections .itext, .bss, .edata, and .tls. The module imports some common DLLs such as kernel32.dll, user32.dll, and oleaut32.dll, but also imports shell32.dll and wsock32.dll, which is unusual for benign variants of services.exe. In addition, the module size is ~1 MB, whereas the whitelisted modules have sizes between 110 KB and 417 KB. Unfortunately, no behavior features were captured for this module, but it has several suspicious contextual features: it is installed on only a single machine, with hidden attributes, and it is located in C:\Windows\winservice instead of C:\Windows\System32. The second detected services.exe module is missing the signature field and imports a different set of DLLs. Even though the module is 32-bit, the DLLs it imports are usually imported by 64-bit versions of benign services.exe. It also has suspicious contextual features: it was installed on only a single machine, relatively recently, and its file system path is ~\Download\ffadecffa baffc instead of the usual C:\Windows\System32. Both of these modules were confirmed as malicious by security experts in the organization.
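The manual analysis above compares each outlier with the whitelisted modules of the same name along a few concrete dimensions: file size range, installation directory, imported DLLs, and signature. The sketch below turns that checklist into code. The `Module` record, its field names, the rule set, and the whitelisted import set in the example are hypothetical simplifications for illustration; this is not the paper's outlier detection algorithm.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Module:
    # Hypothetical, simplified module record; field names are illustrative only.
    name: str
    size_kb: float
    directory: str
    imports: FrozenSet[str] = frozenset()
    signed: bool = True


def deviations(candidate: Module, whitelisted: List[Module]) -> List[str]:
    """Describe how a module differs from every whitelisted module with its name."""
    peers = [m for m in whitelisted if m.name.lower() == candidate.name.lower()]
    if not peers:
        return []  # nothing to compare against
    reasons = []
    lo = min(m.size_kb for m in peers)
    hi = max(m.size_kb for m in peers)
    if not lo <= candidate.size_kb <= hi:
        reasons.append(f"size {candidate.size_kb:g} KB outside [{lo:g}, {hi:g}] KB")
    if candidate.directory.lower() not in {m.directory.lower() for m in peers}:
        reasons.append(f"unusual directory {candidate.directory}")
    extra = sorted(candidate.imports - frozenset().union(*(m.imports for m in peers)))
    if extra:
        reasons.append("unexpected imports: " + ", ".join(extra))
    if not candidate.signed and all(m.signed for m in peers):
        reasons.append("missing signature")
    return reasons


if __name__ == "__main__":
    base = frozenset({"kernel32.dll", "user32.dll", "oleaut32.dll"})
    whitelist = [
        Module("services.exe", 110, r"C:\Windows\System32", base),
        Module("services.exe", 417, r"C:\Windows\System32", base),
    ]
    suspect = Module("services.exe", 1024, r"C:\Windows\winservice",
                     base | {"shell32.dll", "wsock32.dll"}, signed=False)
    for reason in deviations(suspect, whitelist):
        print(reason)
```

Using the sizes, directories, and extra imports mentioned in the text, the suspect module triggers all four checks. A rule list like this only explains a finding after the fact; it does not replace scoring over the full feature set.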


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Buyukkayhan, A.S., Oprea, A., Li, Z., Robertson, W. (2017). Lens on the Endpoint: Hunting for Malicious Software Through Endpoint Data Analysis. In: Dacier, M., Bailey, M., Polychronakis, M., Antonakakis, M. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2017. Lecture Notes in Computer Science, vol 10453. Springer, Cham. https://doi.org/10.1007/978-3-319-66332-6_4


  • DOI: https://doi.org/10.1007/978-3-319-66332-6_4


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-66331-9

  • Online ISBN: 978-3-319-66332-6

  • eBook Packages: Computer Science, Computer Science (R0)
