Lens on the Endpoint: Hunting for Malicious Software Through Endpoint Data Analysis

Buyukkayhan, Ahmet Salih; Oprea, Alina; Li, Zhou; Robertson, William

doi:10.1007/978-3-319-66332-6_4

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 10453)

Included in the following conference series:

International Symposium on Research in Attacks, Intrusions, and Defenses

Abstract

Organizations face an increasing number of criminal threats, ranging from opportunistic malware to more advanced targeted attacks. While various security technologies are available to protect organizations’ perimeters, many breaches still lead to undesired consequences such as loss of proprietary information, financial burden, and reputational damage. Recently, endpoint monitoring agents that inspect system-level activities on user machines have started to gain traction and to be deployed in industry as an additional defense layer. In most cases, though, they are used only for forensic investigation to determine the root cause of an incident.

In this paper, we demonstrate how endpoint monitoring can be used proactively to detect and prioritize suspicious software modules overlooked by other defenses. Compared to other environments in which host-based detection has proved successful, our setting of a large enterprise introduces unique challenges, including a heterogeneous environment (users install software of their choice), limited ground truth (a small number of malicious software samples available for training), and coarse-grained data collection (strict requirements are imposed on the agents’ performance overhead). By applying clustering and outlier detection algorithms, we develop techniques to identify modules with known malicious behavior, as well as modules impersonating popular benign applications. Our algorithms leverage a large number of static, behavioral, and contextual features, together with new feature weighting methods that are resilient against missing attributes. The large majority of our findings are confirmed as malicious by anti-virus tools and by manual investigation by experienced security analysts.

References

1. Bailey, M., Oberheide, J., Andersen, J., Mao, Z.M., Jahanian, F., Nazario, J.: Automated classification and analysis of internet malware. In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 178–197. Springer, Heidelberg (2007). doi:10.1007/978-3-540-74320-0_10
2. Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: Proceedings of Network and Distributed System Security Symposium, NDSS, vol. 9, pp. 8–11 (2009)
3. Bianchi, A., Shoshitaishvili, Y., Kruegel, C., Vigna, G.: Blacksheep: detecting compromised hosts in homogeneous crowds. In: Proceedings of ACM Conference on Computer and Communications Security, CCS, pp. 341–352. ACM (2012)
4. Bowers, K.D., Hart, C., Juels, A., Triandopoulos, N.: PillarBox: combating next-generation malware with fast forward-secure logging. In: Stavrou, A., Bos, H., Portokalidis, G. (eds.) RAID 2014. LNCS, vol. 8688, pp. 46–67. Springer, Cham (2014). doi:10.1007/978-3-319-11379-1_3
5. Canali, D., Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., Kirda, E.: A quantitative study of accuracy in system call-based malware detection. In: Proceedings of International Symposium on Software Testing and Analysis, pp. 122–132. ACM (2012)
6. Chau, D.H., Nachenberg, C., Wilhelm, J., Wright, A., Faloutsos, C.: Polonium: tera-scale graph mining and inference for malware detection. In: Proceedings of SIAM International Conference on Data Mining, SDM, SIAM (2011)
7. Damballa: First Zeus, now SpyEye. Look at the source code now! (2011). https://www.damballa.com/first-zeus-now-spyeye-look-the-source-code-now/
8. Dash, M., Choi, K., Scheuermann, P., Liu, H.: Feature selection for clustering - a filter solution. In: Proceedings of International Conference on Data Mining, ICDM, pp. 115–122. IEEE (2002)
9. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of 2nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, pp. 226–231. ACM (1996)
10. Feng, H.H., Kolesnikov, O.M., Fogla, P., Lee, W., Gong, W.: Anomaly detection using call stack information. In: Proceedings of IEEE Symposium on Security and Privacy, S&P, pp. 62–75. IEEE (2003)
11. Gao, D., Reiter, M.K., Song, D.: Gray-box extraction of execution graphs for anomaly detection. In: Proceedings of ACM Conference on Computer and Communications Security, CCS, pp. 318–329. ACM (2004)
12. Gu, G., Porras, P., Yegneswaran, V., Fong, M., Lee, W.: BotHunter: detecting malware infection through IDS-driven dialog correlation. In: Proceedings of USENIX Security Symposium, SECURITY, pp. 12:1–12:16. USENIX Association (2007)
13. Gu, Z., Pei, K., Wang, Q., Si, L., Zhang, X., Xu, D.: LEAPS: detecting camouflaged attacks with statistical learning guided by program analysis. In: Proceedings of International Conference on Dependable Systems and Networks, DSN, pp. 57–68. IEEE/IFIP (2015)
14. Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York (2009). doi:10.1007/978-0-387-84858-7
15. He, X., Cai, D., Niyogi, P.: Laplacian score for feature selection. In: Proceedings of Advances in Neural Information Processing Systems, NIPS, pp. 507–514 (2005)
16. Hofmeyr, S.A., Forrest, S., Somayaji, A.: Intrusion detection using sequences of system calls. J. Comput. Secur. 6(3), 151–180 (1998)
17. Hu, X., Shin, K.G.: DUET: integration of dynamic and static analyses for malware clustering with cluster ensembles. In: Proceedings of 29th Annual Computer Security Applications Conference, ACSAC, pp. 79–88 (2013)
18. Hu, X., Shin, K.G., Bhatkar, S., Griffin, K.: MutantX-S: scalable malware clustering based on static features. In: Proceedings of USENIX Annual Technical Conference, ATC, pp. 187–198. USENIX Association (2013)
19. Kolbitsch, C., Comparetti, P.M., Kruegel, C., Kirda, E., Zhou, X., Wang, X.: Effective and efficient malware detection at the end host. In: Proceedings of USENIX Security Symposium, SECURITY, pp. 351–366. USENIX Association (2009)
20. Lanzi, A., Balzarotti, D., Kruegel, C., Christodorescu, M., Kirda, E.: AccessMiner: using system-centric models for malware protection. In: Proceedings of ACM Conference on Computer and Communications Security, CCS, pp. 399–412. ACM (2010)
21. Lee, W., Stolfo, S.J.: Data mining approaches for intrusion detection. In: Proceedings of USENIX Security Symposium, SECURITY. USENIX Association (1998)
22. Lee, W., Stolfo, S.J., Chan, P.K.: Learning patterns from UNIX process execution traces for intrusion detection. In: Proceedings of AAAI Workshop on AI Approaches to Fraud Detection and Risk Management, pp. 50–56. AAAI (1997)
23. MANDIANT: APT1: Exposing one of China’s cyber espionage units. Report available from (2013). www.mandiant.com
24. Mandiant Consulting: M-TRENDS 2016 (2016). https://www2.fireeye.com/rs/848-DID-242/images/Mtrends2016.pdf
25. McAfee Labs: Diary of a “RAT” (Remote Access Tool) (2011). https://kc.mcafee.com/resources/sites/MCAFEE/content/live/PRODUCT_DOCUMENTATION/23000/PD23258/en_US/Diary_of_a_RAT_datasheet.pdf
26. McAfee Labs: ZeroAccess Rootkit (2013). https://kc.mcafee.com/resources/sites/MCAFEE/content/live/PRODUCT_DOCUMENTATION/23000/PD23412/en_US/McAfee
27. Neugschwandtner, M., Comparetti, P.M., Jacob, G., Kruegel, C.: Forecast: skimming off the malware cream. In: Proceedings of 27th Annual Computer Security Applications Conference, ACSAC, pp. 11–20 (2011)
28. Oprea, A., Li, Z., Yen, T., Chin, S.H., Alrwais, S.A.: Detection of early-stage enterprise infection by mining large-scale log data. In: Proceedings of 45th Annual International Conference on Dependable Systems and Networks, DSN, pp. 45–56. IEEE/IFIP (2015)
29. Perdisci, R., Lee, W., Feamster, N.: Behavioral clustering of HTTP-based malware and signature generation using malicious network traces. In: Proceedings of Symposium on Networked Systems Design and Implementation, NSDI, pp. 391–404. USENIX Association (2010)
30. Rahbarinia, B., Balduzzi, M., Perdisci, R.: Real-time detection of malware downloads via large-scale URL → file → machine graph mining. In: Proceedings of ACM Asia Conference on Computer and Communications Security, AsiaCCS, pp. 1117–1130. ACM (2016)
31. Rieck, K., Trinius, P., Willems, C., Holz, T.: Automatic analysis of malware behavior using machine learning. J. Comput. Secur. 19(4), 639–668 (2011)
32. Sekar, R., Bendre, M., Dhurjati, D., Bollineni, P.: A fast automaton-based method for detecting anomalous program behaviors. In: Proceedings of IEEE Symposium on Security and Privacy, S&P, pp. 144–155. IEEE (2001)
33. Shin, S., Xu, Z., Gu, G.: EFFORT: a new host-network cooperated framework for efficient and effective bot malware detection. Comput. Networks (Elsevier) 57(13), 2628–2642 (2013)
34. Symantec: The Rebirth Of Endpoint Security. http://www.darkreading.com/endpoint/the-rebirth-of-endpoint-security/d/d-id/1322775
35. Tamersoy, A., Roundy, K., Chau, D.H.: Guilt by association: large scale malware detection by mining file-relation graphs. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD, pp. 1524–1533. ACM (2014)
36. Verizon: 2015 data breach investigations report (2015). http://www.verizonenterprise.com/DBIR/2015/
37. Wicherski, G.: peHash: a novel approach to fast malware clustering. In: 2nd Workshop on Large-Scale Exploits and Emergent Threats. LEET, USENIX Association (2009)
38. Yen, T.F., Heorhiadi, V., Oprea, A., Reiter, M.K., Juels, A.: An epidemiological study of malware encounters in a large enterprise. In: Proceedings of ACM Conference on Computer and Communications Security, CCS, pp. 1117–1130. ACM (2014)
39. Yen, T.F., Oprea, A., Onarlioglu, K., Leetham, T., Robertson, W., Juels, A., Kirda, E.: Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks. In: Proceedings of 29th Annual Computer Security Applications Conference, ACSAC, pp. 199–208 (2013)
40. Zeng, Y., Hu, X., Shin, K.G.: Detection of botnets using combined host- and network-level information. In: Proceedings of International Conference on Dependable Systems and Networks, DSN, pp. 291–300. IEEE/IFIP (2010)

Acknowledgement

We are grateful to the enterprise that permitted us access to their endpoint data for our analysis. We would like to thank Justin Lamarre, Robin Norris, Todd Leetham, and Christopher Harrington for their help with system design and evaluation of our findings, as well as Kevin Bowers and Martin Rosa for comments and suggestions on our paper. We thank our shepherd Alfonso Valdes and the anonymous reviewers for their feedback on drafts of this paper. This work was supported by the National Science Foundation (NSF) under grant CNS-1409738, and by Secure Business Austria.

Author information

Authors and Affiliations

Northeastern University, Boston, MA, USA
Ahmet Salih Buyukkayhan, Alina Oprea & William Robertson
RSA Laboratories, Bedford, MA, USA
Zhou Li

Corresponding author

Correspondence to Ahmet Salih Buyukkayhan.

Editor information

Editors and Affiliations

Qatar Computing Research Institute, Doha, Qatar
Marc Dacier
University of Illinois at Urbana Champaign, Champaign, Illinois, USA
Michael Bailey
Stony Brook University, Stony Brook, New York, USA
Michalis Polychronakis
Georgia Institute of Technology, Georgia, USA
Manos Antonakakis

Appendices

A Feature Set

Our feature set includes features of different types: string, set, binary, and numerical attributes. Table 7 displays the full set of features used for our analysis, along with their category and type.
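Because the feature set mixes strings, sets, binary flags, and numbers, any distance between two modules has to handle each type separately. The following is a minimal sketch of one way to do that, not the distance used in the paper: it assumes Jaccard distance for sets, exact match for string and binary features, and a scaled absolute difference for numerical features. Missing attributes (`None`) are skipped and the remaining weights are renormalized. The feature names, scales, and weights are hypothetical.

```python
from typing import Any, Mapping, Optional, Tuple


def feature_distance(a: Any, b: Any, kind: str, scale: float = 1.0) -> float:
    """Distance in [0, 1] between two values of a single feature."""
    if kind == "set":
        union = a | b
        return 1.0 - len(a & b) / len(union) if union else 0.0
    if kind in ("string", "binary"):
        return 0.0 if a == b else 1.0
    if kind == "numerical":
        return min(abs(a - b) / scale, 1.0)
    raise ValueError(f"unknown feature kind: {kind}")


def module_distance(
    x: Mapping[str, Any],
    y: Mapping[str, Any],
    schema: Mapping[str, Tuple[str, float]],
    weights: Mapping[str, float],
) -> Optional[float]:
    """Weighted average of per-feature distances over features present in both modules."""
    total = 0.0
    weight_sum = 0.0
    for name, (kind, scale) in schema.items():
        if x.get(name) is None or y.get(name) is None:
            continue  # attribute missing on one side: ignore it
        w = weights.get(name, 1.0)
        total += w * feature_distance(x[name], y[name], kind, scale)
        weight_sum += w
    return total / weight_sum if weight_sum else None


# Hypothetical schema: feature name -> (type, scale for numerical features)
schema = {
    "imports": ("set", 1.0),
    "signed": ("binary", 1.0),
    "size_kb": ("numerical", 1000.0),
    "description": ("string", 1.0),
}
a = {"imports": {"kernel32.dll"}, "signed": False, "size_kb": 200.0, "description": None}
b = {"imports": {"kernel32.dll", "wsock32.dll"}, "signed": False, "size_kb": 700.0,
     "description": "x"}
d = module_distance(a, b, schema, weights={})
```

Here `description` is missing for module `a`, so only three features contribute to `d`.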

Table 7. Final list of features. Note that all contextual features and numerical behavior features are computed by averaging the corresponding values across all hosts on which the module is present.
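The per-host averaging described in the caption can be sketched as follows. This is an illustrative assumption about the data layout, not the authors' pipeline: each observation is a hypothetical `(module_id, host_id, features)` triple, and missing values are skipped instead of being counted as zero.

```python
from collections import defaultdict
from statistics import mean


def aggregate_per_module(observations):
    """Average each numerical feature of a module over the hosts that report it.

    observations: iterable of (module_id, host_id, {feature_name: value or None}).
    """
    collected = defaultdict(lambda: defaultdict(list))
    for module_id, _host_id, features in observations:
        for name, value in features.items():
            if value is not None:  # skip attributes the agent did not capture
                collected[module_id][name].append(value)
    return {
        module_id: {name: mean(values) for name, values in features.items()}
        for module_id, features in collected.items()
    }


# Hypothetical observations for one module seen on three hosts
observations = [
    ("m1", "host-a", {"bytes_sent": 2.0, "days_since_creation": 10.0}),
    ("m1", "host-b", {"bytes_sent": 4.0, "days_since_creation": None}),
    ("m1", "host-c", {"bytes_sent": None, "days_since_creation": 20.0}),
]
averaged = aggregate_per_module(observations)
```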

B Case Studies

In this section, we present several detailed case studies of our findings. First, we detail two clusters of similar modules we identified, one with executable modules and another with DLLs, and we highlight the features that our new findings share with blacklisted modules. Second, we give more details on some of the detected outliers and emphasize the difference from the legitimate whitelisted modules they impersonate.

B.1 Similarity

We found 12 unknown modules, all with different file names, that are similar to a blacklisted module, house of cards s03e01~.exe. These modules impersonate popular movie or application names, such as Fifty Shades of Grey~.exe and VCE Exam Simulator~.exe, to deceive users. They all import a single DLL (KERNEL32.dll) and use the same very common section names (.text, .rdata, .data, .rsrc, .reloc). One of them is even signed with a rogue certificate. Interestingly, these modules could not be grouped together based on their static features alone, as these are common among other modules. However, when we consider the behavioral and contextual features, they are similar in some unusual ways. For instance, these modules write executables to a temp directory under AppData and create processes from that location. Moreover, they use the same autostart method (AutoLogon) to persist on the system, and they reside in the same path under the ProgramData folder.
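The idea of matching unknown modules to a blacklisted one through shared behavioral and contextual traits can be illustrated with a small nearest-neighbor sketch. It is a simplification under stated assumptions: each module is reduced to a set of hypothetical trait labels, similarity is plain Jaccard distance, and the threshold is arbitrary. The paper's clustering algorithm and feature weighting are not reproduced here.

```python
def jaccard_distance(a: frozenset, b: frozenset) -> float:
    """1 - |a ∩ b| / |a ∪ b|, with two empty sets treated as identical."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def similar_to_blacklisted(unknown, blacklisted, threshold):
    """Map each unknown module to its closest blacklisted module, if within threshold."""
    matches = {}
    for name, traits in unknown.items():
        best = min(
            blacklisted.items(),
            key=lambda item: jaccard_distance(traits, item[1]),
            default=None,
        )
        if best is None:
            continue  # no blacklisted modules to compare against
        distance = jaccard_distance(traits, best[1])
        if distance <= threshold:
            matches[name] = (best[0], distance)
    return matches


# Trait labels below are hypothetical encodings of the behaviors described above.
shared = {
    "writes_exe_to_appdata_temp",
    "spawns_process_from_appdata_temp",
    "autostart:AutoLogon",
    "path:ProgramData",
}
blacklisted = {"house of cards s03e01~.exe": frozenset(shared)}
unknown = {
    "Fifty Shades of Grey~.exe": frozenset(shared | {"signed:rogue_certificate"}),
    "unrelated.exe": frozenset({"path:ProgramFiles"}),
}
matches = similar_to_blacklisted(unknown, blacklisted, threshold=0.25)
```

Only the first unknown module is matched; the unrelated one shares no traits with the blacklisted module and falls outside the threshold.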

Another cluster, consisting of DLLs, includes 15 unknown modules and 1 blacklisted module. It is intriguing because the modules have randomized 14-character file names (e.g., oXFV2lbFU7dgHY.x64.dll). The modules are almost identical in their features except for slightly different entropy values and creation dates. VirusTotal reported 10 of them, but different modules were detected by different numbers of AVs. One of them was not detected initially, but when we queried VirusTotal later, the module was detected by 29 AVs. After eight months, the remaining 5 modules had still not been detected by any AV in VirusTotal, but they were confirmed manually by the security analysts.
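This excerpt does not state how the entropy feature is computed. A common choice, assumed here purely for illustration, is the byte-level Shannon entropy of the file contents, which ranges from 0 bits (a single repeated byte) to 8 bits (uniformly distributed bytes). Near-identical binaries would then differ only slightly in this value.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    n = len(data)
    return -sum((count / n) * math.log2(count / n) for count in Counter(data).values())


low = shannon_entropy(b"aaaa")               # one distinct byte value
one_bit = shannon_entropy(b"abab")           # two equally likely byte values
high = shannon_entropy(bytes(range(256)))    # all 256 byte values, once each
```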

B.2 Outlier Detection

Our system identified 2 blacklisted and 3 unknown modules of services.exe as outliers. We found that one of the unknown modules was infected by ZeroAccess [26], a Trojan horse that steals personal information, replaces search results, and downloads and executes additional files. This module was confirmed by VirusTotal one week after our detection. For the remaining two, we performed manual analysis.

One of the modules has a description in Korean, without a company name or signature. Compared to the legitimate process, it has the additional section names .itext, .bss, .edata, and .tls. The module imports some common DLLs such as kernel32.dll, user32.dll, and oleaut32.dll, but also imports shell32.dll and wsock32.dll, which is unusual for benign variants of services.exe modules. In addition, the module size is ~1 MB, whereas other whitelisted modules have sizes between 110 KB and 417 KB. Unfortunately, no behavior features were captured for this module, but it has several suspicious contextual features. The module is installed on only a single machine, with hidden attributes, and it is located in C:\Windows\winservice instead of C:\Windows\System32.

The second detected services.exe module is missing the signature field and imports a different set of DLLs. Even though the module is 32-bit, the DLLs it imports are usually included in 64-bit versions of benign services.exe. It also has some suspicious contextual features, since it was installed relatively recently on only a single machine and its file system path is ~\Download\ffadecffa baffc instead of the usual C:\Windows\System32. Both of these modules were confirmed as malicious by security experts in the organization.
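The manual comparison against whitelisted services.exe modules described above can be approximated by a simple checklist. The sketch below is a rule-style illustration, not the outlier detection algorithm used in the paper: it builds a profile from a set of whitelisted modules and lists the ways a candidate deviates from it. The `Module` fields and the whitelisted values are hypothetical, chosen to mirror the first case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    path: str
    size_kb: float
    imports: frozenset
    sections: frozenset
    signed: bool


def deviations(candidate: Module, whitelisted: list) -> list:
    """List how a candidate differs from a non-empty set of whitelisted modules."""
    reasons = []
    sizes = [m.size_kb for m in whitelisted]
    if not min(sizes) <= candidate.size_kb <= max(sizes):
        reasons.append("size outside whitelisted range")
    known_imports = frozenset().union(*(m.imports for m in whitelisted))
    extra_imports = candidate.imports - known_imports
    if extra_imports:
        reasons.append(f"unusual imports: {sorted(extra_imports)}")
    known_sections = frozenset().union(*(m.sections for m in whitelisted))
    extra_sections = candidate.sections - known_sections
    if extra_sections:
        reasons.append(f"unusual sections: {sorted(extra_sections)}")
    if candidate.path.lower() not in {m.path.lower() for m in whitelisted}:
        reasons.append("unexpected install path")
    if not candidate.signed and all(m.signed for m in whitelisted):
        reasons.append("missing signature")
    return reasons


# Hypothetical whitelisted profile and a candidate resembling the first case study
common_sections = frozenset({".text", ".rdata", ".data", ".rsrc", ".reloc"})
common_imports = frozenset({"kernel32.dll", "user32.dll", "oleaut32.dll"})
whitelisted = [
    Module(r"C:\Windows\System32", 110.0, common_imports, common_sections, True),
    Module(r"C:\Windows\System32", 417.0, common_imports, common_sections, True),
]
candidate = Module(
    r"C:\Windows\winservice",
    1024.0,
    common_imports | {"shell32.dll", "wsock32.dll"},
    common_sections | {".itext", ".bss", ".edata", ".tls"},
    False,
)
reasons = deviations(candidate, whitelisted)
```

A checklist like this only explains why a module looks unusual; deciding whether it is malicious still requires the kind of analyst review described in the text.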

About this paper

Cite this paper

Buyukkayhan, A.S., Oprea, A., Li, Z., Robertson, W. (2017). Lens on the Endpoint: Hunting for Malicious Software Through Endpoint Data Analysis. In: Dacier, M., Bailey, M., Polychronakis, M., Antonakakis, M. (eds) Research in Attacks, Intrusions, and Defenses. RAID 2017. Lecture Notes in Computer Science(), vol 10453. Springer, Cham. https://doi.org/10.1007/978-3-319-66332-6_4

DOI: https://doi.org/10.1007/978-3-319-66332-6_4
Published: 12 October 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66331-9
Online ISBN: 978-3-319-66332-6
eBook Packages: Computer Science, Computer Science (R0)
