Abstract
The number of software vulnerabilities discovered and publicly disclosed is increasing every year; however, only a small fraction of these vulnerabilities are exploited in real-world attacks. With limited time and skilled personnel, organizations often look for ways to identify the vulnerabilities most likely to be targeted so that they can prioritize patching. This chapter presents an exploit prediction model that predicts whether a vulnerability is likely to be exploited. The proposed model leverages vulnerability mentions from a variety of online data sources: the white hat community, the vulnerability research community, and dark web/deep web (DW) websites. Compared with the standard scoring system (CVSS base score) and a benchmark model that leverages Twitter data for exploit prediction, our model outperforms both baselines, with an F1 measure of 0.40 on the minority class (a 266% improvement over the CVSS base score). It also achieves a high true positive rate (90%) and a low false positive rate (13%), making it highly effective as an early predictor of exploits that could appear in the wild. A qualitative and a quantitative study are also conducted to investigate whether the likelihood of exploitation increases when a vulnerability is mentioned in each of the examined data sources. The proposed model is shown to be robust against adversarial examples, that is, postings authored by adversaries attempting to induce the model to produce incorrect predictions. Finally, a discussion of the model's viability shows cases where the classifier achieves high performance and cases where it performs less well.
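The quantitative side of the question above can be framed as comparing the overall exploitation rate with the exploitation rate among vulnerabilities mentioned in a given source. The following Python sketch illustrates that comparison under stated assumptions: the `Vulnerability` record, the `exploitation_lift` helper, the source labels (`white_hat`, `vuln_research`, `dw`), and the toy data are all hypothetical and do not reproduce the chapter's dataset or method.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Vulnerability:
    """Hypothetical record: one vulnerability and the sources that mention it."""
    vuln_id: str
    exploited: bool  # ground-truth label: exploited in the wild or not
    mentioned_in: set[str] = field(default_factory=set)  # hypothetical source labels


def exploitation_lift(vulns, source):
    """Compare P(exploited) with P(exploited | mentioned in `source`).

    Returns (base_rate, conditional_rate, lift); the last two are None
    when no vulnerability in `vulns` is mentioned in `source`.
    """
    base_rate = sum(v.exploited for v in vulns) / len(vulns)
    mentioned = [v for v in vulns if source in v.mentioned_in]
    if not mentioned:
        return base_rate, None, None
    conditional_rate = sum(v.exploited for v in mentioned) / len(mentioned)
    lift = conditional_rate / base_rate if base_rate else None
    return base_rate, conditional_rate, lift


# Toy data (hypothetical): 10 vulnerabilities, 2 of them exploited.
vulns = [
    Vulnerability("V1", True, {"dw", "white_hat"}),
    Vulnerability("V2", True, {"dw"}),
    Vulnerability("V3", False, {"white_hat"}),
    Vulnerability("V4", False),
    Vulnerability("V5", False, {"vuln_research"}),
    Vulnerability("V6", False, {"dw"}),
    Vulnerability("V7", False),
    Vulnerability("V8", False),
    Vulnerability("V9", False),
    Vulnerability("V10", False),
]

for src in ("white_hat", "vuln_research", "dw"):
    print(src, exploitation_lift(vulns, src))
```

On this toy data the base rate is 0.2 and the rate among vulnerabilities mentioned in `dw` is 2/3, a lift of about 3.3. A lift above 1 suggests that a mention in that source is associated with a higher likelihood of exploitation; it does not by itself establish causation.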
Notes
- 10. An ethical (white hat) hacker is a person who carries out hacking activities against a computer network to identify its weaknesses and assess its security, rather than with malicious intent or for personal gain.
- 14. A managed security service provider (MSSP) provides its clients with tools that continuously monitor and manage a wide range of cybersecurity-related activities and operations, which may include threat intelligence, virus and spam blocking, and vulnerability and risk assessment.
- 19. TPR (true positive rate) is the proportion of exploited vulnerabilities that are correctly predicted as exploited, out of all exploited vulnerabilities.
- 20. FPR (false positive rate) is the proportion of non-exploited vulnerabilities that are incorrectly predicted as exploited, out of all non-exploited vulnerabilities.
- 22. Twitter posts, called tweets, are limited to 280 characters.
- 23. Note that these metrics are sensitive both to the underlying class distribution and to the class-rebalancing ratio.
- 25. https://www.securityfocus.com. There are many examples of attack signatures reported by Symantec but not by SecurityFocus. Conversely, SecurityFocus reports some vulnerabilities as exploited even though they exist in software from vendors well covered by Symantec, and Symantec does not report them.
- 32. The harmonic mean of precision and recall.
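Notes 19, 20, 23, and 32 define or comment on the evaluation metrics (note 32 describes the F1 measure, the harmonic mean of precision and recall). The Python sketch below computes TPR, FPR, precision, and minority-class F1 from binary labels, and uses synthetic data to show one way the class distribution matters: with TPR and FPR held fixed, adding more non-exploited vulnerabilities lowers precision and therefore F1. Note 23 does not specify which metrics it refers to, so this is only an illustration; the helper names (`confusion_counts`, `minority_class_metrics`, `synthetic_run`) and all numbers are hypothetical, not results from the chapter.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN, treating 1 (exploited) as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def minority_class_metrics(y_true, y_pred):
    """TPR (recall), FPR, precision, and F1 for the positive (minority) class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # exploited, correctly flagged
    fpr = fp / (fp + tn) if fp + tn else 0.0  # non-exploited, wrongly flagged
    precision = tp / (tp + fp) if tp + fp else 0.0
    # F1: harmonic mean of precision and recall (recall equals TPR)
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f1": f1}


def synthetic_run(n_pos, n_neg, tpr, fpr):
    """Hypothetical labels/predictions for a classifier with the given TPR and FPR."""
    tp = round(n_pos * tpr)
    fp = round(n_neg * fpr)
    y_true = [1] * n_pos + [0] * n_neg
    y_pred = [1] * tp + [0] * (n_pos - tp) + [1] * fp + [0] * (n_neg - fp)
    return y_true, y_pred


if __name__ == "__main__":
    balanced = minority_class_metrics(*synthetic_run(10, 10, 0.9, 0.1))
    imbalanced = minority_class_metrics(*synthetic_run(10, 100, 0.9, 0.1))
    print("balanced:  ", balanced)
    print("imbalanced:", imbalanced)
```

With 10 exploited and 10 non-exploited vulnerabilities the sketch gives F1 = 0.9; with 100 non-exploited vulnerabilities and the same TPR (0.9) and FPR (0.1), precision drops to 9/19 and F1 to 18/29 (about 0.62). This is why TPR and FPR are often reported alongside minority-class F1 on imbalanced data.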
References
Pfleeger CP, Pfleeger SL, Margulies J (2015) Security in computing, 5th edn. Prentice Hall, Upper Saddle River, NJ, USA
Bilge L, Dumitras T (2012) Before we knew it: an empirical study of zero-day attacks in the real world. In: Yu T, Danezis G, Gligor V (eds) Proceedings of the 2012 ACM Conference on Computer and Communications Security. ACM, New York, pp 833–844. https://doi.org/10.1145/2382196.2382284
Frei S, Schatzmann D, Plattner B, Trammell B (2010) Modeling the security ecosystem–The dynamics of (in)security. In: Moore T, Pym D, Ioannidis C (eds) Economics of information security and privacy. Springer, Boston, pp 79–106. https://doi.org/10.1007/978-1-4419-6967-5_6
Allodi L, Massacci F (2014) Comparing vulnerability severity and exploits using case-control studies. ACM Trans Inform Syst Secur 17(1), Article No. 1. https://doi.org/10.1145/2630069
Durumeric Z, Li F, Kasten J, Amann J, Beekman J, Payer M, Weaver N, Adrian D, Paxson V, Bailey M, Halderman JA (2014) The matter of Heartbleed. In: Williamson C, Akella A, Taft N (eds) Proceedings of the 2014 Conference on Internet Measurement Conference. ACM, New York, pp 475–488. https://doi.org/10.1145/2663716.2663755
Edkrantz M, Said A (2015) Predicting cyber vulnerability exploits with machine learning. In: Thirteenth Scandinavian Conference on Artificial Intelligence, pp 48–57. https://doi.org/10.3233/978-1-61499-589-0-48
Nayak K, Marino D, Efstathopoulos P, Dumitraş T (2014) Some vulnerabilities are different than others. In: Stavrou A, Bos H, Portokalidis G (eds) Research in attacks, intrusions and defenses. Springer, Cham, pp 426–446. https://doi.org/10.1007/978-3-319-11379-1_21
Sabottke C, Suciu O, Dumitras T (2015) Vulnerability disclosure in the age of social media: exploiting Twitter for predicting real-world exploits. In: Proceedings of the 24th USENIX Security Symposium. USENIX Association, Berkeley, CA, USA, pp 1041–1056. https://www.usenix.org/sites/default/files/sec15_full_proceedings.pdf
Allodi L, Massacci F (2012) A preliminary analysis of vulnerability scores for attacks in wild: the EKITS and SYM datasets. In: Yu T, Christodorescu M (eds) Proceedings of the 2012 ACM Workshop on Building Analysis Datasets and Gathering Experience Returns for Security. ACM, New York, pp 17–24. https://doi.org/10.1145/2382416.2382427
Mittal S, Das PK, Mulwad V, Joshi A, Finin T (2016) CyberTwitter: using Twitter to generate alerts for cybersecurity threats and vulnerabilities. In: Subrahmanian VS, Rokne J, Kumar R, Caverlee J, Tong H (eds) Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE Press, Piscataway, NJ, USA, pp 860–867
Marin E, Diab A, Shakarian P (2016) Product offerings in malicious hacker markets. In: Zhou L, Kaati L, Mao W, Wang GA (eds) Proceedings of the 2016 IEEE Conference on Intelligence and Security Informatics. The Printing House, Stoughton, WI, USA, pp 187–189. https://doi.org/10.1109/ISI.2016.7745465
Samtani S, Chinn K, Larson C, Chen H (2016) AZSecure hacker assets portal: cyber threat intelligence and malware analysis. In: Zhou L, Kaati L, Mao W, Wang GA (eds) Proceedings of the 2016 IEEE Conference on Intelligence and Security Informatics. The Printing House, Stoughton, WI, USA, pp 19–24. https://doi.org/10.1109/ISI.2016.7745437
Allodi L (2017) Economic factors of vulnerability trade and exploitation. In: Thuraisingham B, Evans D, Malkin T, Xu D (eds) Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, New York, pp 1483–1499. https://doi.org/10.1145/3133956.3133960
Bullough BL, Yanchenko AK, Smith CL, Zipkin JR (2017) Predicting exploitation of disclosed software vulnerabilities using open-source data. In: Verma R, Thuraisingham B (eds) Proceedings of the 3rd ACM on International Workshop on Security and Privacy Analytics. ACM, New York, pp 45–53. https://doi.org/10.1145/3041008.3041009
Allodi L, Shim W, Massacci F (2013) Quantitative assessment of risk reduction with cybercrime black market monitoring. In: 2013 IEEE Security and Privacy Workshops. IEEE Computer Society, Los Alamitos, CA, USA, pp 165–172. https://doi.org/10.1109/SPW.2013.16
Bozorgi M, Saul LK, Savage S, Voelker GM (2010) Beyond heuristics: learning to classify vulnerabilities and predict exploits. In: Rao B, Krishnapuram B, Tomkins A, Yang Q (eds) Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, pp 105–114. https://doi.org/10.1145/1835804.1835821
Motoyama M, McCoy D, Levchenko K, Savage S, Voelker GM (2011) An analysis of underground forums. In: Thiran P, Willinger W (eds) Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement. ACM, New York, pp 71–80. https://doi.org/10.1145/2068816.2068824
Holt TJ, Lampke E (2010) Exploring stolen data markets online: products and market forces. Crim Justice Stud 23(1):33–50. https://doi.org/10.1080/14786011003634415
Shakarian J, Gunn AT, Shakarian P (2016) Exploring malicious hacker forums. In: Jajodia S, Subrahmanian V, Swarup V, Wang C (eds) Cyber deception. Springer, Cham, pp 259–282. https://doi.org/10.1007/978-3-319-32699-3_11
Nunes E, Diab A, Gunn A, Marin E, Mishra V, Paliath V, Robertson J, Shakarian J, Thart A, Shakarian P (2016) Darknet and deepnet mining for proactive cybersecurity threat intelligence. In: Chen H, Hariri S, Thuraisingham B, Zeng D (eds) Proceedings of the 2016 IEEE Conference on Intelligence and Security Informatics, pp 7–12. https://doi.org/10.1109/ISI.2016.7745435
Robertson J, Diab A, Marin E, Nunes E, Paliath V, Shakarian J, Shakarian P (2017) Darkweb cyber threat intelligence mining. Cambridge University Press, New York. https://doi.org/10.1017/9781316888513
Liu Y, Sarabi A, Zhang J, Naghizadeh P, Karir M, Bailey M, Liu M (2015) Cloudy with a chance of breach: forecasting cyber security incidents. In: Proceedings of the 24th USENIX Security Symposium. USENIX Association, Berkeley, CA, USA, pp 1009–1024. https://www.usenix.org/sites/default/files/sec15_full_proceedings.pdf
Soska K, Christin N (2014) Automatically detecting vulnerable websites before they turn malicious. In: Proceedings of the 23rd USENIX Security Symposium. USENIX Association, Berkeley, CA, USA, pp 625–640. https://www.usenix.org/sites/default/files/sec14_full_proceedings.pdf
Almukaynizi M, Nunes E, Dharaiya K, Senguttuvan M, Shakarian J, Shakarian P (2017) Proactive identification of exploits in the wild through vulnerability mentions online. In: Sobiesk E, Bennett D, Maxwell P (eds) Proceedings of the 2017 International Conference on Cyber Conflict. Curran Associates, Red Hook, NY, USA, pp 82–88. https://doi.org/10.1109/CYCONUS.2017.8167501
Zhang S, Caragea D, Ou X (2011) An empirical study on using the national vulnerability database to predict software vulnerabilities. In: Hameurlain A, Liddle SW, Schewe KD, Zhou X (eds) Database and expert systems applications. Springer, Heidelberg, pp 217–231. https://doi.org/10.1007/978-3-642-23088-2_15
Hao S, Kantchelian A, Miller B, Paxson V, Feamster N (2016) PREDATOR: proactive recognition and elimination of domain abuse at time-of-registration. In: Weippl E, Katzenbeisser S, Kruegel C, Myers A, Halevi S (eds) Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, New York, pp 1568–1579. https://doi.org/10.1145/2976749.2978317
Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.org/10.1023/A:1022627411411
Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Int Res 16(1):321–357. https://doi.org/10.1613/jair.953
Allodi L, Massacci F, Williams JM (2017) The work-averse cyber attacker model: theory and evidence from two million attack signatures. https://doi.org/10.2139/ssrn.2862299
Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324
Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140. https://doi.org/10.1007/BF00058655
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830
Guo D, Shamai S, Verdu S (2005) Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans Inform Theory 51(4):1261–1282. https://doi.org/10.1109/TIT.2005.844072
Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2012) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approaches. IEEE Trans Syst Man Cybern C 42(4):463–484. https://doi.org/10.1109/TSMCC.2011.2161285
Barreno M, Bartlett PL, Chi FJ, Joseph AD, Nelson B, Rubinstein BIP, Saini U, Tygar JD (2008) Open problems in the security of learning. In: Balfanz D, Staddon J (eds) Proceedings of the 1st ACM Workshop on AISec. ACM, New York, pp 19–26. https://doi.org/10.1145/1456377.1456382
Barreno M, Nelson B, Joseph AD, Tygar J (2010) The security of machine learning. Mach Learn 81(2):121–148. https://doi.org/10.1007/s10994-010-5188-5
Biggio B, Nelson B, Laskov P (2011) Support vector machines under adversarial label noise. In: Hsu C-N, Lee WS (eds) Proceedings of the 3rd Asian Conference on Machine Learning, pp 97–112. http://www.jmlr.org/proceedings/papers/v20/biggio11/biggio11.pdf
Acknowledgements
Some of the authors were supported by Office of Naval Research (ONR) contract N00014-15-1-2742, the ONR Neptune program, and the ASU Global Security Initiative (GSI). Paulo Shakarian and Jana Shakarian are supported by the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL) contract number FA8750-16-C-0112. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government.
Copyright information
© 2019 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Almukaynizi, M., Nunes, E., Dharaiya, K., Senguttuvan, M., Shakarian, J., Shakarian, P. (2019). Patch Before Exploited: An Approach to Identify Targeted Software Vulnerabilities. In: Sikos, L. (eds) AI in Cybersecurity. Intelligent Systems Reference Library, vol 151. Springer, Cham. https://doi.org/10.1007/978-3-319-98842-9_4
DOI: https://doi.org/10.1007/978-3-319-98842-9_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-98841-2
Online ISBN: 978-3-319-98842-9
eBook Packages: Intelligent Technologies and Robotics (R0)