Skip to main content

Patch Before Exploited: An Approach to Identify Targeted Software Vulnerabilities

  • Chapter
  • First Online:
AI in Cybersecurity

Part of the book series: Intelligent Systems Reference Library ((ISRL,volume 151))

Abstract

The number of software vulnerabilities discovered and publicly disclosed is increasing every year; however, only a small fraction of these vulnerabilities are exploited in real-world attacks. With limitations on time and skilled resources, organizations often look at ways to identify threatened vulnerabilities for patch prioritization. In this chapter, an exploit prediction model is presented, which predicts whether a vulnerability will likely be exploited. Our proposed model leverages data from a variety of online data sources (white hat community, vulnerability research community, and dark web/deep web (DW) websites) with vulnerability mentions. Compared to the standard scoring system (CVSS base score) and a benchmark model that leverages Twitter data in exploit prediction, our model outperforms the baseline models with an F1 measure of 0.40 on the minority class (266% improvement over CVSS base score) and also achieves high true positive rate and low false positive rate (90%, 13%, respectively), making it highly effective as an early predictor of exploits that could appear in the wild. A qualitative and a quantitative study are also conducted to investigate whether the likelihood of exploitation increases if a vulnerability is mentioned in each of the examined data sources. The proposed model is proven to be much more robust than adversarial examples—postings authored by adversaries in the attempt to induce the model to produce incorrect predictions. A discussion on the viability of the model is provided, showing cases where the classifier achieves high performance, and other cases where the classifier performs less efficiently.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 149.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 199.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info
Hardcover Book
USD 199.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

  1. 1.

    https://www.nist.gov

  2. 2.

    https://nvd.nist.gov

  3. 3.

    https://nvd.nist.gov/cpe.cfm

  4. 4.

    https://nvd.nist.gov/vuln-metrics/cvss

  5. 5.

    https://technet.microsoft.com/en-us/security/cc998259.aspx

  6. 6.

    https://helpx.adobe.com/security/severity-ratings.html

  7. 7.

    https://www.cisco.com/c/dam/m/en_ca/never-better/assets/files/midyear-security-report-2016.pdf

  8. 8.

    http://heartbleed.com

  9. 9.

    https://www.openssl.org

  10. 10.

    Ethical (white hat) hacker is a person who practices hacking activities against some computer network to identify its weaknesses and assess its security, rather than having malicious intent or seeking personal gain.

  11. 11.

    http://contagiodump.blogspot.com

  12. 12.

    http://www.securityfocus.com

  13. 13.

    https://www.talosintelligence.com/vulnerability_reports

  14. 14.

    An MSSP is a service provider that provides its clients with tools that continuously monitor and manage wide range of cybersecurity-related activities and operations, which may include threat intelligence, virus and spam blocking, and vulnerability and risk assessment.

  15. 15.

    https://www.exploit-db.com

  16. 16.

    http://www.zerodayinitiative.com

  17. 17.

    https://www.cyr3con.ai

  18. 18.

    https://www.symantec.com

  19. 19.

    TPR is a metric that measures the proportion of exploited vulnerabilities that are correctly predicted from all exploited vulnerabilities.

  20. 20.

    FPR is a metric that measures the proportion of non-exploited vulnerabilities that are incorrectly predicted as being exploited from the total number of all non-exploited vulnerabilities.

  21. 21.

    https://twitter.com

  22. 22.

    Twitter posts, called tweets, are limited to 280 characters.

  23. 23.

    Note that these metrics are sensitive to the underlying class distribution and sensitive to the ratio of class rebalancing.

  24. 24.

    https://www.virustotal.com

  25. 25.

    https://www.securityfocus.com There are many examples where attack signatures are reported by Symantec, but not reported by SecurityFocus. Also, there are vulnerabilities SecurityFocus reports as exploited, and those exist in software whose vendors are well-covered by Symantec, yet Symantec does not report them.

  26. 26.

    https://www.offensive-security.com

  27. 27.

    https://technet.microsoft.com/en-us/security/bulletins.aspx

  28. 28.

    https://www.symantec.com/security-center/a-z

  29. 29.

    https://www.symantec.com/security_response/attacksignatures/

  30. 30.

    https://cloud.google.com/translate/docs

  31. 31.

    https://www.adobe.com/products/flashplayer.html

  32. 32.

    The harmonic mean of precision and recall.

  33. 33.

    http://scikit-learn.org

  34. 34.

    https://nvd.nist.gov/vuln/detail/CVE-2015-3350

References

  1. Pfleeger CP, Pfleeger SL, Margulies J (2015) Security in computing, 5th edn. Prentice Hall, Upper Saddle River, NJ, USA

    Google Scholar 

  2. Bilge L, Dumitras T (2012) Before we knew it: an empirical study of zero-day attacks in the real world. In: Yu T, Danezis G, Gligor V (eds) Proceedings of the 2012 ACM Conference on Computer and Communications Security. ACM, New York, pp 833–844. https://doi.org/10.1145/2382196.2382284

  3. Frei S, Schatzmann D, Plattner B, Trammell B (2010) Modeling the security ecosystem–The dynamics of (in)security. In: Moore T, Pym D, Ioannidis C (eds) Economics of information security and privacy. Springer, Boston, pp 79–106. https://doi.org/10.1007/978-1-4419-6967-5_6

    Chapter  Google Scholar 

  4. Allodi L, Massacci F (2014) Comparing vulnerability severity and exploits using case-control studies. ACM Trans Inform Syst Secur 17(1), Article No. 1. https://doi.org/10.1145/2630069

    Article  Google Scholar 

  5. Durumeric Z, Kasten J, Adrian D, Halderman JA, Bailey M, Li F, Weaver N, Amann J, Beekman J, Payer M, Weaver N, Adrian D, Paxson V, Bailey M, Halderman JA (2014) The matter of Heartbleed. In: Williamson C, Akella A, Taft N (eds) Proceedings of the 2014 Conference on Internet Measurement Conference. ACM, New York, pp 475–488. https://doi.org/10.1145/2663716.2663755

  6. Edkrantz M, Said A (2015) Predicting cyber vulnerability exploits with machine learning. In: Thirteenth Scandinavian Conference on Artificial Intelligence, pp 48–57. https://doi.org/10.3233/978-1-61499-589-0-48

  7. Nayak K, Marino D, Efstathopoulos P, Dumitraş T (2014) Some vulnerabilities are different than others. In: Stavrou A, Bos H, Portokalidis G (eds) Research in attacks, intrusions and defenses. Springer, Cham, pp 426–446. https://doi.org/10.1007/978-3-319-11379-1_21

    Google Scholar 

  8. Sabottke C, Suciu O, Dumitras T (2015) Vulnerability disclosure in the age of social media: exploiting Twitter for predicting real-world exploits. In: Proceedings of the 24th USENIX Security Symposium. USENIX Association, Berkeley, CA, USA, pp 1041–1056. https://www.usenix.org/sites/default/files/sec15_full_proceedings.pdf

  9. Allodi L, Massacci F (2012) A preliminary analysis of vulnerability scores for attacks in wild: the EKITS and SYM datasets. In: Yu T, Christodorescu M (eds) Proceedings of the 2012 ACM Workshop on Building Analysis Datasets and Gathering Experience Returns for Security. ACM, New York, pp 17–24. https://doi.org/10.1145/2382416.2382427

  10. Mittal S, Das PK, Mulwad V, Joshi A, Finin T (2016) CyberTwitter: using Twitter to generate alerts for cybersecurity threats and vulnerabilities. In: Subrahmanian VS, Rokne J, Kimar R, Caverlee J, Tong H (eds) Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE Press, Piscataway, NJ, USA, pp 860–867

    Google Scholar 

  11. Marin E, Diab A, Shakarian P (2016) Product offerings in malicious hacker markets. In: Zhou L, Kaati L, Mao W, Wang GA (eds) Proceedings of the 2016 IEEE Conference on Intelligence and Security Informatics. The Printing House, Stoughton, WI, USA, pp 187–189. https://doi.org/10.1109/ISI.2016.7745465

  12. Samtani S, Chinn K, Larson C, Chen H (2016) AZSecure hacker assets portal: cyber threat intelligence and malware analysis. In: Zhou L, Kaati L, Mao W, Wang GA (eds) Proceedings of the 2016 IEEE Conference on Intelligence and Security Informatics. The Printing House, Stoughton, WI, USA, pp 19–24. https://doi.org/10.1109/ISI.2016.7745437

  13. Allodi L (2017) Economic factors of vulnerability trade and exploitation. In: Thuraisingham B, Evans D, Malkin T, Xu D (eds) Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. ACM, New York, pp 1483–1499. https://doi.org/10.1145/3133956.3133960

  14. Bullough BL, Yanchenko AK, Smith CL, Zipkin JR (2017) Predicting exploitation of disclosed software vulnerabilities using open-source data. In: Verma R, Thuraisingham B (eds) Proceedings of the 3rd ACM on International Workshop on Security and Privacy Analytics. ACM, New York, pp 45–53. https://doi.org/10.1145/3041008.3041009

  15. Allodi L, Shim W, Massacci F (2013) Quantitative assessment of risk reduction with cybercrime black market monitoring. In: 2013 IEEE Security and Privacy Workshops. IEEE Computer Society, Los Alamitos, CA, USA, pp 165–172. https://doi.org/10.1109/SPW.2013.16

  16. Bozorgi M, Saul LK, Savage S, Voelker GM (2010) Beyond heuristics: learning to classify vulnerabilities and predict exploits. In: Rao B, Krishnapuram B, Tomkins A, Yang Q (eds) Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, New York, pp 105–114. https://doi.org/10.1145/1835804.1835821

  17. Motoyama M, McCoy D, Levchenko K, Savage S, Voelker GM (2011) An analysis of underground forums. In: Thiran P, Willinger W (eds) Proceedings of the 2011 ACM SIGCOMM Conference on Internet Measurement. ACM, New York, pp 71–80. https://doi.org/10.1145/2068816.2068824

  18. Holt TJ, Lampke E (2010) Exploring stolen data markets online: products and market forces. Crim Justice Stud 23(1):33–50. https://doi.org/10.1080/14786011003634415

    Article  Google Scholar 

  19. Shakarian J, Gunn AT, Shakarian P (2016) Exploring malicious hacker forums. In: Jajodia S, Subrahmanian V, Swarup V, Wang C (eds) Cyber deception. Springer, Cham, pp 259–282. https://doi.org/10.1007/978-3-319-32699-3_11

    Chapter  Google Scholar 

  20. Nunes E, Diab A, Gunn A, Marin E, Mishra V, Paliath V, Robertson J, Shakarian J, Thart A, Shakarian P (2016) Darknet and deepnet mining for proactive cybersecurity threat intelligence. In: Chen H, Hariri S, Thuraisingham B, Zeng D (eds) Proceedings of the 2016 IEEE Conference on Intelligence and Security Informatics, pp 7–12. https://doi.org/10.1109/ISI.2016.7745435

  21. Robertson J, Diab A, Marin E, Nunes E, Paliath V, Shakarian J, Shakarian P (2017) Darkweb cyber threat intelligence mining. Cambridge University Press, New York. https://doi.org/10.1017/9781316888513

    Article  Google Scholar 

  22. Liu Y, Sarabi A, Zhang J, Naghizadeh P, Karir M, Bailey M, Liu M (2015) Cloudy with a chance of breach: forecasting cyber security incidents. In: Proceedings of the 24th USENIX Security Symposium. USENIX Association, Berkeley, CA, USA, pp 1009–1024. https://www.usenix.org/sites/default/files/sec15_full_proceedings.pdf

  23. Soska N, Christin K (2014) Automatically detecting vulnerable websites before they turn malicious. In: Proceedings of the 23rd USENIX Security Symposium. USENIX Association, Berkeley, CA, USA, pp 625–640. https://www.usenix.org/sites/default/files/sec14_full_proceedings.pdf

  24. Almukaynizi M, Nunes E, Dharaiya K, Senguttuvan M, Shakarian J, Shakarian P (2017) Proactive identification of exploits in the wild through vulnerability mentions online. In: Sobiesk E, Bennett D, Maxwell P (eds) Proceedings of the 2017 International Conference on Cyber Conflict. Curran Associates, Red Hook, NY, USA, pp 82–88. https://doi.org/10.1109/CYCONUS.2017.8167501

  25. Zhang S, Caragea D, Ou X (2011) An empirical study on using the national vulnerability database to predict software vulnerabilities. In: Hameurlain A, Liddle SW, Schewe KD, Zhou X (eds) Database and expert systems applications. Springer, Heidelberg, pp 217–231. https://doi.org/10.1007/978-3-642-23088-2_15

    Google Scholar 

  26. Hao S, Kantchelian A, Miller B, Paxson V, Feamster N (2016) PREDATOR: proactive recognition and elimination of domain abuse at time-of-registration. In: Weippl E, Katzenbeisser S, Kruegel C, Myers A, Halevi S (eds) Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security. ACM, New York, pp 1568-1579. https://doi.org/10.1145/2976749.2978317

  27. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297. https://doi.org/10.1023/A:1022627411411

    Article  MATH  Google Scholar 

  28. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Int Res 16(1):321–357. https://doi.org/10.1613/jair.953

    Article  MATH  Google Scholar 

  29. Allodi L, Massacci F, Williams JM (2017) The work-averse cyber attacker model: theory and evidence from two million attack signatures. https://doi.org/10.2139/ssrn.2862299

    Article  Google Scholar 

  30. Breiman L (2001) Random forests. Mach Learn 45(1):5–32. https://doi.org/10.1023/A:1010933404324

    Article  MATH  Google Scholar 

  31. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140. https://doi.org/10.1007/BF00058655

    Article  MathSciNet  MATH  Google Scholar 

  32. Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay É (2011) Scikit-learn: machine learning in Python. J Mach Learn Res 12:2825–2830

    MathSciNet  MATH  Google Scholar 

  33. Guo D, Shamai S, Verdu S (2005) Mutual information and minimum mean-square error in Gaussian channels. IEEE Trans Inform Theory 51(4):1261–1282. https://doi.org/10.1109/TIT.2005.844072

    Article  MathSciNet  MATH  Google Scholar 

  34. Galar M, Fernandez A, Barrenechea E, Bustince H, Herrera F (2012) A review on ensembles for the class imbalance problem: bagging-, boosting-, and hybrid-based approach. IEEE Trans Syst Man Cybern C 42(4):463–484. https://doi.org/10.1109/TSMCC.2011.2161285

    Article  Google Scholar 

  35. Barreno M, Bartlett PL, Chi FJ, Joseph AD, Nelson B, Rubinstein BIP, Saini U, Tygar JD (2008) Open problems in the security of learning. In: Balfanz D, Staddon J (eds) Proceedings of the 1st ACM Workshop on AISec. ACM, New York, pp 19–26. https://doi.org/10.1145/1456377.1456382

  36. Barreno M, Nelson B, Joseph AD, Tygar J (2010) The security of machine learning. Mach Learn 81(2):121–148. https://doi.org/10.1007/s10994-010-5188-5

    Article  MathSciNet  Google Scholar 

  37. Biggio B, Nelson B, Laskov P (2011) Support vector machines under adversarial label noise. In: Hsu C-N, Lee WS (eds) Proceedings of the 3rd Asian Conference on Machine Learning, pp 97–112. http://www.jmlr.org/proceedings/papers/v20/biggio11/biggio11.pdf

Download references

Acknowledgements

Some of the authors were supported by the Office of Naval Research (ONR) contract N00014-15-1-2742, the Office of Naval Research (ONR) Neptune program and the ASU Global Security Initiative (GSI). Paulo Shakarian and Jana Shakarian are supported by the Office of the Director of National Intelligence (ODNI) and the Intelligence Advanced Research Projects Activity (IARPA) via the Air Force Research Laboratory (AFRL) contract number FA8750-16-C-0112. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of ODNI, IARPA, AFRL, or the U.S. Government.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Mohammed Almukaynizi .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Almukaynizi, M., Nunes, E., Dharaiya, K., Senguttuvan, M., Shakarian, J., Shakarian, P. (2019). Patch Before Exploited: An Approach to Identify Targeted Software Vulnerabilities. In: Sikos, L. (eds) AI in Cybersecurity. Intelligent Systems Reference Library, vol 151. Springer, Cham. https://doi.org/10.1007/978-3-319-98842-9_4

Download citation

Publish with us

Policies and ethics