
An Improved Decision System for URL Accesses Based on a Rough Feature Selection Technique


Part of the book series: Studies in Computational Intelligence (SCI, volume 621)

Abstract

Corporate security is usually one of the areas in which companies invest the most resources, since the loss of information translates directly into monetary losses. Security issues may originate in external attacks or in internal security failures, but a significant share of security breaches is related to employees' lack of awareness regarding the use of the Web. In this work we focus on the latter problem, describing improvements to a system able to detect anomalous and potentially insecure situations that could be dangerous for a company. This system was initially conceived as a better alternative to what are known as black/white lists. These lists contain URLs whose access is banned or considered dangerous (black list), or URLs whose access is permitted (white list). In this chapter, we propose a system that first learns from existing black/white lists and then classifies a new, unknown URL request as either “should be allowed” or “should be denied”. We describe this system and its results, as well as the improvements obtained by adding an initial data pre-processing step that applies Rough Set Theory for feature selection. We show that high accuracies, between 96 and 97 % of correctly classified patterns, can be obtained even without a pre-processing step. Furthermore, we show that using Computational Intelligence techniques to pre-process the data improves the system's performance in terms of running time, while the accuracy remains close to 97 %. Finally, among the results obtained, we show that it is possible to extract interesting rules that are not based solely on the URL string feature, for classifying new, unknown URL access requests as allowed or denied.
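The following is a minimal sketch of rough-set feature selection. It implements the classic greedy QuickReduct heuristic.

- **What it computes.** Condition attributes are added one at a time, each time choosing the attribute that most increases the dependency degree γ_B(D) = |POS_B(D)| / |U|. The process stops when the selected subset is as predictive of the allow/deny decision as the full attribute set.
- **What is assumed.** The abstract does not say which reduct-search algorithm the chapter uses, so QuickReduct is an assumption here. The attribute names (`method`, `domain`, `mime`, `hour`) and the eight records are hypothetical toy data, not the chapter's log data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

# A record maps attribute names to categorical values (hypothetical schema).
Row = Dict[str, str]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    blocks: Dict[tuple, List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def dependency(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> float:
    """Rough-set dependency degree: |POS_attrs(decision)| / |U|."""
    if not rows:
        return 0.0
    positive = 0
    for block in partition(rows, attrs):
        # A block belongs to the positive region if all its rows share one decision.
        if len({rows[i][decision] for i in block}) == 1:
            positive += len(block)
    return positive / len(rows)


def quick_reduct(rows: Sequence[Row], conditions: Sequence[str], decision: str) -> List[str]:
    """Greedy QuickReduct: repeatedly add the attribute giving the highest
    dependency until the subset matches the dependency of all conditions."""
    target = dependency(rows, conditions, decision)
    reduct: List[str] = []
    while dependency(rows, reduct, decision) < target:
        remaining = [a for a in conditions if a not in reduct]
        # max() returns the first attribute on ties, so the result is deterministic.
        best = max(remaining, key=lambda a: dependency(rows, reduct + [a], decision))
        reduct.append(best)
    return reduct


# Hypothetical URL-access records, labelled as if from black/white lists.
CONDITIONS = ["method", "domain", "mime", "hour"]
ROWS: List[Row] = [
    {"method": "GET", "domain": "social", "mime": "video", "hour": "work", "decision": "deny"},
    {"method": "GET", "domain": "social", "mime": "text", "hour": "work", "decision": "deny"},
    {"method": "GET", "domain": "news", "mime": "text", "hour": "work", "decision": "allow"},
    {"method": "POST", "domain": "news", "mime": "text", "hour": "off", "decision": "allow"},
    {"method": "GET", "domain": "webmail", "mime": "text", "hour": "work", "decision": "allow"},
    {"method": "GET", "domain": "webmail", "mime": "exe", "hour": "work", "decision": "deny"},
    {"method": "POST", "domain": "social", "mime": "text", "hour": "off", "decision": "deny"},
    {"method": "GET", "domain": "news", "mime": "video", "hour": "off", "decision": "allow"},
]

if __name__ == "__main__":
    print(dependency(ROWS, ["domain"], "decision"))
    print(quick_reduct(ROWS, CONDITIONS, "decision"))
```

How it plays out on this toy table:

- `domain` alone gives a dependency of 0.75, because the `webmail` rows are still ambiguous.
- Adding `mime` raises the dependency to 1.0, so the reduct keeps two of the four attributes.
- Discarding the remaining attributes is the kind of reduction that could shorten a classifier's training time without losing discriminating information.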


Notes

  1. Taken from a log file released to us by a Spanish company.

  2. The set of rules was written by the same company for its employees.

  3. Data gathered from the real world, not artificially generated.

  4. Weka file format.

  5. Trees can be deployed as rules.
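Note 5 relies on a simple property of decision trees: every root-to-leaf path is a conjunction of attribute tests, so a tree can be read off as a set of IF-THEN rules. The sketch below shows that conversion on a small hand-built tree.

The `Leaf` and `Node` classes, the attribute names and the tree itself are hypothetical. They are not taken from any classifier trained in the chapter, nor from Weka's own output format.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class assigned at this leaf, e.g. "allow" or "deny"


@dataclass
class Node:
    attribute: str               # attribute tested at this node
    children: Dict[str, "Tree"]  # one subtree per attribute value


Tree = Union[Leaf, Node]


def tree_to_rules(tree: Tree, path: Tuple[Tuple[str, str], ...] = ()) -> List[str]:
    """Emit one IF-THEN rule per root-to-leaf path (depth-first, in child order)."""
    if isinstance(tree, Leaf):
        antecedent = " AND ".join(f"{a} = {v}" for a, v in path) or "TRUE"
        return [f"IF {antecedent} THEN {tree.label}"]
    rules: List[str] = []
    for value, child in tree.children.items():
        rules.extend(tree_to_rules(child, path + ((tree.attribute, value),)))
    return rules


# Hypothetical toy tree over URL-access attributes.
TREE = Node("domain", {
    "social": Leaf("deny"),
    "news": Leaf("allow"),
    "webmail": Node("mime", {"text": Leaf("allow"), "exe": Leaf("deny")}),
})

if __name__ == "__main__":
    for rule in tree_to_rules(TREE):
        print(rule)
```

Each emitted rule could then be written in whatever rule language a deployment uses. This rendering step is left out here because its target syntax is not specified in the text.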

Acknowledgments

The authors would like to thank GENIL-SSV’2015 for making possible the visit of Dr. Zeineb Chelly as part of this project. We thank Dr. Zeineb Chelly, from the Institut Supérieur de Gestion, Tunisia, for her technical insight, recommendations and suggestions, and for her assistance during the practical experiments. This paper has been funded in part by the European project MUSES (FP7-318508), the Spanish National project TIN2011-28627-C04-02 (ANYSELF), project P08-TIC-03903 (EVORQ) awarded by the Andalusian Regional Government, and projects 83 (CANUBE) and GENIL PYR-2014-17, both awarded by CEI-BioTIC UGR.

Author information

Correspondence to P. de las Cuevas.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

de las Cuevas, P., Chelly, Z., Mora, A.M., Merelo, J.J., Esparcia-Alcázar, A.I. (2016). An Improved Decision System for URL Accesses Based on a Rough Feature Selection Technique. In: Abielmona, R., Falcon, R., Zincir-Heywood, N., Abbass, H. (eds) Recent Advances in Computational Intelligence in Defense and Security. Studies in Computational Intelligence, vol 621. Springer, Cham. https://doi.org/10.1007/978-3-319-26450-9_6

  • DOI: https://doi.org/10.1007/978-3-319-26450-9_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26448-6

  • Online ISBN: 978-3-319-26450-9

  • eBook Packages: Engineering, Engineering (R0)
