An Improved Decision System for URL Accesses Based on a Rough Feature Selection Technique

de las Cuevas, P.; Chelly, Z.; Mora, A. M.; Merelo, J. J.; Esparcia-Alcázar, A. I.

doi:10.1007/978-3-319-26450-9_6


Chapter
First Online: 20 December 2015


Part of the book series: Studies in Computational Intelligence (SCI, volume 621)

Abstract

Corporate security is usually one of the areas in which companies invest the most resources, since the loss of information translates directly into monetary losses. Security issues may originate in external attacks or internal security failures, but a significant share of security breaches is related to employees' lack of awareness regarding the use of the Web. In this work we focus on the latter problem, describing the improvements made to a system able to detect anomalous and potentially insecure situations that could be dangerous for a company. This system was initially conceived as a better alternative to what are known as black/white lists. These lists contain URLs whose access is banned or dangerous (black list), or URLs to which access is permitted (white list). In this chapter, we propose a system that can initially learn from existing black/white lists and then classify a new, unknown URL request as either “should be allowed” or “should be denied”. We describe this system and its results, as well as the improvements obtained by adding an initial data pre-processing step that applies Rough Set Theory for feature selection. We show that high accuracies can be obtained even without the pre-processing step, reaching between 96 and 97 % of correctly classified patterns. Furthermore, we show that using Computational Intelligence techniques to pre-process the data improves the system's performance in terms of running time, while accuracies remain close to 97 %. Finally, among the obtained results, we show that it is possible to obtain interesting rules that are not based solely on the URL string feature for classifying new, unknown URL access requests as allowed or denied.
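The abstract does not detail the rough-set algorithm used for feature selection. As a minimal sketch, assuming the greedy QuickReduct search common in the rough-set literature, the Python below computes the dependency degree of the decision on a subset of condition attributes (the fraction of rows whose equivalence class falls entirely within one decision class) and adds attributes until that degree matches the one given by the full attribute set. The attribute names (`http_method`, `mime_type`, `server_port`) and the rows of `TOY_ROWS` are hypothetical, not the company log data.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# One row of a decision table: attribute name -> discretised value.
Row = Dict[str, str]

# Hypothetical toy decision table (NOT data from the chapter): three condition
# attributes describing a URL request, plus the decision attribute "label".
TOY_ROWS: List[Row] = [
    {"http_method": "GET",  "mime_type": "text/html",                "server_port": "80",  "label": "allow"},
    {"http_method": "GET",  "mime_type": "application/octet-stream", "server_port": "80",  "label": "deny"},
    {"http_method": "POST", "mime_type": "text/html",                "server_port": "443", "label": "allow"},
    {"http_method": "POST", "mime_type": "application/octet-stream", "server_port": "443", "label": "deny"},
    {"http_method": "GET",  "mime_type": "image/png",                "server_port": "443", "label": "allow"},
]


def partition(rows: Sequence[Row], attrs: Sequence[str]) -> List[List[int]]:
    """Equivalence classes of the indiscernibility relation IND(attrs)."""
    classes: Dict[Tuple[str, ...], List[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        classes[tuple(row[a] for a in attrs)].append(i)
    return list(classes.values())


def dependency(rows: Sequence[Row], attrs: Sequence[str], decision: str) -> float:
    """Degree of dependency gamma_attrs(decision) = |POS_attrs(decision)| / |U|."""
    if not rows:
        return 0.0
    positive = 0
    for eq_class in partition(rows, attrs):
        # A class lies in the positive region when all its rows share one decision.
        if len({rows[i][decision] for i in eq_class}) == 1:
            positive += len(eq_class)
    return positive / len(rows)


def quick_reduct(rows: Sequence[Row], conditions: Sequence[str], decision: str) -> List[str]:
    """Greedily add the attribute with the largest dependency gain until the
    subset reaches the dependency of the full condition set."""
    target = dependency(rows, conditions, decision)
    reduct: List[str] = []
    best = dependency(rows, reduct, decision)
    while best < target:
        candidate = None
        for attr in conditions:
            if attr in reduct:
                continue
            score = dependency(rows, reduct + [attr], decision)
            if score > best:
                best, candidate = score, attr
        if candidate is None:  # no remaining attribute improves dependency; stop
            break
        reduct.append(candidate)
    return reduct


if __name__ == "__main__":
    print(quick_reduct(TOY_ROWS, ["http_method", "mime_type", "server_port"], "label"))
    # ['mime_type']
```

Printing the result gives `['mime_type']`: in this toy table the MIME type alone separates allowed from denied requests, so the other two attributes can be dropped without reducing the dependency degree.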


Notes

1. Taken from a log file released to us by a Spanish company.
2. The set of rules has been written by the same company, with respect to its employees.
3. Data which was gathered from the real world, and was not artificially generated.
4. Format of Weka files.
5. Trees can be deployed as rules.
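Note 5 states that trees can be deployed as rules. A minimal sketch of that conversion, assuming a hypothetical in-memory tree (`Leaf`, `Node`, `EXAMPLE_TREE`) rather than any Weka model class: every root-to-leaf path becomes one IF-THEN rule whose conditions are the tests met along that path.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass
class Leaf:
    label: str  # class predicted at this leaf, e.g. "allow" or "deny"


@dataclass
class Node:
    attribute: str               # attribute tested at this node
    branches: Dict[str, "Tree"]  # attribute value -> subtree


Tree = Union[Leaf, Node]


def tree_to_rules(tree: Tree, path: Tuple[Tuple[str, str], ...] = ()) -> List[str]:
    """Turn every root-to-leaf path into one IF-THEN rule."""
    if isinstance(tree, Leaf):
        antecedent = " AND ".join(f"{attr} = {value}" for attr, value in path) or "TRUE"
        return [f"IF {antecedent} THEN {tree.label}"]
    rules: List[str] = []
    for value, subtree in tree.branches.items():
        # Extend the path with the test taken on this branch and recurse.
        rules.extend(tree_to_rules(subtree, path + ((tree.attribute, value),)))
    return rules


# Hypothetical tree for illustration only (not a model reported in the chapter).
EXAMPLE_TREE = Node("mime_type", {
    "application/octet-stream": Leaf("deny"),
    "text/html": Node("http_method", {"GET": Leaf("allow"), "POST": Leaf("deny")}),
})

if __name__ == "__main__":
    for rule in tree_to_rules(EXAMPLE_TREE):
        print(rule)
```

Because each path yields an independent rule, the resulting set can be read, audited, or edited without the tree structure.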


Acknowledgments

The authors would like to thank GENIL-SSV’2015 for supporting the visit of Dr. Zeineb Chelly, which allowed her to take part in this project. We thank Dr. Zeineb Chelly, from the Institut Supérieur de Gestion, Tunisia, for her technical insight, recommendations and suggestions, and for her assistance during the practical experiments. This paper has been funded in part by the European project MUSES (FP7-318508), along with the Spanish National project TIN2011-28627-C04-02 (ANYSELF), project P08-TIC-03903 (EVORQ) awarded by the Andalusian Regional Government, and projects 83 (CANUBE) and GENIL PYR-2014-17, both awarded by CEI-BioTIC UGR.

Author information

Authors and Affiliations

Department of Computer Architecture and Computer Technology, University of Granada, Granada, Spain
P. de las Cuevas, A. M. Mora & J. J. Merelo
Laboratoire de Recherche Opérationnelle de Décision et de Contrôle de Processus, Institut Supérieur de Gestion, Tunis, Tunisia
Z. Chelly
University of Valencia, Valencia, Spain
A. I. Esparcia-Alcázar


Corresponding author

Correspondence to P. de las Cuevas.

Editor information

Editors and Affiliations

Larus Technologies Corporation, Ottawa, Ontario, Canada
Rami Abielmona
Larus Technologies Corporation, Ottawa, Ontario, Canada
Rafael Falcon
Faculty of Computer Science, Dalhousie University, Halifax, Nova Scotia, Canada
Nur Zincir-Heywood
School of Engineering and Information Technology, University of New South Wales, Canberra, Australian Capital Territory, Australia
Hussein A. Abbass


About this chapter

Cite this chapter

de las Cuevas, P., Chelly, Z., Mora, A.M., Merelo, J.J., Esparcia-Alcázar, A.I. (2016). An Improved Decision System for URL Accesses Based on a Rough Feature Selection Technique. In: Abielmona, R., Falcon, R., Zincir-Heywood, N., Abbass, H. (eds) Recent Advances in Computational Intelligence in Defense and Security. Studies in Computational Intelligence, vol 621. Springer, Cham. https://doi.org/10.1007/978-3-319-26450-9_6


DOI: https://doi.org/10.1007/978-3-319-26450-9_6
Published: 20 December 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26448-6
Online ISBN: 978-3-319-26450-9
eBook Packages: Engineering, Engineering (R0)
