Enhanced Web Log Cleaning Algorithm for Web Intrusion Detection

  • Conference paper
Recent Advances in Information and Communication Technology

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 265)

Abstract

Web logs play a crucial role in detecting web attacks. However, analyzing web logs becomes a challenge because of the sheer volume of log data. The objective of this research is to create a web log cleaning algorithm for web intrusion detection. A review of previous work showed that five major web log attributes are needed in a web log cleaning algorithm for intrusion detection, namely multimedia files, web robot requests, HTTP status code, HTTP method and other files. The enhanced algorithm is based on these five attributes along with a set of rules and conditions. Our experiment shows that the proposed algorithm cleans noisy data effectively, with a reduction of 40.41%, while maintaining readiness for web intrusion detection at a low false negative rate (0.00531). Future work may address the web intrusion detection mechanism.
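The abstract names the five attributes but does not state the rules and conditions built on them, so the following is only a minimal sketch of how a cleaning pass over those attributes might look, not the authors' algorithm. Everything concrete in it is an assumption made for illustration: the extension sets, the robot markers, the `is_noise` and `clean` helper names, the use of the Common/Combined Log Format, and the rule that an entry is dropped only when it is a GET or HEAD request without a query string that returned a 2xx or 3xx status.

```python
import posixpath
import re
from urllib.parse import urlsplit

# Hypothetical rule sets: the abstract lists the attributes, not their values.
MULTIMEDIA_EXT = {".jpg", ".jpeg", ".png", ".gif", ".ico", ".mp3", ".mp4"}
OTHER_EXT = {".css", ".js"}
ROBOT_MARKERS = ("bot", "crawler", "spider")
SAFE_METHODS = {"GET", "HEAD"}

# Common Log Format, with the optional referer and user-agent fields
# of the Combined Log Format.
LOG_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def is_noise(line):
    """Return True if the entry can be dropped under the assumed rules."""
    m = LOG_RE.match(line)
    if m is None:
        return False  # keep anything that cannot be parsed
    parts = urlsplit(m["url"])
    path = parts.path.lower()
    agent = (m["agent"] or "").lower()
    status = int(m["status"])
    # Assumption: only successful, parameter-free GET/HEAD requests are
    # candidates for removal; errors and other methods are always kept.
    benign = (m["method"] in SAFE_METHODS
              and 200 <= status < 400
              and not parts.query)
    if not benign:
        return False
    # Web robot requests: robots.txt fetches or self-identified crawlers.
    if path == "/robots.txt" or any(k in agent for k in ROBOT_MARKERS):
        return True
    # Multimedia files and other static files, matched by extension.
    ext = posixpath.splitext(path)[1]
    return ext in MULTIMEDIA_EXT or ext in OTHER_EXT


def clean(lines):
    """Return the kept entries and the percentage of entries removed."""
    kept = [ln for ln in lines if not is_noise(ln)]
    reduction = 100.0 * (len(lines) - len(kept)) / len(lines) if lines else 0.0
    return kept, reduction


# Made-up sample entries, for illustration only.
sample = [
    '10.0.0.1 - - [01/Jan/2014:10:00:00 +0000] "GET /index.php HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '10.0.0.1 - - [01/Jan/2014:10:00:01 +0000] "GET /img/logo.png HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
    '10.0.0.2 - - [01/Jan/2014:10:00:02 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "ExampleBot/1.0"',
    '10.0.0.3 - - [01/Jan/2014:10:00:03 +0000] "GET /item.php?id=1%27%20OR%201=1 HTTP/1.1" 500 0 "-" "Mozilla/5.0"',
    '10.0.0.3 - - [01/Jan/2014:10:00:04 +0000] "POST /upload/shell.jpg HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]
kept, reduction = clean(sample)
print(len(kept), reduction)
```

In this sketch the image request and the robot request are removed, while the page request, the request carrying a suspicious query string, and the POST to an image path are kept. Keeping anything with an error status or an unusual method is one plausible way to avoid discarding attack traces; whether the paper applies the same conditions cannot be determined from the abstract.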

Author information

Corresponding author

Correspondence to Yew Chuan Ong.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Ong, Y.C., Ismail, Z. (2014). Enhanced Web Log Cleaning Algorithm for Web Intrusion Detection. In: Boonkrong, S., Unger, H., Meesad, P. (eds) Recent Advances in Information and Communication Technology. Advances in Intelligent Systems and Computing, vol 265. Springer, Cham. https://doi.org/10.1007/978-3-319-06538-0_31

  • DOI: https://doi.org/10.1007/978-3-319-06538-0_31

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-06537-3

  • Online ISBN: 978-3-319-06538-0

  • eBook Packages: Engineering, Engineering (R0)
