Bot or Human? A Behavior-Based Online Bot Detection System

From Database to Cyber Security

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 11170)

Abstract

The abuse of online services by automated programs, known as bots, poses a serious threat to Internet users. Bots target popular online services, such as web blogs and online social networks, to distribute spam and malware. In this work, we first characterize human and bot behaviors in online services. Based on this characterization, we propose a detection system that accurately distinguishes bots from humans. The system consists of two main components: (1) a client-side logger and (2) a server-side classifier. The client-side logger records user behavioral events, such as mouse movements and keystrokes, and sends this data in batches to the server-side classifier, which identifies the user as a human or a bot. Our experimental results demonstrate that the proposed detection system achieves very high accuracy with negligible overhead.

Notes

  1. The form is usually well-structured, and the ID/name of each input field remains constant. For example, <input type="text" name="email" /> is the text field for entering an email address. Thus, the bot author can program the bot to recognize fields and fill in appropriate content.

  2. There are other similar bot tools that may generate simple human behavior, such as AutoIt [3] and AutoMe [4].

  3. The page layout differs from page to page, which may affect how the Human Mimic Bot works. For example, moving the mouse down by the same number of pixels places it inside the comment form on one page but outside the form on another.

  4. For example, the position of the submit button may vary in the webpage layout. The bot must be customized to move to the button and generate a click event on it.

  5. Form Inject Bot generates no UI data. Because Replay Bot replays traces generated by humans, it is inappropriate to include those human traces when characterizing bot behavior.

  6. The Kolmogorov-Smirnov test yields a p-value of 0.882 for the distribution fit, at a 99% confidence level.

  7. Take the following Mouse Move record as an example: {"time":1278555037098, "type":"Mouse Move", "X":590, "Y":10, "tagName":"DIV", "tagID":"footnote"}. The "time" field contains the timestamp of the event in milliseconds. The two coordinates, X and Y, denote the mouse cursor position. The last two fields give the name and ID of the DOM element where the event happens, such as <div ID="footnote">. In a Mouse Press record, {"time":1278555074750, "type":"Mouse Press", "virtualKey":0x01, "tagName":"HTML"}, the "virtualKey" field holds the virtual-key code, here 0x01 in hexadecimal, which corresponds to the left mouse button.

  8. Average speed is distance over duration, and move efficiency is displacement over distance.

  9. Input is converted to the ARFF format required by Weka [1].

  10. As our classification only involves two categories, human and bot, a majority means more than half of the votes.

  11. The idle time is not included in the traces. The bot trace consists of 30 h of Human Mimic Bot data and 2 h of Replay Bot data.

  12. The true positive rate is the ratio of the number of bots that are correctly classified to the number of all the bots.

  13. The true negative rate is the ratio of the number of humans that are correctly classified to the number of all the humans.

  14. A series of consecutive actions represents continuous behavior well.
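
As a rough illustration of notes 7 and 8, the sketch below parses Mouse Move records in the layout shown in note 7 and computes average speed and move efficiency as defined in note 8. It is a minimal sketch, not the chapter's implementation: the function name `movement_features`, the sample trace, and the choice to treat the whole trace as one movement are assumptions made for this example.

```python
import json
import math

# Hypothetical trace in the Mouse Move record layout from note 7.
# Timestamps and coordinates are made up for illustration.
SAMPLE_TRACE = """
[
  {"time": 1000, "type": "Mouse Move", "X": 0, "Y": 0, "tagName": "DIV", "tagID": "footnote"},
  {"time": 1100, "type": "Mouse Move", "X": 3, "Y": 4, "tagName": "DIV", "tagID": "footnote"},
  {"time": 1200, "type": "Mouse Move", "X": 6, "Y": 0, "tagName": "DIV", "tagID": "footnote"}
]
"""


def movement_features(records):
    """Return (average_speed, move_efficiency) for one mouse movement.

    average_speed   = distance / duration      (pixels per millisecond)
    move_efficiency = displacement / distance  (1.0 means a straight line)
    """
    # Keep only Mouse Move events and order them by timestamp.
    moves = sorted(
        (r for r in records if r["type"] == "Mouse Move"),
        key=lambda r: r["time"],
    )
    if len(moves) < 2:
        raise ValueError("need at least two Mouse Move records")

    # Distance: length of the path actually travelled, summed per segment.
    distance = sum(
        math.hypot(b["X"] - a["X"], b["Y"] - a["Y"])
        for a, b in zip(moves, moves[1:])
    )
    # Displacement: straight line from the first point to the last point.
    displacement = math.hypot(
        moves[-1]["X"] - moves[0]["X"], moves[-1]["Y"] - moves[0]["Y"]
    )
    duration = moves[-1]["time"] - moves[0]["time"]
    if duration <= 0 or distance == 0:
        raise ValueError("movement has no duration or no distance")

    return distance / duration, displacement / distance


if __name__ == "__main__":
    speed, efficiency = movement_features(json.loads(SAMPLE_TRACE))
    print(f"average speed: {speed:.3f} px/ms, move efficiency: {efficiency:.2f}")
```

On the made-up trace the cursor travels 10 pixels in 200 ms and ends 6 pixels from its start, so the average speed is 0.05 pixels per millisecond and the move efficiency is 0.6. The Mouse Press record in note 7 writes its virtual-key code as 0x01, which is not valid JSON, so a real parser would need to handle that field separately.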
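
Notes 10, 12 and 13 define the majority rule and the two evaluation rates in words. The sketch below restates them in code, as an illustration only. The label strings "bot" and "human", the function names, and the decision to return `None` when neither class has more than half of the votes are assumptions of this example, not details taken from the chapter.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label holding more than half of the votes (note 10).

    Returns None when no label has a strict majority, for example on a
    tie or an empty list. That tie handling is an assumption made here.
    """
    counts = Counter(votes)
    for label in ("human", "bot"):
        if counts[label] * 2 > len(votes):
            return label
    return None


def detection_rates(actual, predicted):
    """Return (true_positive_rate, true_negative_rate), with bots as positives.

    TPR = correctly classified bots / all bots        (note 12)
    TNR = correctly classified humans / all humans    (note 13)
    """
    bot_preds = [p for a, p in zip(actual, predicted) if a == "bot"]
    human_preds = [p for a, p in zip(actual, predicted) if a == "human"]
    if not bot_preds or not human_preds:
        raise ValueError("need at least one bot and one human sample")
    tpr = bot_preds.count("bot") / len(bot_preds)
    tnr = human_preds.count("human") / len(human_preds)
    return tpr, tnr


if __name__ == "__main__":
    print(majority_vote(["bot", "bot", "human"]))
    print(detection_rates(
        ["bot", "bot", "human", "human"],
        ["bot", "human", "human", "human"],
    ))
```

In the usage at the bottom, two of three votes say "bot", so the vote returns "bot". One of the two bots and both humans are classified correctly, giving a true positive rate of 0.5 and a true negative rate of 1.0.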

References

  1. Attribute-relation file format (arff). http://www.cs.waikato.ac.nz/ml/weka/arff.html

  2. Autohotkey - free mouse and keyboard macro program with hotkeys. http://www.autohotkey.com/

  3. Autoit, automation and scripting language. http://www.autoitscript.com/site/autoit/

  4. Autome - automate mouse and keyboard actions. http://www.asoftech.com/autome/

  5. Blogbot by incansoft. http://blogbot.auto-submitters.com/

  6. Global mouse and keyboard library. http://www.codeproject.com/KB/system/globalmousekeyboardlib.aspx

  7. Json, javascript object notation. http://www.json.org/

  8. Ultimate wordpress comment submitter. http://www.wordpresscommentspammer.com/

  9. Virtual-key codes. http://msdn.microsoft.com/en-us/library/ms927178.aspx

  10. Ahmed, A.A.E., Traore, I.: A new biometric technology based on mouse dynamics. IEEE Trans. Dependable Secure Comput. 4(3), 165–179 (2007)

  11. von Ahn, L., Blum, M., Hopper, N.J., Langford, J.: CAPTCHA: using hard AI problems for security. In: Biham, E. (ed.) EUROCRYPT 2003. LNCS, vol. 2656, pp. 294–311. Springer, Heidelberg (2003). https://doi.org/10.1007/3-540-39200-9_18

  12. Van Balen, N., Ball, C.T., Wang, H.: A behavioral biometrics based approach to online gender classification. In: Deng, R., Weng, J., Ren, K., Yegneswaran, V. (eds.) SecureComm 2016. LNICST, vol. 198, pp. 475–495. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-59608-2_27

  13. Chu, Z., Gianvecchio, S., Wang, H., Jajodia, S.: Who is tweeting on Twitter: human, bot or cyborg? In: Proceedings of the 2010 Annual Computer Security Applications Conference, Austin, TX, USA (2010)

  14. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley-Interscience, New York (2006)

  15. Funk, C., Liu, Y.: Symmetry reCAPTCHA. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, June 2016

  16. Gianvecchio, S., Wang, H.: Detecting covert timing channels: an entropy-based approach. In: Proceedings of the 2007 ACM Conference on Computer and Communications Security, Alexandria, VA, USA, October–November 2007

  17. Gianvecchio, S., Wu, Z., Xie, M., Wang, H.: Battle of botcraft: fighting bots in online games with human observational proofs. In: Proceedings of the 16th ACM Conference on Computer and Communications Security, Chicago, IL, USA (2009)

  18. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., Witten, I.H.: The weka data mining software: an update. SIGKDD Explor. Newsl. 11, 10–18 (2009)

  19. Jackson, C., Bortz, A., Boneh, D., Mitchell, J.C.: Protecting browser state from web privacy attacks. In: Proceedings of the 15th International Conference on World Wide Web, pp. 737–744 (2006)

  20. Kohavi, R., Quinlan, R.: Decision tree discovery. In: Handbook of Data Mining and Knowledge Discovery, pp. 267–276. University Press (1999)

  21. McLachlan, G., Do, K., Ambroise, C.: Analyzing Microarray Gene Expression Data. Wiley, Hoboken (2004)

  22. Mohta, A.: Bots are back in Yahoo! chat rooms. http://www.technospot.net/blogs/bots-are-back-in-yahoo-chat-room/

  23. Mohta, A.: Yahoo! chat adds CAPTCHA check to remove bots. http://www.technospot.net/blogs/yahoo-chat-captcha-check-to-remove-bots/

  24. Porta, A., et al.: Measuring regularity by means of a corrected conditional entropy in sympathetic outflow. Biol. Cybern. 78(1), 71–78 (1998)

  25. Quinlan, J.R.: Discovering Rules from Large Collections of Examples: A Case Study. Edinburgh University Press, Edinburgh (1979)

  26. Zheng, N., Bai, K., Huang, H., Wang, H.: You are how you touch: user verification on smartphones via tapping behaviors. In: Proceedings of IEEE Conference on Network Protocol (ICNP 2014), Research Triangle Park, NC, USA, October 2014

Author information

Corresponding author

Correspondence to Haining Wang.

Copyright information

© 2018 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Chu, Z., Gianvecchio, S., Wang, H. (2018). Bot or Human? A Behavior-Based Online Bot Detection System. In: Samarati, P., Ray, I., Ray, I. (eds) From Database to Cyber Security. Lecture Notes in Computer Science, vol 11170. Springer, Cham. https://doi.org/10.1007/978-3-030-04834-1_21

  • DOI: https://doi.org/10.1007/978-3-030-04834-1_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04833-4

  • Online ISBN: 978-3-030-04834-1

  • eBook Packages: Computer Science, Computer Science (R0)
