Using Transfer Learning to Detect Phishing in Countries with a Small Population

  • Wernsen Wong
  • Yun Sing Koh
  • Gillian Dobbie
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1127)

Abstract

An increasing number of people use social media services, which makes these services a more attractive outlet for phishing attacks. Phishers craft tweets that lead users to websites that download malware. This is a major issue, as phishers can gain access to a user’s digital identity and perform malicious acts. Phishing attacks may also be similar across regions, possibly occurring at different times. We investigate the use of transfer learning to apply phishing models learned in one region to detect phishing in other regions. We use a semi-supervised algorithm to train a model on a US-based dataset, which we then apply to New Zealand. First, we evaluate how effectively transfer learning can be used in different regions to detect potential phishing attacks on online social networks in real time. Second, we investigate the different phishing attacks and discuss the differences in the phishing attack features detected for different countries. We collected a real-world Twitter dataset over 6 months and show that US phishing models detect phishing successfully despite the low level of phishing in countries with smaller populations, such as New Zealand.
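
The idea of training on one region and applying the model to another can be illustrated with a small sketch. This is not the paper's algorithm: the nearest-centroid classifier, the self-training loop, the two toy features (URLs per tweet and account age in years), and all data values below are assumptions chosen only to show the shape of semi-supervised training on a source region followed by direct model transfer to a target region.

```python
from statistics import mean

def centroid(rows):
    """Component-wise mean of a list of equal-length feature vectors."""
    return [mean(col) for col in zip(*rows)]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(model, x):
    """Return the label whose centroid is closest to x."""
    return min(model, key=lambda label: sq_dist(model[label], x))

def self_train(labelled, unlabelled, rounds=3):
    """Semi-supervised self-training: fit class centroids, pseudo-label
    the unlabelled rows with the current model, then refit."""
    data = list(labelled)
    model = {}
    for _ in range(rounds):
        model = {
            label: centroid([x for x, y in data if y == label])
            for label in {y for _, y in data}
        }
        data = list(labelled) + [(x, predict(model, x)) for x in unlabelled]
    return model

# Hypothetical source-region (US) data: [urls_per_tweet, account_age_years].
us_labelled = [
    ([3.0, 0.1], "phishing"),
    ([2.5, 0.2], "phishing"),
    ([0.5, 4.0], "benign"),
    ([0.0, 6.0], "benign"),
]
us_unlabelled = [[2.8, 0.3], [0.2, 5.0], [3.2, 0.1]]

# Train on the source region only.
model = self_train(us_labelled, us_unlabelled)

# Model transfer: apply the US-trained model unchanged to hypothetical
# target-region (New Zealand) tweets, where labelled phishing is scarce.
nz_tweets = [[2.9, 0.2], [0.1, 3.5]]
nz_predictions = [predict(model, x) for x in nz_tweets]
print(nz_predictions)
```

The sketch reuses the source model as-is on the target data, which is the simplest form of model transfer; it only works to the extent that phishing features look alike in both regions.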

Keywords

Phishing detection · Transfer learning · Model transfer

Notes

Acknowledgements

This research is supported by InternetNZ (Grant No. IR170017).

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. School of Computer Science, The University of Auckland, Auckland, New Zealand
