Breaking Anonymity of Social Network Accounts by Using Coordinated and Extensible Classifiers Based on Machine Learning

  • Eina Hashimoto
  • Masatsugu Ichino
  • Tetsuji Kuboyama
  • Isao Echizen
  • Hiroshi Yoshiura
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9844)


A method for de-anonymizing social network accounts is presented to clarify the privacy risks of such accounts and to deter their misuse, such as posting copyrighted, offensive, or bullying content. In contrast to previous de-anonymization methods, which link accounts to other accounts, the presented method links accounts to resumes, which directly represent identities. The difficulty in using machine learning for de-anonymization, namely preparing positive examples of training data, is overcome by decomposing the learning problem into subproblems for which training data can be harvested from the Internet. Evaluation using 3 learning algorithms, 2 kinds of sentence features, 238 learned classifiers, 2 methods for fusing scores from the classifiers, and 30 volunteers' accounts and resumes demonstrated that the presented method is effective. Because the training data are harvested from the Internet, the more information that is available there, the more effective the method becomes.
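The abstract does not say how the classifier scores are combined, so the following is only a minimal sketch under assumed details, not the authors' implementation. It assumes each resume is reduced to a list of attribute labels, that one classifier per attribute returns a score in (0, 1] for an account's posts, and that the account is linked to the resume with the highest fused score. The two fusion rules (arithmetic and geometric mean) are illustrative stand-ins for the paper's two unspecified fusion methods, and `keyword_classifier`, `fuse_mean`, `fuse_geometric_mean`, `rank_resumes`, and the toy data are all hypothetical; the keyword matcher merely stands in for the learned classifiers.

```python
import math
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical: a classifier maps an account's posts to a score in (0, 1]
# estimating how likely the account owner has one resume attribute.
Classifier = Callable[[List[str]], float]


def keyword_classifier(keyword: str) -> Classifier:
    """Placeholder for a learned classifier: 0.9 if any post mentions the keyword, else 0.1."""
    def score(posts: List[str]) -> float:
        hit = any(keyword.lower() in post.lower() for post in posts)
        return 0.9 if hit else 0.1
    return score


def fuse_mean(scores: List[float]) -> float:
    """Assumed fusion rule 1: arithmetic mean of the per-attribute scores."""
    return mean(scores)


def fuse_geometric_mean(scores: List[float]) -> float:
    """Assumed fusion rule 2: geometric mean, so one low score pulls the total down sharply."""
    return math.exp(mean([math.log(max(s, 1e-12)) for s in scores]))


def rank_resumes(
    posts: List[str],
    resumes: Dict[str, List[str]],
    classifiers: Dict[str, Classifier],
    fuse: Callable[[List[float]], float],
) -> List[str]:
    """Return resume IDs ordered from most to least likely owner of the account."""
    ranked = []
    for resume_id, attributes in resumes.items():
        # Score only the attributes for which a classifier exists.
        scores = [classifiers[a](posts) for a in attributes if a in classifiers]
        if scores:
            ranked.append((fuse(scores), resume_id))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [resume_id for _, resume_id in ranked]


# Toy data, invented for illustration only.
posts = ["Studying hard for my exam at the campus library", "Great tennis match today"]
classifiers = {
    "student": keyword_classifier("exam"),
    "tennis": keyword_classifier("tennis"),
    "golf": keyword_classifier("golf"),
}
resumes = {
    "resume_A": ["student", "tennis"],
    "resume_B": ["golf"],
    "resume_C": ["student", "golf"],
}

print(rank_resumes(posts, resumes, classifiers, fuse_mean))
print(rank_resumes(posts, resumes, classifiers, fuse_geometric_mean))
```

On this toy data both rules rank `resume_A` first. The geometric mean punishes a resume more heavily when any single attribute scores low (0.3 versus 0.5 for `resume_C`); which behavior works better would have to be checked against real accounts and resumes.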


Keywords: Social network · Privacy · De-anonymization · Re-identification

Copyright information

© IFIP International Federation for Information Processing 2016

Authors and Affiliations

  • Eina Hashimoto (1)
  • Masatsugu Ichino (1)
  • Tetsuji Kuboyama (2)
  • Isao Echizen (3)
  • Hiroshi Yoshiura (1), corresponding author

  1. University of Electro-Communications, Tokyo, Japan
  2. Gakushuin University, Tokyo, Japan
  3. National Institute of Informatics, Tokyo, Japan
