Internet Identity Analysis and Similarities Detection

  • Krzysztof Wilaszek
  • Tomasz Wójcik
  • Andrzej Opaliński
  • Wojciech Turek
Part of the Communications in Computer and Information Science book series (CCIS, volume 287)


The growing popularity of Web 2.0 systems has created a huge set of publicly available data, which Internet users continuously expand. The anonymity of publication in Web systems encourages some users to publish false or illegal statements. Tools for identifying portal users who publish such posts could improve the quality of information and could be useful to law enforcement services. This paper introduces a method for finding similar Internet identities. Detected similarities can be used to find several accounts belonging to the same person. The method is based on calculating various measures that characterize forum users, and it uses a Web crawling system to collect data from forums. A prototype system for finding similar users is described, and test results are presented.
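The abstract does not list the measures the method uses, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes that each user's posts have already been collected, computes a handful of hypothetical text measures per user, standardizes them across users, and ranks user pairs by cosine similarity. All function names and the chosen measures are illustrative assumptions.

```python
import math
import re


def user_measures(posts):
    """Compute a few illustrative measures from one user's posts.

    These measures are assumptions made for this sketch; the paper's
    actual measures are not given in the abstract.
    """
    text = " ".join(posts)
    words = re.findall(r"\w+", text.lower())
    n_words = len(words) or 1
    n_chars = len(text) or 1
    return {
        "avg_word_len": sum(len(w) for w in words) / n_words,
        "avg_post_len": len(words) / (len(posts) or 1),
        "vocab_richness": len(set(words)) / n_words,
        "punct_ratio": sum(text.count(c) for c in ".,!?;:") / n_chars,
        "upper_ratio": sum(ch.isupper() for ch in text) / n_chars,
    }


def standardize(profiles):
    """Z-score each measure across users so that no single measure dominates."""
    keys = sorted(next(iter(profiles.values())))
    vectors = {user: [] for user in profiles}
    for key in keys:
        values = [p[key] for p in profiles.values()]
        mean = sum(values) / len(values)
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
        for user, p in profiles.items():
            vectors[user].append((p[key] - mean) / std)
    return vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar_users(posts_by_user):
    """Return (similarity, user_a, user_b) tuples, most similar pair first."""
    profiles = {u: user_measures(p) for u, p in posts_by_user.items()}
    vectors = standardize(profiles)
    users = sorted(vectors)
    pairs = [
        (cosine(vectors[a], vectors[b]), a, b)
        for i, a in enumerate(users)
        for b in users[i + 1:]
    ]
    return sorted(pairs, reverse=True)
```

Calling `rank_similar_users({"alice": [...], "bob": [...], "carol": [...]})` with lists of post strings returns every user pair ordered by similarity; pairs near the top would be candidates for accounts held by the same person. Standardizing before comparison is a design choice of this sketch, made so that a large-valued measure such as post length does not swamp the ratios.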


Keywords: Web crawling, identity analysis, text processing





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Krzysztof Wilaszek (1)
  • Tomasz Wójcik (1)
  • Andrzej Opaliński (1)
  • Wojciech Turek (1)
  1. AGH University of Science and Technology, Krakow, Poland
