World Wide Web, Volume 22, Issue 4, pp 1447–1480

Target oriented network intelligence collection: effective exploration of social networks

  • Rami Puzis
  • Liron Kachko
  • Barak Hagbi
  • Roni Stern
  • Ariel Felner


Target Oriented Network Intelligence Collection (TONIC) is a crawling process whose goal is to find social network profiles that contain information about a given target. Such profiles are called leads, and the TONIC problem is how to minimize the crawling cost incurred while finding them. We model this problem as a search problem in an unknown graph and present a best-first search approach for solving it. Three key challenges are (1) which profiles to consider crawling, (2) how to prioritize the crawling order, and (3) when additional crawling is no longer worthwhile. For the first challenge, we propose two frameworks: the Restricted TONIC Framework (RTF), which restricts the search to immediate neighbors of previously found leads, and the Extended TONIC Framework (ETF), which extends the scope of the search to a wider neighborhood. We provide guidelines for choosing between the two frameworks. For the second challenge, we propose a set of effective topology-based heuristics that guide the search towards profiles that are more likely to be leads. For the third challenge, we propose using data collected in previously executed crawls to learn when additional crawling is expected to be useful.
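The restricted, best-first crawl described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `restricted_crawl`, the callbacks `fetch_neighbors` and `is_lead`, the scoring rule (rank a candidate by how many known leads link to it, a simple stand-in for a topology-based heuristic), and the fixed `budget` (a stand-in for the learned stopping rule) are all assumptions made for illustration.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Set, Tuple


def restricted_crawl(
    initial_leads: Iterable[Hashable],
    fetch_neighbors: Callable[[Hashable], Iterable[Hashable]],  # hypothetical crawler call
    is_lead: Callable[[Hashable], bool],  # hypothetical check made once a profile is crawled
    budget: int,  # assumed fixed crawl budget, not the learned stopping rule
) -> List[Hashable]:
    """Best-first crawl that only queues immediate neighbors of known leads."""
    crawled: Set[Hashable] = set(initial_leads)
    lead_links: Dict[Hashable, int] = {}  # candidate -> number of known leads linking to it
    heap: List[Tuple[int, int, Hashable]] = []  # (-score, insertion order, profile)
    found: List[Hashable] = []
    order = 0

    def expand(lead: Hashable) -> None:
        # Queue (or re-queue with a higher score) every uncrawled neighbor of a lead.
        nonlocal order
        for nb in fetch_neighbors(lead):
            if nb in crawled:
                continue
            lead_links[nb] = lead_links.get(nb, 0) + 1
            heapq.heappush(heap, (-lead_links[nb], order, nb))
            order += 1

    for lead in list(crawled):
        expand(lead)

    cost = 0
    while heap and cost < budget:
        neg_score, _, profile = heapq.heappop(heap)
        if profile in crawled or -neg_score != lead_links[profile]:
            continue  # stale queue entry, skip without paying crawl cost
        crawled.add(profile)
        cost += 1  # each crawled profile costs one unit
        if is_lead(profile):
            found.append(profile)
            expand(profile)  # only leads contribute new candidates
    return found


# Toy example: a small friendship graph in which "a", "b" and "d" are leads.
graph = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b", "e"], "e": ["d"]}
true_leads = {"a", "b", "d"}
result = restricted_crawl(["a"], lambda p: graph[p], lambda p: p in true_leads, budget=10)
limited = restricted_crawl(["a"], lambda p: graph[p], lambda p: p in true_leads, budget=1)
```

In this toy run the crawl starts from the known lead "a" and never queues a profile that is not adjacent to some lead, which is the restriction the RTF imposes. An ETF-style variant would also expand non-lead profiles, at the price of a larger candidate set.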


Keywords: Artificial intelligence, Heuristic search, Online social networks




Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Software and Information Systems Engineering, Ben-Gurion University of the Negev, Be’er Sheva, Israel
