SAINT: Supervised Actor Identification for Network Tuning

  • Chapter
Mining Social Networks and Security Informatics

Part of the book series: Lecture Notes in Social Networks (LNSN)

Abstract

When the actors of a social network are not uniquely identifiable in the data, entity resolution in the form of actor identification becomes a critical part of constructing the network. Here we develop SAINT, a pipeline for supervised entity resolution that uses relational information to improve, or tune, the quality of the constructed network. The first phase of SAINT uses attribute-only entity resolution to create an initial social network. In a second phase, relational information between actors, actor network properties, and other relational output of the first classification phase are used to improve the results of the original entity resolution. Compared with single-phase approaches, this two-phase approach yields consistently better recall and precision. Embedded within SAINT is a series of evaluation checkpoints designed to measure both the quality of the individual classifiers and their impact within the entire pipeline. Our evaluation results provide insight into the potential propagation of error and raise open research questions for further improvement of the individual classifiers within the pipeline. As the main application of the process is to improve actor identification in social networks, we characterise the impact that entity resolution has on the final constructed network. We compare the network constructed using SAINT with a ground-truth network built with perfect entity resolution, and use global and local network measures to study the differences.
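The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: SAINT trains supervised classifiers, whereas the sketch substitutes fixed thresholds for them, and the records, names, weights, thresholds and helper functions below are all hypothetical. Phase one matches records on an attribute (name similarity) alone; phase two rescores each candidate pair using relational evidence, here whether the two records' neighbours were themselves matched in phase one. Pairwise precision and recall are then computed against an assumed ground truth.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy data (not from the chapter): record id -> (name, neighbours).
RECORDS = {
    "r1": ("John Smith", {"r5"}),
    "r2": ("Jon Smith", {"r6"}),
    "r3": ("John Smyth", {"r7"}),
    "r5": ("Mary Jones", {"r1"}),
    "r6": ("Mary Jones", {"r2"}),
    "r7": ("Alan Brown", {"r3"}),
}
# Assumed ground truth: r1/r2 are one actor, r5/r6 are another.
TRUTH = {frozenset(("r1", "r2")), frozenset(("r5", "r6"))}


def name_similarity(a, b):
    """Attribute-only evidence: string similarity of the two names."""
    return SequenceMatcher(None, RECORDS[a][0], RECORDS[b][0]).ratio()


def phase_one(threshold=0.8):
    """Phase 1: match on attributes only (a threshold stands in for a classifier)."""
    return {
        frozenset((a, b))
        for a, b in combinations(sorted(RECORDS), 2)
        if name_similarity(a, b) >= threshold
    }


def relational_support(a, b, matches):
    """Fraction of neighbours of a and b that have a phase-1 counterpart on the other side."""
    def linked(x, others):
        return any(x == y or frozenset((x, y)) in matches for y in others)

    na, nb = RECORDS[a][1], RECORDS[b][1]
    if not na or not nb:
        return 0.0
    hits = sum(linked(x, nb) for x in na) + sum(linked(y, na) for y in nb)
    return hits / (len(na) + len(nb))


def phase_two(matches, threshold=0.7):
    """Phase 2: rescore phase-1 pairs with attribute and relational evidence combined."""
    kept = set()
    for pair in matches:
        a, b = sorted(pair)
        score = 0.5 * name_similarity(a, b) + 0.5 * relational_support(a, b, matches)
        if score >= threshold:
            kept.add(pair)
    return kept


def precision_recall(predicted, truth):
    """Pairwise precision and recall of predicted matches against the truth."""
    tp = len(predicted & truth)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall


phase1 = phase_one()
phase2 = phase_two(phase1)
print("phase 1 (precision, recall):", precision_recall(phase1, TRUTH))
print("phase 2 (precision, recall):", precision_recall(phase2, TRUTH))
```

On this toy data the first phase proposes four pairs, two of which are false matches between similarly named records; the second phase drops both because their neighbours were not matched to each other. Only precision improves here, since the sketch never reconsiders pairs that phase one rejected, so it does not reproduce the recall gains reported in the abstract.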

Corresponding author

Correspondence to Michael Farrugia.

Copyright information

© 2013 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Farrugia, M., Hurley, N., Quigley, A. (2013). SAINT: Supervised Actor Identification for Network Tuning. In: Özyer, T., Erdem, Z., Rokne, J., Khoury, S. (eds) Mining Social Networks and Security Informatics. Lecture Notes in Social Networks. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6359-3_6

  • DOI: https://doi.org/10.1007/978-94-007-6359-3_6

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-007-6358-6

  • Online ISBN: 978-94-007-6359-3

  • eBook Packages: Computer Science, Computer Science (R0)
