SAINT: Supervised Actor Identification for Network Tuning

Farrugia, Michael; Hurley, Neil; Quigley, Aaron

doi:10.1007/978-94-007-6359-3_6

Part of the book series: Lecture Notes in Social Networks ((LNSN))

Abstract

When the actors of a social network are not uniquely identifiable in the data, entity resolution in the form of actor identification becomes a critical part of the social network construction process. Here we develop SAINT, a pipeline for supervised entity resolution that uses relational information to improve, or tune, the quality of the constructed network. The first phase of SAINT uses attribute-only entity resolution to create an initial social network. In a second phase, relational information between actors, actor network properties and other relational output of the first classification phase are used to improve the results of the original entity resolution. Compared with single-phase approaches, this two-phase approach gives consistently superior recall and precision. Embedded within SAINT is a series of evaluation checkpoints designed to measure both the quality of the individual classifiers and their impact within the entire pipeline. Our evaluation results provide insight into the potential propagation of error and open research questions for further improvement of the individual classifiers within the pipeline. As the main application of the process is to improve actor identification in social networks, we characterise the impact that entity resolution has on the final constructed network: we compare the network constructed using SAINT with a ground-truth network built using perfect entity resolution, and use global and local network measures to study the differences.
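
The two-phase idea in the abstract can be illustrated with a minimal sketch. This is not the chapter's implementation: SAINT uses trained classifiers, whereas the sketch below substitutes fixed thresholds on a string-similarity score and a neighbour-overlap score purely for illustration. All function names, thresholds, and the toy records are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a, b):
    """String similarity in [0, 1]; a stand-in for a trained attribute classifier."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def phase_one(records, accept=0.85):
    """Attribute-only pass: match record pairs whose names are near-identical."""
    return {
        (i, j)
        for i, j in combinations(sorted(records), 2)
        if name_similarity(records[i], records[j]) >= accept
    }


def canonical_ids(records, matches):
    """Map each record id to the smallest id in its matched cluster."""
    parent = {r: r for r in records}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in matches:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)
    return {r: find(r) for r in records}


def neighbour_overlap(i, j, edges, canon):
    """Jaccard overlap of two records' neighbours after phase-one merging."""
    def neighbours(r):
        return ({canon[v] for u, v in edges if u == r}
                | {canon[u] for u, v in edges if v == r})

    ni, nj = neighbours(i), neighbours(j)
    union = ni | nj
    return len(ni & nj) / len(union) if union else 0.0


def phase_two(records, edges, matches, review=0.6, accept=0.85, min_overlap=0.5):
    """Relational pass: accept borderline name pairs that share resolved neighbours."""
    canon = canonical_ids(records, matches)
    tuned = set(matches)
    for i, j in combinations(sorted(records), 2):
        sim = name_similarity(records[i], records[j])
        if review <= sim < accept and neighbour_overlap(i, j, edges, canon) >= min_overlap:
            tuned.add((i, j))
    return tuned


# Hypothetical toy data: record id -> observed name, plus observed ties.
records = {1: "John Smith", 2: "J. Smith", 3: "Jane Smith",
           4: "Alice Jones", 5: "Alice Jone", 6: "Carol White"}
edges = {(1, 4), (2, 5), (3, 6)}

initial = phase_one(records)                 # attribute-only matches
tuned = phase_two(records, edges, initial)   # matches after relational tuning
print(sorted(initial), sorted(tuned))
```

In this toy data, "John Smith" and "J. Smith" are too dissimilar to match on names alone, but once the first phase has merged the two "Alice" records they share a resolved neighbour, so the second phase accepts the pair. "Jane Smith" is just as close by name but shares no neighbour and stays separate.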

Author information

Authors and Affiliations

University College Dublin, Dublin, Ireland
Michael Farrugia & Neil Hurley
University of St Andrews, St Andrews, Scotland, UK
Aaron Quigley

Corresponding author

Correspondence to Michael Farrugia.

Editor information

Editors and Affiliations

Department of Computer Engineering, TOBB University, Sogutozu Cad No. 43, Sogutozu Ankara, Turkey
Tansel Özyer
Information Technologies Institute, TUBITAK BILGEM, Gebze, Kocaeli, 41470, Turkey
Zeki Erdem
Computer Science, University of Calgary, University Dr. NW 2500, Calgary, T2N 1N4, Canada
Jon Rokne
American University of Sharjah, Universities City, Sharjah, United Arab Emirates
Suheil Khoury

About this chapter

Cite this chapter

Farrugia, M., Hurley, N., Quigley, A. (2013). SAINT: Supervised Actor Identification for Network Tuning. In: Özyer, T., Erdem, Z., Rokne, J., Khoury, S. (eds) Mining Social Networks and Security Informatics. Lecture Notes in Social Networks. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-6359-3_6

DOI: https://doi.org/10.1007/978-94-007-6359-3_6
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-007-6358-6
Online ISBN: 978-94-007-6359-3
eBook Packages: Computer Science (R0)
