Unsupervised Network Alignment

  • Jiawei Zhang
  • Philip S. Yu
Chapter

Abstract

Identifying the common users shared by different online social sites is a very hard task even for humans. Manually labeling anchor links can be extremely challenging, expensive (in human effort, time, and money), and tedious, and the scale of real-world online social networks, which involve millions or even billions of users, makes labeling training data even more difficult. In this chapter, we introduce several approaches that address the network alignment problem in the unsupervised learning setting instead, where no labeled training data is needed for model building.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Jiawei Zhang (1)
  • Philip S. Yu (2)
  1. Department of Computer Science, Florida State University, Tallahassee, USA
  2. Department of Computer Science, University of Illinois, Chicago, USA
