Node classification over bipartite graphs through projection

Abstract

Many real-world large datasets correspond to bipartite graph data settings; think, for example, of users rating movies or people visiting locations. Although there has been some prior work on data analysis with such bigraphs, no general network-oriented methodology has yet been proposed for node classification. In this paper we propose a three-stage classification framework that effectively deals with the typically very large size of such datasets. The stages are: (1) top node weighting, (2) projection to a weighted unigraph, and (3) application of a relational classifier. This paper makes two major contributions. Firstly, this general framework allows us to explore the design space by applying different choices at the three stages, introducing new alternatives, and mixing and matching them to create new techniques. We present an empirical study of the predictive and run-time performance of different combinations of functions in the three stages over a large collection of bipartite datasets with sizes of up to 20 million × 30 million nodes. Secondly, thinking of classification on bigraph data in terms of the three-stage framework opens up the design space of possible solutions, where existing and novel functions can be mixed, matched, and tailored to the problem at hand. Indeed, a novel, fast, accurate, and comprehensible method, called the SW-transformation, emerges from this work as one of the best-performing combinations in the empirical study.
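
To make the three stages concrete, the sketch below runs a toy bipartite edge list through the framework in plain Python. The inverse-log-degree top-node weighting, the sum aggregation used in the projection, and the weighted-vote relational neighbor (wvRN) scoring are illustrative choices from the design space rather than the paper's single recommended configuration, and all node names and labels are invented for the example.

```python
from collections import defaultdict
from math import log

# Toy bipartite graph: bottom nodes (e.g. users) linked to top nodes (e.g. movies).
edges = [("u1", "m1"), ("u1", "m2"), ("u2", "m2"), ("u2", "m3"),
         ("u3", "m1"), ("u3", "m3"), ("u4", "m3")]
labels = {"u1": 1, "u2": 0, "u3": 1}          # known labels; "u4" is to be classified

# Stage 1: top node weighting (here inverse log-degree, one choice among many).
top_degree = defaultdict(int)
for _, top in edges:
    top_degree[top] += 1
top_weight = {t: 1.0 / log(1 + d) for t, d in top_degree.items()}

# Stage 2: projection to a weighted unigraph over the bottom nodes.
# Edge weight = sum of the weights of the top nodes two bottom nodes share.
bottoms_of = defaultdict(set)
for bottom, top in edges:
    bottoms_of[top].add(bottom)

uni = defaultdict(float)                      # (bottom_i, bottom_j) -> weight
for top, bottoms in bottoms_of.items():
    bottoms = sorted(bottoms)
    for i, bi in enumerate(bottoms):
        for bj in bottoms[i + 1:]:
            uni[(bi, bj)] += top_weight[top]

# Stage 3: relational classifier (weighted-vote relational neighbor, wvRN):
# score = weighted average of the labels of the node's neighbors in the unigraph.
def wvrn_score(node):
    num = den = 0.0
    for (bi, bj), w in uni.items():
        if node in (bi, bj):
            other = bj if node == bi else bi
            if other in labels:
                num += w * labels[other]
                den += w
    return num / den if den else None

print(wvrn_score("u4"))                       # probability-like score for the unlabeled node
```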


Notes

  1.

    This is not a fundamental limitation of the framework—additional features could be constructed and more sophisticated relational classifiers could be used. We leave that for future work.

  2.

    In our case the beta function is tuned in three levels.

  3.

    The grid searches for the optimal \(\alpha\) and \(\beta\) in the range from 0.1 to 12.1, with a step size of 3 at the first level; the step size is divided by three at each successive level. (A sketch of this coarse-to-fine search is given after these notes.)

  4.

    The transformation can also be applied to cases where the node labels are score probabilities.

  5.

    A Python implementation of the SW-transformation is available online at https://github.com/SPraet/SW-transformation.

  6.

    http://konect.uni-koblenz.de.

  7.

    http://realitycommons.media.mit.edu.

  8.

    http://socialnetworks.mpi-sws.org.

  9.

    http://www.grouplens.org.

  10.

    For this paper, we only consider binary classification, where multiclass problems are cast to several one-versus-all binary classification problems. Other approaches to multiclass problems can easily be incorporated within our proposed framework; for more details, see Sect. 6. (A sketch of the one-versus-all reduction is given after these notes.)

  11.

    http://webscope.sandbox.yahoo.com/.

  12.

    https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/.

  13.

    All the datasets for which this function performed best have only between 3.19% and 7.25% positive labels.

  14.

    Note that, in addition to the entropy, the weights of the links also have an impact on the prediction performance of wvRN.

  15.

    All experiments are conducted on a 3.40 GHz Intel i7 CPU, with 8 GB RAM and a 64-bit operating system.
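
The following sketch illustrates the coarse-to-fine grid search referred to in note 3: \(\alpha\) and \(\beta\) are searched between 0.1 and 12.1 with a step of 3 at the first level, and the step is divided by three at each subsequent level. How the grid is re-centred between levels and the objective being maximised (a placeholder standing in for validation AUC) are assumptions made for this illustration, not details taken from the paper's code.

```python
import numpy as np

def multilevel_grid_search(score, lo=0.1, hi=12.1, step=3.0, levels=3, shrink=3.0):
    """Coarse-to-fine grid search for the beta-function parameters (alpha, beta).

    `score(alpha, beta)` is a user-supplied objective, e.g. the validation AUC of
    the classifier built with those parameters (hypothetical callable). At each
    level the grid is re-centred on the current best point and the step size is
    divided by `shrink`; the re-centring rule is an assumption for illustration.
    """
    best = (None, None, -np.inf)
    grid_a = grid_b = np.arange(lo, hi + 1e-9, step)
    for _ in range(levels):
        for a in grid_a:
            for b in grid_b:
                s = score(a, b)
                if s > best[2]:
                    best = (a, b, s)
        step /= shrink
        a0, b0, _ = best
        grid_a = np.arange(max(lo, a0 - 2 * step), min(hi, a0 + 2 * step) + 1e-9, step)
        grid_b = np.arange(max(lo, b0 - 2 * step), min(hi, b0 + 2 * step) + 1e-9, step)
    return best

# Example with a dummy objective standing in for validation AUC:
alpha, beta, val = multilevel_grid_search(lambda a, b: -((a - 2.0) ** 2 + (b - 5.0) ** 2))
```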
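
Note 10 casts multiclass problems to several one-versus-all binary problems. A minimal sketch of that reduction is given below; the `binary_score` callable and its signature are hypothetical stand-ins for whichever binary scorer the framework produces.

```python
def one_vs_all(binary_score, nodes, labels, classes):
    """Cast a multiclass node-classification task to one-vs-all binary tasks.

    `binary_score(node, positives)` is any binary scorer (hypothetical signature):
    it returns a score for `node` given the set of training nodes labelled
    positive. Each class in turn plays the positive role; the class with the
    highest score wins.
    """
    predictions = {}
    for node in nodes:
        scores = {}
        for c in classes:
            positives = {n for n, y in labels.items() if y == c}
            scores[c] = binary_score(node, positives)
        predictions[node] = max(scores, key=scores.get)
    return predictions

# Example with a trivial scorer that just counts the positive training nodes:
# one_vs_all(lambda node, pos: len(pos), ["u4"], {"u1": "A", "u2": "B"}, ["A", "B"])
```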


Author information

Corresponding author

Correspondence to Stiene Praet.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Editor: Luc De Raedt.

Appendix

A Data distributions

See Figs. 9, 10, 11 and 12.

Fig. 9 Degree distributions of the top nodes (upper row) and bottom nodes (bottom row) for different datasets

Fig. 10 Degree distributions of the top nodes (upper row) and bottom nodes (bottom row) for different datasets

Fig. 11 Degree distributions of the top nodes (upper row) and bottom nodes (bottom row) for different datasets

Fig. 12 Degree distributions of the top nodes (upper row) and bottom nodes (bottom row) for different datasets

B Results tables

See Figs. 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 and Tables 7, 8, 9.

Table 7 Kemeny–Young ranking for all the combinations of techniques

Table 8 Beta grid search on three levels with the optimal \(\alpha\) and \(\beta\) parameters, as well as the corresponding AUC per level

Fig. 13 Predictive performance of the beta function in combination with the SW-transformation when the parameters are tuned on a sample of the training data and trained on the full training data. The difference in predictive performance is limited

Fig. 14 Aggregated run-time results for each of the top node and aggregation functions with wvRN (including the SW-transformation). Since most of the top-node functions (except for the beta) have similar durations, the markers on the plots are very close to each other (and given in descending order). The SW-transformation outperforms all the other aggregation functions in combination with any non-tuning top-node function

Fig. 15 Aggregated run-time results for each of the top node and aggregation functions with the nLB classifier

Fig. 16 Aggregated run-time results for each of the top node and aggregation functions with the nLB100 classifier (nLB with 100 training instances)

Fig. 17 Aggregated run-time results for each of the top node and aggregation functions with the cdRN classifier

Fig. 18 Time improvement of nLB with sampling over 100 instances as compared to no sampling for different datasets. The top of each bar represents the time needed for the nLB classifier and the bottom of each bar the time required to train the nLB with 100 instances for the specific dataset

Table 9 Best combinations of methods per dataset

Fig. 19 Ranking of all combinations of methods, with the proposed combinations highlighted in red (Color figure online)

Fig. 20 Ranking of all combinations of methods, with the proposed combinations highlighted in red (Color figure online)

Fig. 21 Ranking of all combinations of methods, with the proposed combinations highlighted in red (Color figure online)

Fig. 22 Ranking of all combinations of methods, with the proposed combinations highlighted in red (Color figure online)

Fig. 23 Ranking of all combinations of methods, with the proposed combinations highlighted in red (Color figure online)

Fig. 24 Ranking of all combinations of methods, with the proposed combinations highlighted in red (Color figure online)

Fig. 25 Ranking of all combinations of methods, with the proposed combinations highlighted in red (Color figure online)

Fig. 26 Ranking of all combinations of methods, with the proposed combinations highlighted in red (Color figure online)

About this article

Cite this article

Stankova, M., Praet, S., Martens, D. et al. Node classification over bipartite graphs through projection. Mach Learn 110, 37–87 (2021). https://doi.org/10.1007/s10994-020-05898-0


Keywords

  • Bipartite graphs
  • Two-mode networks
  • Node classification
  • Behavioral data