Abstract
Many large real-world datasets are naturally bipartite graphs: think, for example, of users rating movies or people visiting locations. Although there has been some prior work on data analysis with such bigraphs, no general network-oriented methodology has yet been proposed for node classification. In this paper we propose a three-stage classification framework that deals effectively with the typically very large size of such datasets. The stages are: (1) top node weighting, (2) projection to a weighted unigraph, and (3) application of a relational classifier. This paper makes two major contributions. First, the general framework allows us to explore the design space by applying different choices at the three stages, introducing new alternatives, and mixing and matching them to create new techniques. We present an empirical study of the predictive and run-time performance of different combinations of functions in the three stages over a large collection of bipartite datasets with sizes of up to \(20\,\hbox {million} \times 30\,\hbox {million}\) nodes. Second, thinking of classification on bigraph data in terms of the three-stage framework opens up the design space of possible solutions, where existing and novel functions can be mixed and matched, and tailored to the problem at hand. Indeed, in this work a novel, fast, accurate and comprehensible method, called the SW-transformation, emerges as one of the best-performing combinations in the empirical study.
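To make the three stages concrete, the following is a minimal sketch in Python, not the authors' implementation. It assumes a small dense bottom-by-top adjacency matrix `B`, an inverse-degree top node weighting, a projection that sums the weights of shared top nodes, and a single pass of the weighted-vote relational neighbor classifier (wvRN) over the labeled neighbors. The function names and these particular choices are illustrative assumptions; the paper compares many alternatives at each stage.

```python
import numpy as np

def top_node_weights(B):
    """Stage 1 (assumed choice): inverse-degree weight per top node."""
    degree = B.sum(axis=0)                 # bottom nodes linked to each top node
    return 1.0 / np.maximum(degree, 1)     # guard against isolated top nodes

def project(B, s):
    """Stage 2 (assumed choice): W[i, j] = sum of weights of shared top nodes."""
    W = (B * s) @ B.T                      # sum_k B[i, k] * s[k] * B[j, k]
    np.fill_diagonal(W, 0.0)               # no self-links in the unigraph
    return W

def wvrn_scores(W, y, known):
    """Stage 3: weighted vote of labeled neighbors (one pass, no iteration)."""
    Wk = W[:, known]                       # only links to labeled nodes vote
    total = Wk.sum(axis=1)
    votes = Wk @ y[known]
    prior = y[known].mean()                # fallback for nodes without labeled neighbors
    return np.where(total > 0, votes / np.maximum(total, 1e-12), prior)

# Toy bigraph: 4 bottom nodes (rows) and 3 top nodes (columns).
B = np.array([[1, 1, 0],
              [1, 0, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=float)
y = np.array([1.0, 0.0, 0.0, 0.0])         # labels; only entries under `known` are used
known = np.array([True, False, True, False])

scores = wvrn_scores(project(B, top_node_weights(B)), y, known)
print(scores)
```

In this toy example the unlabeled node 1 shares a top node only with the positive node 0, and node 3 only with the negative node 2, so their scores follow those neighbors. For datasets of the size mentioned above, a sparse matrix representation would be needed instead of dense arrays.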
Notes
- 1.
This is not a fundamental limitation of the framework—additional features could be constructed and more sophisticated relational classifiers could be used. We leave that for future work.
- 2.
In our case the beta function is tuned in three levels.
- 3.
The grid search looks for the optimal \(\alpha\) and \(\beta\) in the range from 0.1 to 12.1, with a step size of 3 in the first level. In each successive level, the step size is reduced by a factor of 3.
- 4.
The transformation can also be applied to cases where the node labels are score probabilities.
- 5.
A Python implementation of the SW-transformation is available online at https://github.com/SPraet/SW-transformation.
- 6.
- 7.
- 8.
- 9.
- 10.
For this paper, we only consider binary classification, where multiclass problems are cast to several one-versus-all binary classification problems. Other approaches to multiclass problems can easily be incorporated within our proposed framework. For more details, see Sect. 6.
- 11.
- 12.
- 13.
All the datasets for which this function performed best have only between 3.19% and 7.25% positive labels.
- 14.
Note that, in addition to the entropy, the weights of the links also have an impact on the prediction performance of wvRN.
- 15.
All experiments are conducted on a 3.40 GHz Intel i7 CPU, with 8 GB RAM and a 64-bit operating system.
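The multi-level tuning described in notes 2 and 3 can be sketched as a coarse-to-fine grid search. The notes state only the range (0.1 to 12.1), the first step size (3), the reduction factor (3) and the number of levels (3). How each finer grid is positioned is not specified, so the sketch below assumes it is centered on the best point of the previous level and spans one previous step on either side. The `score` callable and all function names are hypothetical.

```python
import itertools

def grid(lo, hi, step):
    """Evenly spaced values from lo to hi (inclusive) with the given step."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]

def multilevel_grid_search(score, lo=0.1, hi=12.1, step=3.0, levels=3, shrink=3.0):
    """Coarse-to-fine search for the (alpha, beta) pair that maximizes score."""
    a_lo, a_hi, b_lo, b_hi = lo, hi, lo, hi
    best = None
    for _ in range(levels):
        candidates = itertools.product(grid(a_lo, a_hi, step), grid(b_lo, b_hi, step))
        best = max(candidates, key=lambda p: score(*p))
        # Assumption: the next level searches a window of +/- one current step
        # around the best point, clipped to the original range.
        a_lo, a_hi = max(lo, best[0] - step), min(hi, best[0] + step)
        b_lo, b_hi = max(lo, best[1] - step), min(hi, best[1] + step)
        step /= shrink
    return best

# Hypothetical objective with its maximum at alpha = 4.1, beta = 7.1.
def toy_score(alpha, beta):
    return -((alpha - 4.1) ** 2 + (beta - 7.1) ** 2)

alpha, beta = multilevel_grid_search(toy_score)
print(round(alpha, 3), round(beta, 3))
```

With these defaults, the first level evaluates the values 0.1, 3.1, 6.1, 9.1 and 12.1 for each parameter, and the two finer levels use steps of 1 and 1/3. In practice `score` would be a validation metric such as the AUC of the resulting classifier.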
Additional information
Editor: Luc De Raedt.
Appendix
A Data distributions
B Results tables
See Figs. 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26 and Tables 7, 8, 9.
Aggregated run-time results for each of the top node and aggregation functions with wvRN (including the SW-transformation). Since most of the top-node functions (except for the beta function) have similar durations, the markers on the plots are very close to each other (and given in descending order). The SW-transformation outperforms all the other aggregation functions in combination with any non-tuning top-node function.
Cite this article
Stankova, M., Praet, S., Martens, D. et al. Node classification over bipartite graphs through projection. Mach Learn 110, 37–87 (2021). https://doi.org/10.1007/s10994-020-05898-0
Keywords
- Bipartite graphs
- Two-mode networks
- Node classification
- Behavioral data