P2P Lending Analysis Using the Most Relevant Graph-Based Features
Peer-to-Peer (P2P) lending platforms are online services that facilitate borrowing and investment transactions. A central problem for these platforms is how to identify the factors most closely related to credit risk. This problem is inherently complex because risk takes various forms and many influencing factors are involved. Moreover, raw P2P lending data are often high-dimensional, highly correlated and unstable, which makes the problem intractable for traditional statistical and machine learning approaches. To address these problems, we develop a novel filter-based feature selection method for P2P lending analysis. Unlike most traditional feature selection methods, which use vectorial features, the proposed method is based on graph-based features and thus incorporates the relationships between pairs of feature samples into the feature selection process. Since the graph-based features are by nature complete weighted graphs, we use the steady state random walk to encapsulate their main characteristics. Specifically, for each graph we compute the probability distribution of the walk visiting its vertices. We then measure the discriminant power of each graph-based feature with respect to the target feature through the Jensen-Shannon divergence between the probability distributions obtained from the random walks, and select an optimal subset of features based on the most relevant graph-based features. Unlike most existing state-of-the-art feature selection methods, the proposed method can accommodate both continuous and discrete target features. Experiments on data from P2P lending platforms in China demonstrate the effectiveness and usefulness of the proposed feature selection algorithm.
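The steps named in the abstract (graph-based feature, steady state random walk, Jensen-Shannon divergence) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several choices are assumptions made only for illustration: the edge weight between two samples is taken to be the absolute difference of their feature values, the steady state probability of a vertex is its weighted degree divided by the total weighted degree (the standard result for a random walk on an undirected weighted graph), and all function names are hypothetical.

```python
import math

def feature_graph(values):
    """Complete weighted graph over the samples of one feature.
    Assumption: edge weight = absolute difference between two sample values."""
    n = len(values)
    return [[abs(values[a] - values[b]) for b in range(n)] for a in range(n)]

def steady_state(weights):
    """Steady state random walk distribution on an undirected weighted graph:
    probability of a vertex = its weighted degree / total weighted degree."""
    degrees = [sum(row) for row in weights]
    total = sum(degrees)
    if total == 0:
        # Constant feature: every edge weight is zero, fall back to uniform.
        return [1.0 / len(degrees)] * len(degrees)
    return [d / total for d in degrees]

def entropy(p):
    """Shannon entropy (natural log); zero-probability terms contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence: H((P+Q)/2) - (H(P) + H(Q)) / 2."""
    mixture = [(a + b) / 2 for a, b in zip(p, q)]
    return entropy(mixture) - (entropy(p) + entropy(q)) / 2

def divergence_to_target(features, target):
    """Divergence between each feature's graph and the target's graph.
    `features` maps a (hypothetical) feature name to its list of sample values."""
    target_dist = steady_state(feature_graph(target))
    return {
        name: js_divergence(steady_state(feature_graph(vals)), target_dist)
        for name, vals in features.items()
    }

# Toy data, invented purely for illustration.
target = [0.0, 1.0, 2.0]
features = {"scaled_copy": [0.0, 10.0, 20.0], "unrelated": [5.0, 0.0, 5.0]}
scores = divergence_to_target(features, target)
```

Under these assumptions, `scaled_copy` has zero divergence from the target because the degree-based distribution is invariant to rescaling a feature, while `unrelated` has a positive divergence. How the divergences are turned into a final ranking and an optimal subset is not specified in the abstract, so the sketch stops at the per-feature scores.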
Keywords: Feature Selection, Credit Risk, Credit Rating, Feature Selection Method, Target Feature
This work is supported by the National Natural Science Foundation of China (Grant nos. 61602535, 61503422 and 61402389), and the Open Projects Program of National Laboratory of Pattern Recognition. Lu Bai is supported by the program for innovation research in Central University of Finance and Economics. Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award. Lixin Cui is supported by the Young Scholar Development Fund of Central University of Finance and Economics, No. QJJ1540.