P2P Lending Analysis Using the Most Relevant Graph-Based Features

  • Lixin Cui
  • Lu Bai
  • Yue Wang
  • Xiao Bai
  • Zhihong Zhang
  • Edwin R. Hancock
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10029)


Peer-to-Peer (P2P) lending platforms are online services that facilitate borrowing and investment transactions. A central problem for these platforms is how to identify the most influential factors that are closely related to credit risk. This problem is inherently complex because of the various forms of risk and the numerous influencing factors involved. Moreover, raw P2P lending data are often high-dimensional, highly correlated and unstable, which makes the problem harder to handle with traditional statistical and machine learning approaches. To address these problems, we develop a novel filter-based feature selection method for P2P lending analysis. Unlike most traditional feature selection methods, which use vectorial features, the proposed method uses graph-based features and thus incorporates the relationships between pairwise feature samples into the feature selection process. Since the graph-based features are by nature complete weighted graphs, we use the steady state random walk to encapsulate their main characteristics. Specifically, we compute the probability distribution of the walk visiting the vertices. We then measure the discriminant power of each graph-based feature with respect to the target feature through the Jensen-Shannon divergence between the probability distributions from the random walks, and select an optimal subset of features based on the most relevant graph-based features. Unlike most existing state-of-the-art feature selection methods, the proposed method can accommodate both continuous and discrete target features. Experiments on data from P2P lending platforms in China demonstrate the effectiveness of the proposed feature selection algorithm.
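
The pipeline outlined in the abstract (one complete weighted graph per feature, a steady state random walk distribution over its vertices, and a Jensen-Shannon divergence against the target feature) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not specify the edge weights or the ranking rule, so the sketch assumes that the edge weight between two samples is the absolute difference of their feature values and that a smaller divergence from the target means a more relevant feature. All function names are hypothetical.

```python
import numpy as np


def steady_state_distribution(values):
    """Steady state visiting probabilities of a random walk on the complete
    weighted graph whose vertices are the samples of one feature."""
    x = np.asarray(values, dtype=float)
    # ASSUMPTION: edge weight = absolute difference between two sample values.
    weights = np.abs(x[:, None] - x[None, :])
    degrees = weights.sum(axis=1)
    total = degrees.sum()
    if total == 0:
        # Constant feature: all edge weights are zero, fall back to uniform.
        return np.full(x.shape[0], 1.0 / x.shape[0])
    # On an undirected weighted graph the steady state probability of a
    # vertex is its weighted degree divided by the sum of all degrees.
    return degrees / total


def shannon_entropy(p):
    """Shannon entropy (natural log) of a discrete distribution."""
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())


def jensen_shannon_divergence(p, q):
    """JSD(p, q) = H((p + q) / 2) - (H(p) + H(q)) / 2."""
    m = 0.5 * (p + q)
    return shannon_entropy(m) - 0.5 * (shannon_entropy(p) + shannon_entropy(q))


def rank_features(X, y):
    """Score every column of X against the target y.

    Returns (order, scores). ASSUMPTION: features are ordered by ascending
    divergence, i.e. the feature closest to the target comes first.
    """
    X = np.asarray(X, dtype=float)
    target = steady_state_distribution(y)
    scores = np.array([
        jensen_shannon_divergence(steady_state_distribution(X[:, j]), target)
        for j in range(X.shape[1])
    ])
    return np.argsort(scores), scores
```

Because the target is turned into a graph in the same way as every other feature, the sketch handles continuous and discrete targets alike, which matches the property claimed in the abstract. Calling `rank_features(X, y)` with an `n_samples × n_features` array returns the feature indices in ranked order together with their divergence scores.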


Keywords: Feature Selection · Credit Risk · Credit Rating · Feature Selection Method · Target Feature



This work is supported by the National Natural Science Foundation of China (Grant nos. 61602535, 61503422 and 61402389), and the Open Projects Program of National Laboratory of Pattern Recognition. Lu Bai is supported by the program for innovation research in Central University of Finance and Economics. Edwin R. Hancock is supported by a Royal Society Wolfson Research Merit Award. Lixin Cui is supported by the Young Scholar Development Fund of Central University of Finance and Economics, No. QJJ1540.



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Lixin Cui (1)
  • Lu Bai (1), corresponding author
  • Yue Wang (1)
  • Xiao Bai (2)
  • Zhihong Zhang (3)
  • Edwin R. Hancock (4)
  1. School of Information, Central University of Finance and Economics, Beijing, China
  2. School of Computer Science and Engineering, Beihang University, Beijing, China
  3. Software School, Xiamen University, Xiamen, China
  4. Department of Computer Science, University of York, York, UK
