Forming appropriate peer groups for bank research: a cluster analysis of bank financial statements


Choosing appropriate peer groups for commercial banks is important for investors comparing bank performance, for regulators evaluating safety and soundness, for bank management assessing merger alternatives or relative performance, and for bank researchers testing hypotheses and making policy judgments about the banking system. We use common-size variables constructed from commercial bank financial statements as inputs to a cluster analysis model that identifies groups of banks whose financial structures are relatively homogeneous within groups and distinct across groups. Managerial strategies and idiosyncrasies, local and global economic conditions, and the regulatory environment shape bank financial statements, so the statements should reflect the financial and operational differences across banks. Using year-end data from 2014, we cluster 6444 banks into such groups. Our results show that bank clusters form largely around loan types, funding differences, and management’s strategic choices. We then compare the ability of bank clusters and of bank size to explain several widely used measures of bank performance and risk in additional years. In regression models, the bank clusters have substantially greater explanatory power than groupings based on bank size in several different years.
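To make the idea of clustering banks on common-size variables concrete, here is a minimal sketch using NumPy. It scales each bank's statement items by total assets, standardizes the resulting ratios, and groups banks with plain k-means. Everything in it is an illustrative assumption rather than the authors' procedure: the abstract does not name the clustering algorithm, and the helper names `common_size` and `kmeans`, the three items, the use of total assets as the only denominator, and the six-bank data set are all hypothetical.

```python
import numpy as np

def common_size(items: np.ndarray, total_assets: np.ndarray) -> np.ndarray:
    """Hypothetical helper: divide each bank's items (one row per bank) by its total assets."""
    return items / total_assets[:, None]

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means (an assumed stand-in algorithm); returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct banks, chosen at random, as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each bank to its nearest centroid (squared Euclidean distance)
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster empties out
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical data: 6 banks x 3 items (real estate loans, C&I loans, core deposits), in $ millions
items = np.array([
    [600.0, 50.0, 800.0],
    [1200.0, 90.0, 1500.0],
    [300.0, 30.0, 420.0],
    [100.0, 400.0, 500.0],
    [250.0, 900.0, 1100.0],
    [40.0, 210.0, 260.0],
])
total_assets = np.array([1000.0, 2000.0, 500.0, 1000.0, 2000.0, 500.0])

X = common_size(items, total_assets)           # size differences drop out of the ratios
Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardize each ratio before measuring distance
labels, _ = kmeans(Z, k=2)
print(labels)
```

Because the ratios remove scale, banks of very different size (here $500 million and $2 billion) can land in the same cluster when their balance-sheet mix is similar. Standardizing keeps ratios with large spreads from dominating the Euclidean distance; whether to standardize is itself a modeling choice.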



Notes

  1. We repeat our analysis for three additional years (2015, 2016, and 2017), as discussed in Section 3.2.

  2. We use pre-tax earnings to mitigate the effect of S-corporation banks, which do not pay taxes at the corporate level.

  3. These include the pseudo F-statistic, the approximate expected overall R-square, and the cubic clustering criterion.

  4. The conclusions drawn in Section 3.2 are not sensitive to forming a different number of clusters.

  5. The naming of clusters is arbitrary. For convenience, we number the clusters in descending order of membership: Cluster 1 has the most members and Cluster 25 the fewest.

  6. We will generally ignore the small clusters in our analysis because their number of members is too small for hypothesis testing.

  7. Centroid distances between all pairs of clusters are available upon request.

  8. The variance-weighted R-square is greater than the unweighted average R-square because, on average, the variables with higher variances had higher R-squares.

  9. Using an F-test, we reject the hypothesis that the regression coefficients are jointly equal to zero, with low p-values, for all regressions, even those with the lowest R-squares.

  10. All unreported results are available upon request.

  11. We use one fewer size group than Hughes and Mester (2013) because we combine their two largest groups ($50 billion to $100 billion and > $100 billion) into one group due to the small number of banks in those size groups. We repeat the analysis with the bank size categories of Berger and Bouwman (2013) and Black and Hazelwood (2013) and find results that are qualitatively the same. Additional results are available upon request.

  12. We omit results for 2015 and 2016 for brevity. Conclusions are unchanged, and results are available upon request.
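Several of the notes refer to summary statistics for a grouping of banks: the pseudo F-statistic (note 3), the variance-weighted versus unweighted R-square (note 8), the F-test for a regression on group membership (note 9), and the comparison of clusters with size groups (note 11). The sketch below computes all of them from within-group and total sums of squares, assuming the standard textbook definitions; it is not the authors' code, and the helper names and data are hypothetical. It relies on one identity: regressing a single variable on a full set of group dummies (with an intercept) gives an R-square equal to between-group over total sums of squares, and an F-statistic equal to the one-variable pseudo F. The F here is the classical, non-robust version, not one built on heteroskedasticity-consistent standard errors.

```python
import numpy as np

def group_sums_of_squares(X, labels):
    """Per-column total and within-group sums of squares for a grouping of the rows."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    if X.ndim == 1:  # a single variable, e.g. one performance measure
        X = X[:, None]
    total = ((X - X.mean(axis=0)) ** 2).sum(axis=0)
    within = np.zeros(X.shape[1])
    for g in np.unique(labels):
        Xg = X[labels == g]
        within += ((Xg - Xg.mean(axis=0)) ** 2).sum(axis=0)
    return total, within

def r_squares(X, labels):
    """Per-variable R-square, its unweighted average, and its variance-weighted average."""
    total, within = group_sums_of_squares(X, labels)
    r2 = 1.0 - within / total
    # Weighting each R2_j by var_j (proportional to total_j) reduces to pooled sums
    weighted = 1.0 - within.sum() / total.sum()
    return r2, r2.mean(), weighted

def pseudo_f(X, labels):
    """(Between SS / (k - 1)) / (within SS / (n - k)), pooled over all variables."""
    total, within = group_sums_of_squares(X, labels)
    n, k = len(labels), len(np.unique(labels))
    between = total.sum() - within.sum()
    return (between / (k - 1)) / (within.sum() / (n - k))

# Hypothetical data: 4 banks, 2 variables; variable 0 has the larger variance
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
clusters = np.array([0, 0, 1, 1])
r2, simple_avg, var_weighted = r_squares(X, clusters)
print(r2, simple_avg, var_weighted)  # variance-weighted average exceeds the unweighted one
print(pseudo_f(X, clusters))

# One hypothetical performance measure, grouped by clusters and by size groups
y = np.array([1.0, 1.2, 2.0, 2.2])
size_groups = np.array([0, 1, 0, 1])
print(r_squares(y, clusters)[0][0], r_squares(y, size_groups)[0][0])
print(pseudo_f(y, clusters))  # equals the classical F from regressing y on group dummies
```

In this toy example the high-variance variable is perfectly separated (R-square 1) and the low-variance one not at all (R-square 0), so the variance-weighted average (100/101) is far above the unweighted one (0.5), the pattern note 8 describes.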


References

  1. Ayadi R, De Groen WP, Sassi I, Mathlouthi W, Rey H, Aubry O (2016) Banking business models monitor 2015 EUROPE. International Research Centre on Cooperative Finance, Montreal.

  2. Barth JR, Brumbaugh RD Jr, Litan RE (1992) The future of American banking. Routledge, New York.

  3. Berger AN (1995) The relationship between capital and earnings in banking. J Money Credit Bank 27(2):432–456.

  4. Berger AN, Bouwman CHS (2013) How does capital affect bank performance during financial crises? J Financ Econ 109(1):146–176.

  5. Berger AN, Mester LJ (1997) Inside the black box: what explains differences in the efficiencies of financial institutions? J Bank Financ 21(7):895–947.

  6. Black LK, Hazelwood LN (2013) The effect of TARP on bank risk-taking. J Financ Stab 9(4):790–803.

  7. Carlson M, Shan H, Warusawitharana M (2013) Capital ratios and bank lending: a matched bank approach. J Financ Intermed 22(4):663–687.

  8. Cleary S, Hebb G (2016) An efficient and functional model for predicting bank distress: in and out of sample evidence. J Bank Financ 64(C):101–111.

  9. Cole RA, Gunther JW (1998) Predicting bank failures: a comparison of on- and off-site monitoring systems. J Financ Serv Res 13(2):103–117.

  10. Cole RA, White LJ (2012) Déjà vu all over again: the cause of US commercial bank failures this time around. J Financ Serv Res 42(1):5–29.

  11. Dardac N, Boitan IA (2009) Cluster analysis approach for banks’ risk profile: the Romanian evidence. Eur Res Studies J 12(1):109–118.

  12. Deakin EB (1976) Distributions of financial accounting ratios: some empirical evidence. Account Rev 51(1):90–96. Accessed 28 May 2019

  13. DeLong G (2001) Stockholder gains from focusing versus diversifying bank mergers. J Financ Econ 59(2):221–252.

  14. Demsetz RS, Strahan PE (1997) Diversification, size, and risk at bank holding companies. J Money Credit Bank 29(3):300–313.

  15. Demyanyk Y, Hasan I (2010) Financial crises and bank failures: a review of prediction methods. Omega 38(5):315–324.

  16. Dias JG, Ramos SB (2014) The aftermath of the subprime crisis: a clustering analysis of world banking sector. Rev Quant Finan Acc 42(2):293–308.

  17. Diaz BD, Azofra SS (2009) Determinants of premiums paid in European banking mergers and acquisitions. Int J Bank Account Financ 1(4):358–380.

  18. Elsas R, Hackethal A, Holzhäuser M (2010) The anatomy of bank diversification. J Bank Financ 34(6):1274–1287.

  19. Ercan H, Sayaseng S (2016) The cluster analysis of the banking sector in Europe. In: Lengyel I, Vas Z (eds) Economics and management of global value chains, pp 111–127. Accessed 28 May 2019

  20. Hubbard RG, Kuttner KN, Palia DN (2002) Are there bank effects in borrower cost of funds? Evidence from a matched sample of borrowers and banks. J Bus 75(4):559–581.

  21. Hughes JP, Mester LJ (1998) Bank capitalization and cost: evidence of scale economies in risk management and signaling. Rev Econ Stat 80(2):314–325.

  22. Hughes JP, Mester LJ (2013) Who said large banks don’t experience scale economies? Evidence from a risk-return-driven cost function. J Financ Intermed 22(4):559–585.

  23. Hughes JP, Lang W, Mester LJ, Moon C-G (1996) Efficient banking under interstate branching. J Money Credit Bank 28(4):1045–1071.

  24. Jin JY, Kanagaretnam K, Lobo GJ (2011) Ability of accounting and audit quality variables to predict bank failure during the financial crisis. J Bank Financ 35(11):2811–2819.

  25. Laeven L, Levine R (2007) Is there a diversification discount in financial conglomerates? J Financ Econ 85(2):331–367.

  26. Lane WR, Looney SW, Wansley JW (1986) An application of the Cox proportional hazards model to bank failure. J Bank Financ 10(4):511–531.

  27. Lev B, Sunder S (1979) Methodological issues in the use of financial ratios. J Account Econ 1(3):187–210.

  28. Meyer PA, Pifer HW (1970) Prediction of bank failures. J Financ 25(4):853–868.

  29. Ravi Kumar P, Ravi V (2007) Bankruptcy prediction in banks and firms via statistical and intelligent techniques—a review. Eur J Oper Res 180(1):1–28.

  30. Simonson DG, Stowe JD, Watson CJ (1983) A canonical correlation analysis of commercial bank asset/liability structures. J Financ Quant Anal 18(1):125–140.

  31. Sinkey JF Jr (1975) A multivariate statistical analysis of the characteristics of problem banks. J Financ 30(1):21–36.

  32. Sorensen CK, Gutierrez JMP (2006) Euro area banking sector integration using hierarchical cluster analysis techniques. European Central Bank Working Paper Series. Accessed 28 May 2019

  33. White HL (1980) A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica 48(4):817–838.


Author information



Corresponding author

Correspondence to Travis R. Davidson.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Table 7 Correlations. Table A presents correlations among the 33 common-size variables. Correlations significant at the 1% level or better are shown in bold.
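With thousands of banks, even small correlations clear a 1% significance bar. As a minimal sketch, assuming the usual two-sided t-test for a Pearson correlation, a normal approximation to the t distribution (harmless at this sample size), and that the correlations are computed over the 6444 banks in the 2014 sample (the caption states neither the test nor the sample size), the smallest bolded |r| can be computed directly. The helper name `critical_abs_r` is hypothetical.

```python
import math
from statistics import NormalDist

def critical_abs_r(n: int, alpha: float = 0.01) -> float:
    """Smallest |r| significant at level alpha (two-sided) for n observation pairs.

    Assumes the test statistic t = r * sqrt(n - 2) / sqrt(1 - r**2) and, as a
    large-sample approximation, a standard normal critical value instead of t(n - 2).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Solve r * sqrt(n - 2) / sqrt(1 - r**2) = z for r
    return z / math.sqrt(n - 2 + z * z)

threshold = critical_abs_r(6444)
print(round(threshold, 3))  # about 0.032
```

Under these assumptions, any correlation with absolute value above roughly 0.03 would be significant at the 1% level, so statistical significance alone says little about economic importance here.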

About this article

Cite this article

Cyree, K.B., Davidson, T.R. & Stowe, J.D. Forming appropriate peer groups for bank research: a cluster analysis of bank financial statements. J Econ Finan 44, 211–237 (2020).


Keywords

  • Commercial bank taxonomy
  • Financial institutions
  • Cluster analysis
  • Bank financial statements
  • Bank peer groups

JEL classification

  • G21