Journal of Industry, Competition and Trade

Volume 18, Issue 3, pp 253–294

Forecasting European High-Growth Firms - A Random Forest Approach

  • Jurij Weinblat


High-growth firms (HGFs) have attracted considerable interest among both researchers and policymakers, mainly because of their substantial contribution to job creation and to the development of the surrounding economy (Acs et al. 2008; Schreyer 2000). Any initiative to foster HGFs requires the ability to anticipate them reliably. Previous, mainly regression-based studies appear to agree that such a prediction is impossible (Coad 2007b). Using a novel random forest (RF) approach and a recent data set (2004–2014) covering 179,970 unique firms from nine European countries, we demonstrate the potential of a true out-of-sample prediction: depending on the country, we were able to identify up to 39% of all HGFs by selecting only 10% of all firms. The RF algorithm is used both to determine relevant predictors and for the actual prediction and pattern analysis. The selection of the best RF and the cross-country comparisons are both based on a Receiver Operating Characteristic (ROC) analysis. We find that the most accurate HGF predictions are possible for GB, France, and Italy, and we largely confirm this ranking using Venkatraman’s unpaired test. Apart from firm size, age, and past growth, sales per employee, the fixed assets ratio, and the debt ratio are important predictors. The “typical” HGFs determined using RF prototypes are older and larger than the remaining firms, which is counterintuitive and atypical in the literature. Based on our findings, typical HGFs are not start-ups, which calls current political funding strategies into question. Furthermore, our results do not support, and rather refute, the existence of a survivorship bias. Finally, approximately one in four HGFs remains an HGF in the next period.
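The headline result above (up to 39% of all HGFs found among the top 10% of firms) is a capture rate at a fixed selection share, while the model and country comparisons rest on ROC analysis. The following is a minimal Python sketch of both measures, assuming each firm already has a predicted HGF probability from some classifier, for instance the fraction of random-forest trees voting "HGF". The function names `capture_rate` and `roc_auc` and the example data are hypothetical; this is not the author's code, and it does not implement Venkatraman's unpaired test.

```python
# Hypothetical sketch, not the paper's implementation: two ways to score a
# ranking of firms by predicted HGF probability.
from typing import Sequence


def capture_rate(scores: Sequence[float], labels: Sequence[int], share: float = 0.10) -> float:
    """Share of all HGFs (label 1) found in the top `share` of firms by score."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    total_hgfs = sum(labels)
    if total_hgfs == 0:
        raise ValueError("labels contain no HGFs")
    n_selected = max(1, round(share * len(scores)))
    # Rank firms from highest to lowest predicted HGF probability.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in ranked[:n_selected])
    return hits / total_hgfs


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as the Mann-Whitney probability that a random HGF outscores a
    random non-HGF (ties count one half). O(n_pos * n_neg): fine for a demo."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one HGF and one non-HGF")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up example: 20 firms, 4 of which turn out to be HGFs.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.45, 0.4, 0.35, 0.3, 0.25,
                  0.2, 0.18, 0.15, 0.12, 0.1, 0.08, 0.06, 0.04, 0.02, 0.01]
example_labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0,
                  0, 0, 1, 0, 0, 0, 0, 0, 0, 0]

print(capture_rate(example_scores, example_labels))  # -> 0.25
print(roc_auc(example_scores, example_labels))  # -> 0.796875
```

In the made-up example, the two highest-scored firms (10% of 20) contain one of the four HGFs, giving a capture rate of 0.25, and 51 of the 64 HGF/non-HGF pairs are ordered correctly (AUC 51/64). The pairwise AUC loop is quadratic; for samples as large as the one described here, a rank-based computation would be the practical choice.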


Keywords

High-growth firms · Random forest · Forecasting · Variable importance

JEL Classification

M13 · L25 · O51 · O52 · C53


References

  1. Ablameyko S (2003) Neural networks for instrumentation, measurement and related industrial applications, 1st edn. IOS Press, Crema
  2. Acs Z, Parsons W, Tracy S (2008) High-impact firms: gazelles revisited. Small Business Research Summary (328):1–92
  3. Acs ZJ, Mueller P (2008) Employment effects of business dynamics: Mice, gazelles and elephants. Small Bus Econ 30(1):85–100
  4. Aiginger K (2006) Competitiveness: from a dangerous obsession to a welfare creating ability with positive externalities. J Indust Compet Trade 6(2):161–177
  5. Albrecht WS, Stice EK, Stice JD (2007) Financial Accounting, 1st edn. Cengage Learning
  6. Alpaydin E (2004) Introduction to machine learning, vol 1. MIT Press, Massachusetts
  7. Altman EI (1968) Financial ratios, discriminant analysis and the prediction of corporate bankruptcy. J Finance 23(4):589–609
  8. Audretsch DB, Mahmood T (1994) Firm selection and industry evolution: the post-entry performance of new firms. J Evol Econ 4(3):243–260
  9. Baily MN, Bartelsman EJ, Haltiwanger J (1996) Downsizing and productivity growth: Myth or reality? Small Bus Econ 8(4):259–278
  10. Barringer BR, Jones FF, Neubaum DO (2005) A quantitative content analysis of the characteristics of rapid-growth firms and their founders. J Bus Ventur 20(5):663–687
  11. Batista G, Prati RC, Monard MC (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explor Newslett 6(1):20–29
  12. Becchetti L (1995) Finance, investment and innovation: a theoretical and empirical comparative analysis. Empirica 22(3):167–184
  13. Becchetti L, Trovato G (2002) The determinants of growth for small and medium sized firms. The role of the availability of external finance. Small Bus Econ 19(4):291–306
  14. Becker HP (2010) Investition und Finanzierung: Grundlagen der betrieblichen Finanzwirtschaft, 4th edn. Gabler Verlag, Wiesbaden
  15. Behr A, Weinblat J (2017) Default patterns in seven EU countries: a random forest approach. Int J Econ Bus 24(2):181–222
  16. Birch D, Medoff J (1994) Gazelles. In: Solmon L, Levenson A (eds) Labor Markets, Employment Policy and Job Creation. Westview Press, Boulder, pp 159–168
  17. Birch DL (1981) Who creates jobs? The Public Interest 65:3–14
  18. Boeri T, Cramer U (1992) Employment growth, incumbents and entrants: evidence from Germany. Int J Indust Organ 10(4):545–565
  19. Bravo Biosca A (2010) Growth dynamics: exploring business growth and contraction in Europe and the US. Research report, NESTA
  20. Breiman L (1996) Bagging predictors. Mach Learn 24(2):123–140
  21. Breiman L (2001) Random forests. Mach Learn 45(1):5–32
  22. Breiman L, Cutler A (2004) Random forests.
  23. Breiman L, Friedman J, Olshen R, Stone C (1984) Classification and regression trees. Wadsworth International Group, Belmont
  24. Brown I, Mues C (2012) An experimental comparison of classification algorithms for imbalanced credit scoring data sets. Expert Syst Appl 39(3):3446–3453
  25. Buenstorf G, Cantner U, Hanusch H, Hutter M, Lorenz HW, Rahmeyer F (2013) The Two Sides of Innovation: Creation and Destruction in the Evolution of Capitalist Economies. Springer Science & Business Media, Dordrecht, London
  26. Chandra DK, Ravi V, Bose I (2009) Failure prediction of dotcom companies using hybrid intelligent techniques. Expert Syst Appl 36(3):4830–4837
  27. Chawla NV (2005) Data mining for imbalanced datasets: an overview. In: Data Mining and Knowledge Discovery Handbook. Springer, pp 853–867
  28. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357
  29. Chen KS, Babb EM, Schrader LF (1985) Growth of large cooperative and proprietary firms in the US food sector. Agribusiness 1(2):201–210
  30. Coad A (2007a) A closer look at serial growth rate correlation. Rev Indust Organ 31(1):69–82
  31. Coad A (2007b) Firm growth: a survey. Doc Trav Centre d’Econ Sorbonne 24:1–72
  32. Coad A, Broekel T (2012) Firm growth and productivity growth: evidence from a panel VAR. Appl Econ 44(10):1251–1269
  33. Coad A, Hölzl W (2009) On the autocorrelation of growth rates. J Indust Compet Trade 9(2):139–166
  34. Coad A, Daunfeldt SO, Hölzl W, Johansson D, Nightingale P (2014a) High-growth firms: introduction to the special section. Indust Corp Chang 23(1):91–112
  35. Coad A, Daunfeldt SO, Johansson D, Wennberg K (2014b) Whom do high-growth firms hire? Indust Corp Chang 23(1):293–327
  36. Cross EP, Rarnchandani H (1995) Comparing classification accuracy of neural networks, binary logit regression and discriminant analysis for insolvency prediction of life insurers. J Econ Finan 19(13):1–18
  37. Daunfeldt SO, Halvarsson D (2015) Are high-growth firms one-hit wonders? Evidence from Sweden. Small Bus Econ 44(2):361–383
  38. Daunfeldt SO, Elert N, Johansson D (2014) The economic contribution of high-growth firms: do policy implications depend on the choice of growth indicator? J Indust Compet Trade 14(3):337–365
  39. Dunne T, Roberts MJ, Samuelson L (1989) The growth and failure of US manufacturing plants. Q J Econ 104(4):671–698
  40. European Commission (2010) Communication from the Commission. Europe 2020: a strategy for smart, sustainable and inclusive growth. Technical report
  41. Fagiolo G, Luzzi A (2006) Do liquidity constraints matter in explaining firm size and growth? Some evidence from the Italian manufacturing industry. Indust Corp Chang 15(1):1–39
  42. Fawcett T (2006) An introduction to ROC analysis. Pattern Recogn Lett 27(8):861–874
  43. Fotopoulos G, Louri H (2004) Firm growth and FDI: are multinationals stimulating local industrial development? J Indust Compet Trade 4(3):163–189
  44. Frydman H, Altman EI, Kao DL (1985) Introducing recursive partitioning for financial classification: the case of financial distress. J Finan 40(1):269–291
  45. Gibrat R (1931) Les inégalités économiques. Recueil Sirey
  46. Gorunescu F (2011) Data Mining: Concepts, Models and Techniques, vol 1. Springer Science & Business Media
  47. Han J, Kamber M, Pei J (2011) Data mining: concepts and techniques, 3rd edn. Morgan Kaufmann, Amsterdam, Boston, Heidelberg, London
  48. Härdle W, Moro R, Schäfer D (2005) Predicting bankruptcy with support vector machines. In: Statistical Tools for Finance and Insurance. Springer, pp 225–248
  49. Harhoff D, Stahl K, Woywode M (1998) Legal form, growth and exit of West German firms – empirical results for manufacturing, construction, trade and service industries. J Indust Econ 46(4):453–488
  50. Hart WE, Krasnogor N, Smith JE (2005) Recent advances in memetic algorithms, 1st edn. Springer Science and Business Media, Berlin, Heidelberg
  51. Hassan MR, Ramamohanarao K, Karmakar C, Hossain MM, Bailey J (2010) A novel scalable multi-class ROC for effective visualization and computation. In: Zaki MJ, Yu JX, Ravindran B, Pudi V (eds) Advances in Knowledge Discovery and Data Mining, Part I: 14th Pacific-Asia Conference. Springer-Verlag, Berlin, Heidelberg, pp 107–120
  52. Hastie T, Tibshirani R, Friedman J (2009) The elements of statistical learning, 2nd edn. Springer, New York
  53. Henrekson M, Johansson D (2010) Gazelles as job creators: a survey and interpretation of the evidence. Small Bus Econ 35(2):227–244
  54. Hölzl W (2014) Persistence, survival, and growth: a closer look at 20 years of fast-growing firms in Austria. Indust Corp Chang 23(1):199–231
  55. Jovanovic B (1982) Selection and the evolution of industry. Econometrica 50(3):649–670
  56. Kartasheva AV, Traskin M (2011) Insurers’ insolvency prediction using random forest classification.
  57. Krzanowski WJ, Hand DJ (2009) ROC curves for continuous data. CRC Press
  58. Kumar PR, Ravi V (2007) Bankruptcy prediction in banks and firms via statistical and intelligent techniques – a review. Eur J Oper Res 180(1):1–28
  59. Lam M (2004) Neural network techniques for financial performance prediction: integrating fundamental and technical analysis. Decis Support Syst 37(4):567–581
  60. Levratto N, Zouikri M, Tessier L (2010) The determinants of growth for SMEs – a longitudinal study from French manufacturing firms. Technical report, CNRS-EconomiX
  61. Löbbe H (2001) Klassifizierung landwirtschaftlicher Jahresabschlüsse mittels neuronaler Netze und Fuzzy-Systeme. PhD thesis, Rheinische Friedrich-Wilhelms-Universität zu Bonn, Hamm
  62. Lopez-Garcia P, Puente S (2012) What makes a high-growth firm? A dynamic probit analysis using Spanish firm-level data. Small Bus Econ 39(4):1029–1041
  63. Maimon O, Rokach L (2006) Data mining and knowledge discovery handbook. Springer Science & Business Media, Tel-Aviv
  64. National Commission on Entrepreneurship (2011) High-growth companies: mapping America’s entrepreneurial landscape. Technical report, National Commission on Entrepreneurship
  65. Ohlson JA (1980) Financial ratios and the probabilistic prediction of bankruptcy. J Account Res 18(1):109–131
  66. Olson DL, Delen D, Meng Y (2012) Comparative analysis of data mining methods for bankruptcy prediction. Decis Support Syst 52(2):464–473
  67. Organisation for Economic Co-operation and Development (2010) High-growth enterprises: what governments can do to make a difference. OECD Publish 1(1):1–238
  68. Pagans FG (2015) Predictive Analytics Using Rattle and Qlik Sense. Packt Publishing Ltd
  69. Penner SJ (2004) Introduction to health care economics & financial management: fundamental concepts with practical applications, 1st edn. Lippincott Williams & Wilkins, New York, London
  70. Puri S (2012) Introduction to retail math, vol 1. Introduction to Retail Math, India
  71. Pytlik M (1995) Diskriminanzanalyse und künstliche Neuronale Netze zur Klassifizierung von Jahresabschlüssen: Ein empirischer Vergleich. Europäischer Verlag der Wissenschaft, Frankfurt am Main
  72. Rokach L (2007) Data mining with decision trees: theory and applications. Series in Machine Perception and Artificial Intelligence. World Scientific, Hackensack, London
  73. Schneider O, Lindner A (2010) The value of lead logistics services. In: Vallespir B, Alix T (eds) Advances in Production Management Systems. New Challenges, New Approaches, pp 315–322
  74. Schreyer P (2000) High-growth firms and employment. OECD Science, Technology and Industry Working Papers
  75. Shane S (2009) Why encouraging more people to become entrepreneurs is bad public policy. Small Bus Econ 33(2):141–149
  76. Shin KS, Lee TS, Kim HJ (2005) An application of support vector machines in bankruptcy prediction model. Expert Syst Appl 28(1):127–135
  77. Shirata CY (1998) Financial ratios as predictors of bankruptcy in Japan: an empirical research. Tsukuba Coll Technol Jpn 1(1):1–17
  78. Stickney C, Weil R, Schipper K, Francis J (2009) Financial accounting: an introduction to concepts, methods and uses, 1st edn. Cengage Learning, Mason
  79. Strobl C, Boulesteix AL, Zeileis A, Hothorn T (2007) Bias in random forest variable importance measures: illustrations, sources and a solution. Bioinformatics 8(25):1–21
  80. Vause B (2009) Guide to analysing companies. The Economist. Wiley, New York
  81. Venkatraman E (2000) A permutation test to compare receiver operating characteristic curves. Biometrics 56(4):1134–1138
  82. Venkatraman E, Begg CB (1996) A distribution-free procedure for comparing receiver operating characteristic curves from a paired experiment. Biometrika 83(4):835–848
  83. Verikas A, Gelzinis A, Bacauskiene M (2010) Mining data with random forests: a survey and results of new tests. Pattern Recogn 44(2):330–349
  84. Wagner J (2007) Exports and productivity: a survey of the evidence from firm-level data. World Econ 30(1):60–82
  85. Williams G (2011) Data mining with Rattle and R: the art of excavating data for knowledge discovery. Springer Science & Business Media, New York
  86. Witten IH, Frank E, Hall MA (2011) Data mining: practical machine learning tools and techniques. Elsevier, Amsterdam, Boston
  87. Yeh CC, Chi DJ, Lin YR (2014) Going-concern prediction using hybrid random forests and rough set approach. Inf Sci 254:98–110
  88. Zhou XH, Obuchowski NA, McClish DK (2014) Statistical Methods in Diagnostic Medicine. Wiley
  89. Zighed DA, Komorowski J, Zytkow JM, Zytkow J (2000) Principles of data mining and knowledge discovery: 4th European conference, PKDD 2000, Lyon, France, proceedings, vol 1. Springer Science & Business Media, Berlin, Heidelberg, New York

Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  1. Faculty of Economics and Business, Chair of Statistics, University of Duisburg-Essen, Essen, Germany
