Machine-Learning-Based Statistical Arbitrage Football Betting

  • Julian Knoll
  • Johannes Stübinger
Technical Contribution


Across countries and continents, football (soccer) has drawn increasing attention over recent decades and has developed into a huge commercial complex. Consequently, the market of bookmakers offering bets on the results of football matches has grown rapidly, especially since the advent of the internet. With a large number of games every week in multiple countries, football league matches hold enormous potential for generating profits over time through advanced betting strategies. In this paper, we use machine learning to predict the outcome of football league matches by exploiting data about match characteristics. Based on insights from the field of statistical arbitrage stock market trading, we show that one could generate meaningful profits over time by betting accordingly. A simulation study of the matches in the top five European football leagues from the 2013/14 season to the 2017/18 season shows economically and statistically significant returns, achieved by exploiting large data sets with modern machine learning algorithms. By contrast, neither an ordinary linear regression approach nor simple betting strategies, e.g. always betting on the home team, reach the break-even point.


Keywords: Football, Betting strategy, Machine learning, Statistical arbitrage, Sports forecasting



We are grateful to two anonymous referees for many helpful suggestions on this topic.



Copyright information

© Gesellschaft für Informatik e.V. and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. Hochschule für Oekonomie and Management, Nuremberg, Germany
  2. Friedrich-Alexander-Universität Erlangen-Nürnberg, Nuremberg, Germany
