Soccer Competitiveness Using Shots on Target: Data Mining Approach

  • Neetu Singh
  • Apoorva Kanthwal
  • Prashant Bidhuri
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11589)


This paper presents a model for the competitiveness of soccer matches played in the top four European soccer leagues. Every match in every league holds some importance and contributes to the overall performance of the league compared to other leagues. These individual results constitute a single season. Many aspects of a team and a season are attributed to final positions in the league. These positions, however, do not detail the competitiveness of a single match. This research aims to highlight the competitiveness of each match, regardless of how the season ended. A match reveals a great deal about how a team approached it. A win may not constitute competitiveness, but the approach does. The idea is to take the individual statistics of a match and use them to construct a model, following the SEMMA approach to data mining, that classifies matches by how competitive they were. This research constructs several classification models, as each model provides its own variant based on the methodology it uses. Our analysis depends mainly on, but is not limited to, the number of attempted shots on goal and the number of those shots that were on target. An important characteristic of attempts on goal is that they are subject to the performance of a team and its ability to try to secure a win in a match. This performance constitutes competitiveness, which is the basis of our research.
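As a rough illustration of the two quantities the abstract names, the sketch below computes the share of attempted shots that were on target and the misclassification rate of a set of predicted match classes. It is not the authors' model: the function names, the class labels, and the sample values are hypothetical and chosen only for illustration.

```python
def shot_accuracy(shots_on_target: int, shots_attempted: int) -> float:
    """Fraction of attempted shots that were on target (0.0 if no attempts)."""
    if shots_attempted == 0:
        return 0.0
    return shots_on_target / shots_attempted


def misclassification_rate(actual, predicted) -> float:
    """Share of matches whose predicted class differs from the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one match is required")
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return wrong / len(actual)


if __name__ == "__main__":
    # Hypothetical match: 10 attempted shots, 4 of them on target.
    print(shot_accuracy(4, 10))

    # Hypothetical labels: "C" = competitive, "N" = not competitive.
    actual = ["C", "N", "C", "C"]
    predicted = ["C", "C", "C", "N"]
    print(misclassification_rate(actual, predicted))
```

A lower misclassification rate on held-out matches would indicate a classifier that agrees more often with the assigned competitiveness labels, which is presumably how competing models would be compared.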


Competitiveness, Data mining, Shots on target, Performance evaluation, Misclassification rate



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Neetu Singh (1)
  • Apoorva Kanthwal (2)
  • Prashant Bidhuri (3)

  1. University of Illinois at Springfield, Springfield, USA
  2. SEI Investments, Oaks, USA
  3. Enterprise Cloud Solutions, New York, USA
