Keywords

1 Introduction

Obesity in general is a growing epidemic that affects every age group. In the UK alone, the economic impact of dealing with obesity and related illnesses is almost same as the other major issues affecting the economy such as smoking and armed conflict as reported in a publication by McKinsey Global Institute on Overcoming Obesity [1]. The report discusses that more than 30% of the world population is either overweight or obese. If this growth rate prevails, the proportion of people in overweight and obese category is likely to rise up to 41% by the year 2030. The report concludes by recommending behaviour change interventions which can result in saving money in the long run as a result of reduced healthcare and increase in productivity levels. Implementation of this type of intervention could save almost £1 billion for the National Health Service (NHS) in the United Kingdom alone. The evidence suggests that the behavioural change interventions to combat obesity need further investigation to find workable solutions rather than waiting for a perfect solution. In the United Kingdom, the data presented in Health Survey for England reports that the percentage of obese children between the age of 2 and 15 has increased significantly since 1995 [2]. 16% of boys and 15% of girls in this age group were classed as obese. 14% of both genders were classed as overweight. This results in 30% of boys and 29% of girls being either overweight or obese. In 1995, 11% of boys and 12% of girls of 2–15 years of age were obese. A range of studies have however been monitoring the trend in obesity growth. One in specific included data from review included data from 467,294 children from Australia, China, England, France, Netherlands, New Zealand, Sweden, Switzerland and USA [3]. It was reported that the prevalence of childhood obesity may be plateauing worldwide. Another survey identified 52 obesity studies worldwide from 25 countries and reported some stability in obesity [4]. Similar claims were made by [5] and [6]. Some reports even suggest that childhood obesity may well be declining as a cumulative result of increased physical activity, television viewing decline and reduction in sugary drink consumption [7]. However, [4] also reported that this stability should be observed with caution since previous stable phases were followed by further increases in prevalence of obesity.

The Health Survey England (HSE) series was established to keep an eye on changes in nation’s health [8]. The survey was designed to acquire information about certain health conditions and other risk factors affecting them. It was reported that in 2013, 26% of men and 24% of women were obese. 41% of men and 33% of women were overweight but not obese. Looking at these combined figures gives cause for concern since 67% of men and 57% of women are above their normal weight for their height.

Childhood obesity is of great public concern as up to 90% of the overweight and obese childhood population will continue to be obese as adults [9]. Obesity is strongly linked with other negative health conditions such as type 2 diabetes, cardiovascular diseases, cancers and even death [10,11,12,13]. Considering all this, it is obvious that there is a pressing need to identify individuals at a risk of developing obesity as early as possible so that some preventative measures can be put in place as early as possible. The purpose of this study is to carry out apply, analyse and evaluate machine learning algorithms to classify adolescents at a risk of becoming overweight or obese using early childhood BMI as input features to machine learning. The BMI in adults is defined as a ratio of body mass in kilograms to the square of individual’s height in meters. In adults, BMIs of over 25 kg/m2 and 30 kg/m2 are classed as overweight and obese respectively. This formula cannot be applied to children and adolescents since their body mass index changes significantly with age [14]. It varies between 13 kg/m2 and 17 kg/m2 from birth to an age of 1. It then decreases to 15.5 kg/m2 at the age of 6 and increases to 21 kg/m2 at the age of 20. Therefore, normal overweight and obesity thresholds can’t be applied to children and adolescents. For this reason, this study uses overweight and obesity cut-off points recommended by International Obesity Task Force (IOTF). This paper is organised as follows:

  • Related Work: This section discusses the application of machine learning techniques in health and approaches tackling obesity.

  • Methods: This section deals with the data preprocessing, application of ML algorithms for the classification of imbalanced data, techniques used for the treatment of data imbalance and the evaluation of suitable algorithms for improved prediction accuracy.

  • Discussion: Performance of ML algorithms and their effect on prediction accuracy are discussed.

  • Conclusion and further work: The paper concludes the work carried out and the challenges experienced. Further work to build on this research is discussed.

2 Related Work

2.1 Machine Learning Applications in Health

A range of machine learning algorithms have been applied in health domains to predict the presence certain health conditions using a number of characteristic features. Artificial Neural Networks (ANNs) are a commonly used branch of Machine Learning (ML) methods that are used to correlate input parameters to corresponding output data. Numerous examples of application of ANNs in medical applications and engineering systems have been reported with a varying degree of success. A survey of over 300 recent contributions in applying Deep Learning (DL) and Convolutional Neural Networks (CNNs) to predict health risks/disease a set of medical images highlights the scale of ML applications in medical field [15]. A large number of examples of automated disease diagnosis using computational intelligence have been reported over the last decade or so [16,17,18]. ANNs have also been applied in various non-linear problem-solving scenarios including robotic decision making, swarm intelligence, aviation application and Artificial Intelligence (AI) in games [19]. Identification of diseases associated with the brain activity such as Parkinson’s, Schizophrenia, and Huntington’s disease from the Contingent Negative Variation (CNV) response in electroencephalograph has also been successfully implemented using ANNs [20]. Multilayer perceptron (MLP) ‬and probabilistic neural network (PNN) have been utilized for the prediction of osteoporosis ‬with bone densitometry [21]. A combination of several ML algorithms has also been successfully implemented to predict accurate amount of medication dosage required for patients suffering from sickle cell disease [22]. Administering accurate amount of medication based on patient’s condition is of paramount importance. In this study, a range of combined machine learning techniques were investigated to accurately predict the correct amount of medication required. The algorithms investigated included Random Forests, Support Vector Machines and a few variants of Neural Networks. It was reported that a combination of Multilayer Perceptron Neural Networks trained with Levenberg-Marquardt algorithm and Random Forests provided the best results.‬ ‬‬‬‬‬This work builds on authors’ earlier research where regression was used to predict age 14 BMI using the BMI values from earlier childhood [23].

2.2 Machine Learning Techniques for Tackling Obesity

Only a small number of studies conducted to predict childhood obesity were identified through literature searches. This is possibly due to the complex nature of the problem. Hence, it is important to utilise some complex prediction algorithm using machine learning methods or even ensemble of ensemble predictors to implement a robust and accurate prediction system rather than using simpler techniques such as linear regression or other statistical methods [24]. It is proposed that such a system should be able to:

  • at least match human decision maker’s behaviour

  • be sufficiently general and handle a wide range of variability

  • be applicable in practical situations with a proven degree of success

Reference [25] applied machine learning techniques to data collected from children before the age of 2 to predict future obesity. In this study, they used data collected on children prior to the second birthday using a clinical support system. They reported an accuracy of 85%, sensitivity of 89%, positive predictive value of 84% and negative predictive value of 88% using the ID3 algorithm for decision trees without pruning. The other algorithms tested were Naïve Bayes, Random Trees, Random Forests, C4.5 decision trees with pruning and Bayes Net. Several other studies also reported the use of machine learning algorithms used for predicting obesity. Reference [26] suggested that Radial Basis ANNs (RBANNs) are far more efficient than classical Back Propagation ANNs (BPANNs) but very large datasets would be required to train such systems. This study discussed algorithms only; results were not reported. Reference [27] discussed several algorithms for predicting childhood obesity. They recommended the suitability of ANNs, Naïve Bayes and Decision Trees. Several optimisation techniques have also been applied to achieve better prediction accuracy. For example, Genetic Algorithms were employed by [28] to improve the prediction accuracy to 92%. However, it must be noted that they used a very small sample size of 12 subjects. A comprehensive study, possibly the best one identified so far, was carried out by [29] to apply machine learning techniques to predict childhood obesity. They compared the performance metrics of several machine learning prediction algorithms. They compared logistic regression with six data mining techniques: Decision Trees, Association Rules, Neural Networks, Naïve Bayes, Bayesian Networks and Support Vector Machines. They considered prediction sensitivity the most important element in predicting obesity for their study. The highest reported sensitivity for their work was 62% in the case of Naïve Bayes and Bayesian Networks. This research group used a limited range of demographics (gender) and biometrics (weight, height and BMI) and the subjects were 2 year old children. It is envisaged that the prediction accuracy can be further improved by using a different set of parameters, using big data and other machine learning techniques such as deep learning to handle big data. Reference [30] applied machine learning techniques to measure and monitor physical activity in children. They evaluated Multilayer Perceptrons (MLPs), Support Vector Machines, Decision Trees, Naïve Bayes and K = 3 Nearest Neighbour algorithms. It was reported that MLPs outperformed all the other algorithms yielding an overall accuracy of 96%, sensitivity of 95% and specificity of 99%. It should be noted that the sample size in this case was also relatively small (22 participants). The investigation of Deep Learning techniques for future work was proposed.

3 Methods

3.1 Data

The data for this study are used from UK’s Millennium Cohort Study (MCS) [31]. The MCS is an ESRC funded research project that followed every child born in year 2000 and 2001. This study is the most recent of Britain’s well-known study that followed national longitudinal birth cohorts. The very first MCS sweep surveyed 18,818 babies and 18,552 families related to those children. This wave was conducted in 2001 and 2002 when the babies were 9 months old. Subsequent sweeps were carried out at the age of 3 years, 5 years, 7 years, 11 years and 14 years to examine each child’s growth as shown in Table 1:

Table 1. Millennium Cohort Study survey sweeps.

3.2 Classification Challenge

As discussed in the Introduction section, the BMI in adults is defined as a ratio of body mass in kilograms to the square of individual’s height in meters. There are predetermined BMI values setting the boundaries for normal, overweight and obese levels. In children and adolescents, these levels are calculated differently and vary with their age and gender. The data for each survey sweep were collected over the course of approximately 12 months. This means that the children in a given sweep having exact same BMI will have different obesity levels depending upon their age and gender. This makes the classification problem significantly challenging since the target obesity levels at the age of 14 don’t have linear relationship with the earlier measurements.

3.3 Data Preprocessing

A vast amount of data about children, their raising families and physical surroundings was collected including weight, height, obesity flags, socio economic conditions and indices of multiple deprivations. For the purpose of this study, body mass index values from 3 years, 5 years, 7 years and 11 years of age were used as input features to the machine learning algorithm and obesity flags for age 14 such as Normal and At risk (Overweight and Obese) were used as target variables as show in Fig. 1. The obesity flags for 14 year old subjects are determined by the following factors:

Fig. 1.
figure 1

Machine learning based obesity prediction model

  • Age at the time of survey

  • Weight in kilograms and height in meters and

  • Gender

These obesity flags are available in the dataset and are determined using growth charts published by the IOTF.

The dataset has a large number of missing values and outliers. There are 11110 non missing values for the obesity flag for age 14 (MCS6) survey. Since the obesity flag for the age 14 survey is being used as a target variable, imputing the missing values for this variable are not going to be useful for this study. A reliable predictor variable is required to implement a robust machine learning algorithm. Therefore, to be able to use reliable data, instances of missing values for the age 14 obesity flag are deleted. As it can be seen from the count field of Table 2, a large number of data values are still missing for the input variables.

Table 2. Missing values

The dataset analysis also indicates that the minimum and maximum data values are significantly higher than the first and third quartiles, highlighting the presence of outlier data points as shown in Table 3.

As highlighted earlier, the dataset has a considerably large number of outliers which need to be identified and dealt with in order to produce meaningful results. One way of visualising and estimating outliers is to use boxplots for all of the input variables. By choosing the boxplot whisker at an appropriate level, the data points that fall beyond these points can be identified as outliers. A boxplot diagram for the input variables BMI2, BMI3, BMI4 and BMI5 is shown in Fig. 2.

Fig. 2.
figure 2

Boxplot of uncleaned dataset

All data values beyond first and third quartile are identified and imputed using the mean value of each of the input variable. The boxplot for the cleaned data is shown in Fig. 3:

Fig. 3.
figure 3

Boxplot of cleaned dataset

3.4 Machine Learning Algorithm Implementation

The dataset is composed of 11110 instances and has three classes based on the obesity label. There are 8160 normal cases, 2126 overweight and 824 obese. Majority of the instances falling under the normal category, 19% under overweight and only just over 7% belonging to the obese category, makes the dataset highly imbalanced. As overweight and obese categories are both at risk, therefore instances belonging to these two classes have been combined and have been labeled as ‘At risk’. Although this reduces the imbalance a little, but majority of the algorithms are only able to classify the majority class with a high degree of accuracy.

A range of popular classification algorithms were employed to classify adolescents of age 14 at a risk of becoming overweight or obese using BMIs measured at ages 3, 5, 7 and 11 years. The results from the best performing algorithms are tabulated in Table 4. Although the overall accuracy is well over 80% for all of the algorithms used, because of the class and BMI imbalance issues, the individual class accuracy for the At risk class is still very poor as indicated by the recall metric shown in Table 4. For example, the weighted average for recall in the case of J48 decision tree algorithm is 85% and the normal class has a significantly higher accuracy of 94%, the individual class accuracy for the ‘At risk’ class is only 62%.

Table 3. Description of uncleaned data

Apparently, higher sensitivity or recall score for the minority class is of paramount importance in order to detect individuals at the risk of becoming overweight or obese as early as possible.

3.5 Proposed Approach

As discussed above, it is important to classify the minority class with a high degree of accuracy. To deal with the this class imbalance issue, it was proposed to synthesise new class instances for the minority class using SMOTE [32] and train ML algorithms on synthetically increased data. SMOTE generates additional data points using KNN approach by inserting synthetic data points on lines joining K nearest neighbours as shown in Fig. 4.

Fig. 4.
figure 4

Generating synthetic data points using SMOTE

A SMOTE oversampling of 138% was applied to the minority class to make the number of cases in the minority class equal to the number of cases in the majority class. However, for robustness, a trained network must be tested on an original, imbalanced dataset. For this reason, 30% sample of the original imbalanced data was kept for testing the ML algorithm accuracy. The same machine learning algorithms used in the case of imbalanced data classification were used to classify synthetically balanced dataset. The precision and recall score metrics are shown in Table 5 and plotted in Fig. 5.

Table 4. Classification results of imbalanced data.
Table 5. Classification results of balanced data.
Fig. 5.
figure 5

Precision and recall plot of Normal and At Risk classes

The recall metric shows a significant increase for the ‘At risk’ class accuracy. Although the normal class accuracy has decreased somewhat, but the increase in the minority class accuracy outweighs this loss since detecting individuals at a risk of becoming overweight or obese is far more important than misclassifying a normal individual.

The recall metric is comparison is plotted in Fig. 6 as a bar graph where the output of each algorithm is plotted before and after oversampling the minority class.

Fig. 6.
figure 6

Comparison of minority class accuracy for imbalanced and balanced datasets

4 Discussion

It is clear from Table 5 and Figs. 5 and 6 that synthetically enhancing the minority class significantly increases the sensitivity or recall metric for the minority class. Some of the algorithms that didn’t perform satisfactorily in the case of imbalanced data, their performance is also improved significantly when the dataset is synthetically balanced. For example, the Multi-Layer Perceptron (MLP) resulted in a minority class accuracy of 54% when the dataset was imbalanced but jumped to 92% in the case of balanced data. It should be noted that the performance of all the algorithms was tested on a sample of unaltered, imbalanced dataset. This dataset had 2448 in the normal class and 885 in the At Risk class. The precision value for the normal class for the MLP algorithm is 96%. This means, out of all those predicted as ‘Normal’, only 4% were predicted as ‘At Risk’. A recall value of 92% means that out of all those labeled as ‘At Risk’, only 8% were predicted as ‘Normal’. This results in an F1 score of 93.96%.

5 Conclusion and Further Work

Obesity in general but more importantly childhood obesity is of great global concern since a majority of obese children grow up to obese adults. Many studies have employed statistical methods to predict the probabilities of children growing into adults as obese. In this paper we evaluated several popular machine learning algorithms to accurately predict adolescents at risk of becoming overweight or obese at the teenage stage. Issues relating to low prediction accuracy because of data imbalance have been considered and dealt with using Synthetic Minority Oversampling Technique (SMOTE). An ensemble of classifiers capitalises on the prediction accuracies of individual classifier algorithms. For robustness, a sample of unaltered data is used to test the prediction accuracies. The results obtained are encouraging and the authors are continuing to develop further on this research. Future work is planned to include longitudinal and cross sectional data relating to participants to achieve even higher accuracies. It is also proposed to use predict obesity state at the age of 14 using the obesity flags from earlier ages since the BMI is age and gender dependent for ages from 2 to 20.