1 Introduction

In Colombia, several legally established institutions promote and defend consumer rights. Article 75 of the Consumer Statute established the National Consumer Protection Network, which consists of multiple entities such as the Consumer Protection Councils, the Superintendence of Industry and Commerce, the Superintendence of Home Public Services, the Financial Superintendence, the National Health Superintendence, the Ports and Transportation Superintendence, the National Institute for Surveillance of Medicines and Foods, the National Television Commission, the Civil Aeronautics, the Mayoralties, and the consumer leagues [1].

Inflation in Colombia, for its part, has not been innocuous to the agricultural sector, which has been assigned the dual character of victim and propagator of inflation [2]. The sector is a victim in several ways [3]: through the credit constraints that arise as an inflation-control measure, and through increases in both wages and the cost of modern inputs, which restrict the use of those inputs and therefore affect productivity. The joint effect of these factors tends to reduce the profitability of agriculture. On the other hand, because agriculture is an activity in which fixed assets such as land make up most of the capital, inflation tends to significantly increase the net worth of all producers.

Traditional forecasting models, whether probabilistic, deterministic, or hybrid, include [4, 5]: Simple Moving Average, Weighted Moving Average, Exponential Smoothing, Regression Analysis, the Box-Jenkins method (ARIMA), and trend projections. Each has been used to generate forecasts and offers certain advantages and disadvantages compared to the others. However, these models do not yet offer good results in an environment of high uncertainty and constant change. For such environments, new paradigms based on the numerical modeling of nonlinear systems are necessary, such as Artificial Neural Networks (ANN) and Support Vector Regression (SVR).

The present study proposes a Multi-Layer Perceptron with Multiple Inputs and Multiple Outputs (MLP-MIMO) as a model for predicting the prices of agricultural products, together with the coefficient of variation of the Colombian State reference price as the warning-level criterion, as well as macroeconomic and microeconomic variables. Because the products are agricultural, a seasonal factor is also integrated.

2 Theoretical Review

2.1 Artificial Neural Networks

Artificial Neural Networks (ANNs) can learn from data and can be used to construct a reasonable input-output mapping, with no prior assumptions made about the statistical model of the input data [6]. ANNs have nonlinear modeling capability and follow a data-driven approach, so the model is formed adaptively from the features present in the data [7].

An introduction to ANN model specification and implementation, and to the approximation properties of ANNs, has been provided from an econometric perspective [8]. Numerous studies have shown that ANNs can solve a variety of challenging computational problems, such as pattern classification, clustering or categorization, function approximation, prediction or forecasting, optimization (for example, the travelling salesman problem), retrieval by content, and control [9].

Studies applying ANNs to financial early warning models have been conducted by [10] and [11], who used an ANN as a classifier with a categorical output. Other authors used ANNs as financial forecasting models with a continuous output, among them [12] and [13], who implemented ANNs with a single-step prediction output. An ANN forecasting model was also proposed by [14] for multi-step prediction with a direct strategy, in which the number of models equals the number of steps in the prediction horizon. In the context of basic commodity prices, the need for prediction is not limited to one step ahead but extends to multi-step-ahead predictions. Three strategies can be considered to tackle the multi-step forecasting problem, namely the recursive, direct, and multiple-output strategies [15]. The Multiple Input Multiple Output (MIMO) technique trains a single prediction model f that produces a vector of future prediction values as its output.
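The multiple-output strategy can be illustrated by how its training samples are built. The following is a minimal sketch, not the authors' implementation: the function name `make_mimo_samples`, the parameters `n_lags` and `horizon`, and the toy price series are assumptions introduced only for illustration. Each sample pairs a window of past prices with the whole vector of future prices, so a single model f can be fitted to predict all steps at once.

```python
from typing import List, Tuple


def make_mimo_samples(
    series: List[float], n_lags: int, horizon: int
) -> Tuple[List[List[float]], List[List[float]]]:
    """Slice a series into (input window, output vector) pairs for a MIMO model."""
    inputs: List[List[float]] = []
    targets: List[List[float]] = []
    # Last index at which a full input window plus a full output vector still fits.
    last_start = len(series) - n_lags - horizon
    for start in range(last_start + 1):
        split = start + n_lags
        inputs.append(series[start:split])             # n_lags past values
        targets.append(series[split:split + horizon])  # horizon future values
    return inputs, targets


# Hypothetical weekly prices, used only to show the shapes involved.
weekly_prices = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
X, Y = make_mimo_samples(weekly_prices, n_lags=3, horizon=2)
```

With a direct strategy, each column of `Y` would instead be fitted by its own model; with the MIMO strategy, one model is fitted to all columns together.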

The present study proposes a Multi-Layer Perceptron with Multiple Input and Multiple Output (MLP-MIMO) as the price prediction model for agricultural products, coupled with the coefficient of variation of the Colombian State reference price as the warning-level criterion.

2.2 Consideration of Microeconomic and Macroeconomic Indicators

The correct evaluation and interpretation of macroeconomic indicators is essential for any country, because they support decisions on fiscal or monetary policy and are signals that the market gives to economic agents so that they can take precautions. Economic indicators are also a way to predict and anticipate economic phenomena, which is why the main economic indicators of Colombia are used in this study. According to [16], the following are considered the main economic indicators:

  • Gross Domestic Product

  • Consumer Price Index

  • Producer Price Index

  • Employment indicators

  • Retail Sales

  • Consumer Confidence

2.3 Garson’s Algorithm to Determine the Level of Importance

Garson’s algorithm was developed to determine the degree or level of importance of an input indicator in an ANN. It considers the weights of the hidden layer and their interactions with the output of the network. The measure proposed by Garson [17] consists of partitioning the hidden-layer weights into components associated with each input node and then assigning each input a percentage of the total weights.

Several studies show the effectiveness of Garson’s algorithm for evaluating the importance of an input in the ANN [18, 19, 20]. The accuracy of the algorithm was determined experimentally, with the conclusion that the measure can be applied successfully under a wide variety of conditions. As a result, Garson’s algorithm assigns each explanatory variable a unique value, on a scale from 0 to 1, that describes its relationship with the response variable in the model.
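The weight partitioning can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: it assumes one hidden layer and a single output node, and the function name `garson_importance` and the example weights are hypothetical. For a multiple-output network, one plausible extension (also an assumption) is to compute the measure per output and average the results.

```python
from typing import List


def garson_importance(
    w_input_hidden: List[List[float]], w_hidden_output: List[float]
) -> List[float]:
    """Relative importance of each input (values in [0, 1] that sum to 1).

    w_input_hidden[i][j] is the weight from input i to hidden node j.
    w_hidden_output[j] is the weight from hidden node j to the single output.
    """
    n_inputs = len(w_input_hidden)
    n_hidden = len(w_hidden_output)
    shares = [0.0] * n_inputs
    for j in range(n_hidden):
        # Contribution of each input through hidden node j: |w_ij| * |v_j|.
        contrib = [
            abs(w_input_hidden[i][j]) * abs(w_hidden_output[j])
            for i in range(n_inputs)
        ]
        node_total = sum(contrib)
        if node_total == 0.0:
            continue  # a dead hidden node carries no information
        # Share of hidden node j attributed to each input.
        for i in range(n_inputs):
            shares[i] += contrib[i] / node_total
    total = sum(shares)
    if total == 0.0:
        raise ValueError("all weights are zero; importance is undefined")
    return [s / total for s in shares]


# Hypothetical weights: 2 inputs, 2 hidden nodes, 1 output.
importance = garson_importance([[0.5, -1.0], [1.5, 1.0]], [2.0, -1.0])
```

Because absolute values are used, the measure reports the magnitude of each input's influence, not its direction.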

3 Material and Methods

3.1 Data

In this study, the raw data were obtained from the National Administrative Department of Statistics of Colombia (DANE, Departamento Administrativo Nacional de Estadística), which provided a sales database of 115,022 farmers from the main agricultural regions of Colombia for the period from 2015 to 2017 [21]. The macroeconomic variables considered in this study include food inflation, GDP, employment rate, minimum wage, trade balance, and capital flow of the nation [22]. Internal factors such as demand and substitute and complementary products were also analyzed. Seasonal factors were incorporated to adjust the forecasts obtained [23].

3.2 Methods

The method is summarized as follows. To select the products, they are ranked by the f1-score [24], a harmonic mean, which is defined as the reciprocal of the arithmetic mean of the reciprocals. The harmonic mean is used to average variations with respect to time and is most often used to find average values of efficiency and error. Here it yields a factor in which both sales and quantity are important. In total, 10 products were selected using Eq. (1) [25].

$$ f1 = 2 * \frac{Q * V}{Q + V} $$
(1)

Where:

  • f1 = harmonic mean f1-score

  • Q = quantity sold in units

  • V = sales in dollars.
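Eq. (1) and the resulting ordering can be sketched as below. This is an illustrative sketch only: the product names and figures are hypothetical placeholders, not values from the DANE database, and any rescaling of Q and V before applying Eq. (1) is not described in the text and is not applied here.

```python
def f1_factor(quantity: float, sales: float) -> float:
    """Eq. (1): harmonic mean of quantity sold (Q) and sales value (V)."""
    if quantity + sales == 0.0:
        return 0.0  # avoid division by zero for a product with no activity
    return 2.0 * quantity * sales / (quantity + sales)


# Hypothetical (Q, V) pairs, used only to show the ranking step.
products = {
    "product_a": (1200.0, 800.0),
    "product_b": (300.0, 2500.0),
    "product_c": (900.0, 950.0),
}

# Rank products from highest to lowest f1 factor.
ranking = sorted(
    products, key=lambda name: f1_factor(*products[name]), reverse=True
)
```

In this toy data, `product_b` has the highest sales value but ranks last, because the harmonic mean penalizes a large gap between Q and V.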

With the sample of products defined, the interaction of each product with other products is analyzed to select its substitute and complementary products. A quantitative criterion for this selection is the cross-elasticity of demand, defined as the proportional variation in the quantity demanded of a good or service caused by a proportional variation in the price of another good [24]. It is computed for a pair of products, and its sign establishes whether the other product is a substitute (+) or a complement (−). A larger magnitude indicates greater interaction between the two products, so the largest positive value is chosen for the substitute and the most negative value for the complement.
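The selection rule can be sketched as follows. This is a minimal sketch under stated assumptions: the simple (non-midpoint) percentage-change form of the elasticity is assumed, and the function names, candidate labels, and numbers are hypothetical.

```python
from typing import Dict, Optional, Tuple


def cross_elasticity(
    q_old: float, q_new: float, p_old: float, p_new: float
) -> float:
    """Proportional change in quantity demanded of one good divided by
    the proportional change in the price of another good.

    Raises ZeroDivisionError if the price did not change.
    """
    pct_quantity = (q_new - q_old) / q_old
    pct_price = (p_new - p_old) / p_old
    return pct_quantity / pct_price


def pick_related(
    elasticities: Dict[str, float]
) -> Tuple[Optional[str], Optional[str]]:
    """Return (substitute, complement): the candidate with the largest
    positive elasticity and the one with the most negative elasticity."""
    positives = {k: v for k, v in elasticities.items() if v > 0}
    negatives = {k: v for k, v in elasticities.items() if v < 0}
    substitute = max(positives, key=positives.get) if positives else None
    complement = min(negatives, key=negatives.get) if negatives else None
    return substitute, complement


# Hypothetical elasticities of one target product against four candidates.
example = cross_elasticity(q_old=100.0, q_new=110.0, p_old=2.0, p_new=2.5)
substitute, complement = pick_related(
    {"cand_w": -0.9, "cand_x": 0.4, "cand_y": 1.2, "cand_z": -0.3}
)
```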

The early warning model consists of three main components, namely preprocessing, predictive model, and post-processing, as depicted in Fig. 1 [26].

Fig. 1. Model of the early warning system [26].

Preprocessing: Before the raw commodity price data are presented to the predictive model, preprocessing operations are applied. The price surveys were conducted by local government on working days, so the commodity price data are daily, with missing values on weekends and holidays. The data are therefore aggregated into weekly data with the mean function, which also reduces the volume of data for computational efficiency [26].
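The weekly aggregation step could look like the sketch below. It is an assumption-laden illustration, not the study's code: weeks are taken to be ISO calendar weeks, missing values are represented as `None`, and the function name `weekly_mean` and the sample prices are hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import Dict, List, Optional, Tuple


def weekly_mean(
    daily: List[Tuple[date, Optional[float]]]
) -> Dict[Tuple[int, int], float]:
    """Average daily prices per (ISO year, ISO week), skipping missing days."""
    buckets: Dict[Tuple[int, int], List[float]] = defaultdict(list)
    for day, price in daily:
        if price is None:
            continue  # weekend, holiday, or otherwise unsurveyed day
        iso = day.isocalendar()
        buckets[(iso[0], iso[1])].append(price)
    return {week: mean(prices) for week, prices in sorted(buckets.items())}


# Hypothetical daily observations; None marks a day without a survey.
observations = [
    (date(2017, 1, 2), 100.0),
    (date(2017, 1, 3), 102.0),
    (date(2017, 1, 4), None),
    (date(2017, 1, 7), None),
    (date(2017, 1, 9), 110.0),
]
weekly = weekly_mean(observations)
```

A week with no surveyed days produces no entry at all, so a caller would still need to decide how to handle fully missing weeks.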

Predictive models: The predictive models are built from the trained final MLP-MIMO models obtained from the parameter tuning. Every single commodity in every city has its own model parameter structure. The weights after training are stored in a weight database so the model can be reloaded at any time [7].

Post-processing: The output of the predictive model is a normalized price prediction for eight weeks ahead. The post-processing step denormalizes the predicted price and determines the early warning status from the maximum predicted price, according to Section ‘Price Spike and Early Warning Status Leveling’. When the price is above a given threshold (on status ‘watch’ or ‘monitoring’), an alert is sent to the stakeholders by an email service [10].
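The denormalization and peak-selection steps can be sketched as below. The text does not state which normalization was used, so min-max scaling to [0, 1] is assumed here; the function name, the price bounds, and the forecast values are hypothetical.

```python
from typing import List


def denormalize(
    values: List[float], price_min: float, price_max: float
) -> List[float]:
    """Invert an assumed min-max scaling: price = v * (max - min) + min."""
    span = price_max - price_min
    return [v * span + price_min for v in values]


# Hypothetical normalized forecast for weeks t + 1 ... t + 8.
normalized_forecast = [0.50, 0.55, 0.60, 0.75, 0.70, 0.65, 0.60, 0.58]
prices = denormalize(normalized_forecast, price_min=1000.0, price_max=3000.0)

# The warning status is determined from the maximum predicted price.
peak_price = max(prices)
```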

Finally, Garson’s algorithm is applied to determine the degree or level of importance of each input indicator in the ANN. It considers the weights of the hidden layer and their interactions with the output of the network. The measure proposed by [27] consists of partitioning the hidden-layer weights into components linked to each input node and then assigning each input a percentage of the total weights.

4 Results and Discussion

4.1 Macroeconomic Indicators Considered

The indicators described in Tables 1 and 2 are the result of the availability and correlation analysis. The first column, CODE, lists the codes used to facilitate the manipulation of the data and related tables. The second column, INDICATOR, lists both microeconomic and macroeconomic indicators, each analyzed in the units shown in parentheses. Owing to space constraints, only the most important of the 25 resulting macroeconomic indicators are shown.

Table 1. Internal indicators used [23]
Table 2. External indicators used [22].

4.2 Main Products Considered

To select the products for which the forecasts were made, the f1-score criterion was used. Ordering by this factor takes into account both the quantity and the value of sales when selecting the most important products. Table 3 presents the value of the f1-score factor for each selected product.

Table 3. Prioritization and selection of products

4.3 Model of the Early Warning System

According to [16], an increase in price is considered normal when it is below a certain threshold. The threshold is derived from the government reference price established by the Ministry of Commerce and from the target coefficient of variation (CVtarget). There are four levels of warning status: normal, advisory, monitoring, and warning. Their criteria are presented in Table 4.

Table 4. Levels of warning status and their criteria.

According to [16], these levels should be adjusted for the rainy and dry seasons in Colombia by factors of 0.952 and 1.025, respectively.
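A status-leveling routine with the seasonal adjustment might be sketched as follows. Only the four status names and the two seasonal factors come from the text. The threshold rule used here (reference price increased by 1, 2, and 3 times CVtarget) is a hypothetical placeholder standing in for the criteria of Table 4, and the function name and example numbers are likewise assumptions.

```python
from typing import Sequence

# Seasonal factors stated in the text for the rainy and dry seasons.
SEASON_FACTOR = {"rain": 0.952, "drought": 1.025}
STATUS_LABELS = ["normal", "advisory", "monitoring", "warning"]


def warning_status(
    peak_price: float,
    reference_price: float,
    cv_target: float,
    season: str,
    multipliers: Sequence[float] = (1.0, 2.0, 3.0),  # hypothetical, not Table 4
) -> str:
    """Classify the maximum predicted price against seasonally adjusted bounds."""
    factor = SEASON_FACTOR[season]
    # Upper bound of each non-final status, scaled by the seasonal factor.
    bounds = [
        reference_price * (1.0 + m * cv_target) * factor for m in multipliers
    ]
    for label, bound in zip(STATUS_LABELS, bounds):
        if peak_price < bound:
            return label
    return STATUS_LABELS[-1]


# Hypothetical reference price and CVtarget.
status_rain = warning_status(2500.0, 2000.0, 0.10, "rain")
status_drought = warning_status(2500.0, 2000.0, 0.10, "drought")
```

The same predicted peak can fall into different statuses in different seasons, which is the effect the seasonal factors are meant to produce.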

The reference price of each product and the threshold for each warning status used in this study are presented in Table 5, according to [16] together with DANE [21].

Table 5. Reference price, interval price increase, and the levels of early warning status.

All models are trained using the backpropagation algorithm with a learning rate of 0.9, updating the weights according to Eq. (2). The log-sigmoid function is used in the hidden layer(s) and the linear (purelin) function is used in the output layer [27].

$$ W\left( {t + 1} \right) = W\left( t \right) - \eta \frac{{\partial E_{av} }}{\partial W} $$
(2)

Where W is the weight vector, t is the iteration number, η is the learning rate, and ∂E_av/∂W is the partial derivative of the average error E_av with respect to the weights.
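The update rule of Eq. (2) can be sketched in isolation as below. This is not a full backpropagation implementation: the gradient is supplied by a caller-provided function, and a toy quadratic error stands in for E_av so that the behavior with η = 0.9 is easy to follow. The names `gradient_step`, `train`, and `toy_gradient` are hypothetical.

```python
from typing import Callable, List


def gradient_step(
    weights: List[float], gradient: List[float], learning_rate: float
) -> List[float]:
    """Eq. (2): W(t + 1) = W(t) - eta * dE_av/dW, applied element-wise."""
    return [w - learning_rate * g for w, g in zip(weights, gradient)]


def train(
    weights: List[float],
    grad_fn: Callable[[List[float]], List[float]],
    learning_rate: float = 0.9,
    epochs: int = 50,
) -> List[float]:
    """Apply the update rule repeatedly for a fixed number of iterations."""
    for _ in range(epochs):
        weights = gradient_step(weights, grad_fn(weights), learning_rate)
    return weights


# Toy stand-in for E_av: E(W) = 0.5 * sum((w_i - target_i)^2),
# whose gradient with respect to each w_i is (w_i - target_i).
TARGET = [3.0, -1.0]


def toy_gradient(weights: List[float]) -> List[float]:
    return [w - t for w, t in zip(weights, TARGET)]


final_weights = train([0.0, 0.0], toy_gradient)
```

For this toy error, each step with η = 0.9 removes 90% of the remaining distance to the minimum. On a real network the error surface is not quadratic, so a rate this high is not guaranteed to converge.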

This combination of two activation functions gives the minimum average mean squared error (MSE) on the validation set and the fewest training epochs compared with the other combinations. In some cross-validation training runs, the learning rate that gives the fewest training epochs without increasing the MSE is 0.6, 0.7, or 0.8, but in most cases a learning rate of 0.9 gives the fewest epochs, especially in the last fold of the cross-validation.

The last fold uses 5/6 of the data for training (hundreds of data points) and takes the longest training time among all folds of the cross-validation. Therefore, 0.9 is used as the learning rate for all training processes. All training processes are stopped when the training MSE threshold is reached.

Figure 2 displays the prediction performance of the final model on the validation set. The example shown is the price prediction for coffee in Colombia at time t + 1 (next week) and t + 8 (eight weeks ahead).

Fig. 2. The final model’s prediction performance: (a) time t + 1 (next week); (b) time t + 8 (eight weeks ahead).

4.4 Importance of Macroeconomic Indicators

Table 6 presents the level of importance obtained in the ANN with Garson’s algorithm. The coefficient shown in the last column is the degree of importance of each indicator. Because a different coefficient is obtained for each product for the same indicator, the value shown is the average Garson’s coefficient across the 10 products.

Table 6. Summary of most important indicators.

According to Garson’s coefficient, the most relevant internal variables are the price and the year, while the most relevant macroeconomic variables are foreign investment and the range of corruption, because both destabilize the price of the dollar.

5 Conclusions

In this paper, an early warning method is proposed for price spikes of agricultural products, based on artificial neural network prediction. The method is suitable for modeling basic commodity prices and may help the State to predict price spikes for specific commodities in specific regions in order to define anticipatory policies.

The analysis of agricultural products yielded the following 10 relevant products: potato, coffee, cocoa, tomato, onion, lettuce, paprika, coriander, parsley, and lemon. The early warning system can be adjusted for seasonality (drought or rain) by multiplying the ranges by the respective seasonality coefficients. Microeconomic and macroeconomic variables were evaluated according to the importance of their effect on prices, as measured by Garson’s coefficient.