
1 Introduction

The problem of time series forecasting, in its simplest form, deals with the prediction of a given quantity of interest in the future, given its historical values. One may be interested in forecasting only the immediate next value (one-step-ahead forecasting) or in estimating a sequence of future values (multi-step-ahead forecasting). Similarly, the problem might involve a single quantity (univariate forecasting) or several quantities at once (multivariate forecasting), in order to exploit potential interrelationships among them. In the context of finance, typical quantities of interest are, among others, the stock price of a given company over time, its returns, or the intensity of the fluctuations affecting the price (i.e. its volatility). In the specific case of stock markets, the underlying trend of the market influences all the stocks that are currently traded. As shown in [18], stock prices of firms acting on the same market often show similar patterns in the wake of news that are important for the entire market. Moreover, analyzing global volatility transmission, Engle et al. [12] found evidence supporting volatility interdependence among the world’s major trading areas. For these reasons, when modeling these time-dependent quantities of interest, a multivariate model appears to be a natural choice to incorporate interdependencies into the forecasting process.

Among all the quantities of interest, in the following we will focus on the problem of multivariate volatility forecasting. In this specific case, the quantity of interest is a latent variable, which cannot be directly observed in the time series but only estimated, according to the granularity and the type of the available data, through different measures, named volatility proxies [27]. Depending on the choice of the proxy, several approaches have been proposed to tackle this multivariate problem. The largest body of the volatility forecasting literature focuses on multivariate extensions of the well-known GARCH model [2] applied to traditional stock market data; see, among recently published work, [13] and [3]. For a thorough review of the different univariate and multivariate methods, we refer the interested reader to the latter. Due to the steady growth of the cryptocurrencies’ market capitalization [11], coupled with the currencies’ volatility, GARCH-like models [7, 32] have also been applied to non-traditional markets. The main problem of these approaches is that traditional multivariate models often suffer from the “curse of dimensionality”: the number of parameters grows superlinearly with the number of dimensions, making model estimation computationally intensive, especially in the case of multiple-step-ahead forecasts.

In order to profit from the richness of a multivariate model while maintaining a reasonable computational complexity, we propose to employ the DFML [4], a multivariate, multi-step-ahead machine learning forecasting framework involving a dimensionality compression process based on the dynamic factor model (DFM) principle [14]. The choice of this generic time series forecasting framework requires the use of model-independent volatility proxies, which will be discussed in Sect. 3. This leads us to dismiss GARCH as a proxy of volatility, due to the tight coupling between the proxy and the corresponding forecasting model, as discussed in [8].

At the time of writing, we were able to find either multivariate techniques dealing with the forecasting of cryptocurrency prices [1, 6] or univariate techniques dealing with the forecasting of volatility, either one-step-ahead [7, 32] or multi-step-ahead [10]. However, we are not aware of any other work tackling both multivariate and multi-step-ahead cryptocurrency volatility forecasting, specifically in the case of large dimensionality and a reduced number of data points. Our technique will be tested on two different benchmarks: one concerning cryptocurrencies and a second one concerning a traditional regulated stock market (CAC40), making this work a de facto multivariate extension of [25].

The rest of the paper is structured as follows: Sect. 2 provides an overview of the Dynamic Factor Machine Learner approach. Section 3 introduces the different tested multivariate models as well as the considered datasets and the formulation of the relevant forecast quantities. Section 4 presents and discusses the experimental results, and Sect. 5 concludes the paper with future research directions.

2 Dynamic Factor Machine Learner

A Dynamic Factor Model (DFM) is a technique for multivariate forecasting originating in the economic forecasting community [14]. The basic idea of DFM is that a small number of unobserved series (or factors) can account for the temporal behavior of a much larger number of variables. If we are able to obtain accurate estimates of these factors, the forecasting task can be made simpler by forecasting the estimated dynamic factors instead of all the original series. In equations:

$$\begin{aligned} \mathbf {Y}_{t+1}&= \mathbf {W} \mathbf {Z}_{t+1} + \epsilon _{t+1} \end{aligned}$$
(1)
$$\begin{aligned} \mathbf {Z}_{t+1}&= \mathbf {A}_{t} \mathbf {Z}_{t}+\dots + \mathbf {A}_{t-m+1} \mathbf {Z}_{t-m+1} +\eta _{t+1} \end{aligned}$$
(2)

where \(\mathbf {Y}_t\) is a multivariate time series vector at time t, \(\mathbf {Z}_t\) is the vector of unobserved factors of size q (\(q\,\ll \,n\)), \(\mathbf {A}_i\) are \(q \times q\) coefficient matrices, \(\mathbf {W}\) is the \((n \times q)\) matrix of dynamic factor loadings and the vectors of disturbance terms \(\eta \) are assumed to be uncorrelated. As shown in Eq. 2, in the original DFM, the latent factors follow a VAR time series process. For a detailed review of DFM models, the interested reader can refer to [28].
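To make Eqs. 1 and 2 concrete, the following minimal R sketch estimates the factors by PCA and forecasts them with a VAR process before projecting the forecasts back into the original space. The vars package, the variable names and the values of q, m and H are illustrative choices of ours, not the implementation used in the experiments of Sect. 4 (which relies on the dse package).

```r
# Minimal DFM sketch (Eqs. 1-2). Assumption: Y is an N x n numeric matrix of
# observed series; q, m and H are illustrative values.
library(vars)

q <- 3; m <- 2; H <- 10
pca <- prcomp(Y, center = TRUE, scale. = FALSE)
W   <- pca$rotation[, 1:q]                   # n x q loading matrix (Eq. 1)
Z   <- pca$x[, 1:q]                          # N x q estimated factors

var_fit <- VAR(as.data.frame(Z), p = m)      # VAR(m) on the factors (Eq. 2)
Z_hat <- sapply(predict(var_fit, n.ahead = H)$fcst,
                function(f) f[, "fcst"])     # H x q matrix of factor forecasts

Y_hat <- sweep(Z_hat %*% t(W), 2, pca$center, "+")   # back to the original space (Eq. 1)
```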

Here, we propose to employ a machine learning extension of the DFM (called DFML - Dynamic Factor Machine Learner). The DFML, first proposed by Bontempi et al. [4] and further discussed in [9], relies on dimensionality reduction techniques to extract the factors. Then, the factors are forecast using a nonlinear model. Finally, the forecasts of the factors are transformed back to the original values by inverting the dimensionality reduction process. The basic architecture of the DFML is depicted in Fig. 1, along with the description of the different variants. Concerning dimensionality reduction, both linear (PCA) and nonlinear (autoencoder) techniques are employed in the DFML. Linear dimensionality reduction by PCA transforms the n original time series \(\mathbf {Y}[1]\), \(\dots \), \(\mathbf {Y}[n]\) into q new variables \(\mathbf {Z}[1]\), \(\dots \), \(\mathbf {Z}[q]\) (called principal components or factors) such that the new variables are uncorrelated with each other and account for decreasing portions of the variance of the original variables. The q principal components are then expressed as weighted sums of the elements of \(\mathbf {Y}\) with maximal variance, where the weights are normalized and constrained to ensure orthogonality:

$$\begin{aligned} \mathbf {Z}[p]=\sum _{j=1}^n w_{jp} \mathbf {Y}[j], \qquad p=1,\dots ,q \end{aligned}$$
(3)

Given the multivariate time series matrix \(\mathbf {Y}\), \(\mathbf {Z}= \mathbf {Y}\mathbf {W}\) represents the projection of the series onto the q principal components and \(\hat{\mathbf {Y}}= \mathbf {Z}\mathbf {W}^T\) represents the reconstruction \(\hat{\mathbf {Y}}\) of the values of \(\mathbf {Y}\) based on the factors \(\mathbf {Z}\). Nonlinear dimensionality reduction, on the other hand, is performed through the use of autoencoders. Autoencoders are neural networks trained to learn the identity mapping from inputs to outputs [31], with an architecture constrained to enforce dimensionality reduction. As such, their input and output layers have the same number of neurons n as the number of input time series, but their hidden layers contain a reduced number of neurons q. Autoencoders are composed of two stacked multi-layer networks: an encoder:

$$\begin{aligned} \mathbf {Z}_t= & {} f_\theta (\mathbf {Y}_t) \end{aligned}$$
(4)

that transforms inputs \(\mathbf {Y}_t\) into some latent (encoded) representation \(\mathbf {Z}_t\), and a decoder:

$$\begin{aligned} \hat{\mathbf {Y}}_t= & {} g_{\theta '}(\mathbf {Z}_t) \end{aligned}$$
(5)

that reconstructs an approximation \(\hat{\mathbf {Y}}_t\) of the input \(\mathbf {Y}_t\) on the basis of the latent features \(\mathbf {Z}_t\), where the mappings \(f_\theta \) and \(g_{\theta '}\) are non-linear. The network is usually trained using gradient descent techniques such as backpropagation, with the objective of minimizing the mean-squared error between the input \(\mathbf {Y}_t\) and the output (its reconstruction \(\hat{\mathbf {Y}}_t\)) [31]. Concerning the forecasting part, the original DFML paper [4] proposes to forecast each factor independently (given their orthogonality) using a nonlinear model (lazy learning [5]) and a univariate multi-step-ahead forecasting strategy. In addition to the basic forecaster, the paper also proposes an optimized version (DFML\('\)), performing a joint selection of the hyperparameters (number of factors for the dimensionality reduction, predictor, and multi-step-ahead strategy for the forecaster) using out-of-sample assessment. Although we consider lazy methods for the forecaster, the modular architecture of this framework easily allows the replacement of the aforementioned technique with alternative supervised machine learning approaches (e.g. SVM, RNN).
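As an illustration of the full DFML pipeline (compression, per-factor forecasting, reconstruction), the sketch below compresses the series with PCA and forecasts each factor with a Direct multi-step-ahead strategy. A k-nearest-neighbour regressor (FNN::knn.reg) stands in here for the lazy-learning predictor of [5]; q, m, H and k are illustrative values rather than the settings selected by DFML\('\).

```r
# DFML sketch: PCA compression + independent Direct forecasting of each factor
# + reconstruction. Y is an N x n matrix; the kNN learner is a stand-in for [5].
library(FNN)

q <- 3; m <- 2; H <- 10; k <- 5
pca <- prcomp(Y, center = TRUE)
W   <- pca$rotation[, 1:q]
Z   <- pca$x[, 1:q]
N   <- nrow(Z)

Z_hat <- sapply(1:q, function(p) {
  E      <- embed(Z[, p], m)                 # rows: (z_t, z_{t-1}, ..., z_{t-m+1})
  x_last <- E[nrow(E), , drop = FALSE]       # most recent lag vector
  sapply(1:H, function(h) {                  # Direct strategy: one model per horizon
    y_h <- Z[(m + h):N, p]                   # factor value h steps ahead
    X_h <- E[1:length(y_h), , drop = FALSE]
    knn.reg(train = X_h, test = x_last, y = y_h, k = k)$pred
  })
})                                           # H x q matrix of factor forecasts

Y_hat <- sweep(Z_hat %*% t(W), 2, pca$center, "+")   # decode back to the n series
```

Replacing the PCA step with an autoencoder (as in DFML\(_{A}\)) only changes the encoding and decoding operations; the per-factor forecasting remains identical.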

Fig. 1. Schema of the DFML architecture with a summary of the different components as implemented in the different proposed methods.

3 Methodology

3.1 Multivariate Forecasting Methods

Multiple Univariate Techniques - {Naive, UNI}: In the case of a multivariate time series \(\mathbf {Y}\), univariate approaches are still of interest, since the multivariate forecasting task can be decomposed into a number of single-output, multi-input tasks (or, equivalently, into a set of NARX tasks with exogenous variables)

$$\begin{aligned} {\left\{ \begin{array}{ll} Y_{t+1}[1]&{}=f_1 (Y_{t}[1],\dots ,Y_{t-m+1}[1], \dots , \\ &{} Y_{t}[n],\dots ,Y_{t-m+1}[n] )+w_t[1]\\ \vdots \\ Y_{t+1}[n]&{}=f_n (Y_{t}[1],\dots ,Y_{t-m+1}[1], \dots , \\ &{} Y_{t}[n],\dots ,Y_{t-m+1}[n] )+w_t[n] \end{array}\right. } \end{aligned}$$
(6)

In this case the training set is used to learn the n mapping functions \(f_i\), \(i=1,\dots ,n\), with \(w_t[i]\) being uncorrelated disturbances. For large n, the problem of large input dimensionality can be addressed by adopting a feature selection technique, selecting a reduced number q of the most correlated features. For these univariate techniques, we will also consider a naive method in which \(\forall i \in \{1,\dots ,n\}, f_i(t)=Y_{t-1}[i]\), i.e. for every series, the forecast for the following H steps is given by the last available value. These are the baseline methods against which we will compare the performance of our forecaster.
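As a reference point, a minimal R sketch of the naive baseline and of the correlation-based feature selection used by the UNI models is given below; Y is assumed to be the N x n proxy matrix and the values of H and q are illustrative.

```r
# Naive baseline: repeat the last observed value of each series over the next H steps.
naive_forecast <- function(Y, H) {
  last <- Y[nrow(Y), ]
  matrix(rep(last, each = H), nrow = H, dimnames = list(NULL, colnames(Y)))
}

# Correlation-based input selection for the UNI models (Eq. 6): keep, for each
# target series, the indices of the q series most correlated with it.
select_inputs <- function(Y, target, q = 3) {
  cors <- abs(cor(Y))[, target]
  order(cors, decreasing = TRUE)[seq_len(q)]
}

Y_hat_naive <- naive_forecast(Y, H = 10)
```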

Partial Least Squares - PLS: Partial Least Squares [15] allows the joint forecasting of the H steps ahead of the multivariate time series on the basis of the lagged vectors \(\mathbf {Y}_{t},\dots ,\mathbf {Y}_{t-m}\). This is a multi-input, multi-output regression task where the number of inputs amounts to nm and the number of outputs to Hn, with n being the number of variables, m the embedding order of the model and H the forecasting horizon. The benefit of PLS is that it allows, at the same time, a dimensionality reduction of the inputs and a joint prediction of the outputs, thereby taking into consideration the dependency between the future steps. An example of application of PLS to financial time series forecasting can be found in [22].
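The following sketch shows how such a multi-input, multi-output PLS regression can be set up with the pls package used in Sect. 4; the lag construction, the number of latent components and the variable names are illustrative choices of ours.

```r
# PLS sketch: inputs are the stacked lag vectors (n*m values), outputs the
# stacked next H steps (n*H values). Y is an N x n matrix.
library(pls)

m <- 2; H <- 10; n <- ncol(Y); N <- nrow(Y)
times <- m:(N - H)
X_in  <- t(sapply(times, function(t) as.vector(Y[(t - m + 1):t, , drop = FALSE])))
Y_out <- t(sapply(times, function(t) as.vector(Y[(t + 1):(t + H), , drop = FALSE])))

dat <- data.frame(X = I(X_in), Yf = I(Y_out))
fit <- plsr(Yf ~ X, data = dat, ncomp = 3)     # joint reduction of inputs and outputs

x_new <- matrix(as.vector(Y[(N - m + 1):N, , drop = FALSE]), nrow = 1)
y_new <- predict(fit, newdata = data.frame(X = I(x_new)), ncomp = 3)  # stacked n*H forecast
```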

Recurrent Neural Networks - {RNN, LSTM}: Recurrent Neural Networks (RNN) form a class of predictive models based on neural networks, in which recurrent connections allow the modeling of dynamic temporal dependencies. In their simplest form (also known as simple RNN) [17, 23], the recurrent connections come from a hidden state \(H_t\), which is also used for predicting future values \(Y_t\):

$$\begin{aligned} \mathbf {H}_t= & {} \sigma (\mathbf {W}_{HY}\mathbf {Y}_{t-1}+\mathbf {W}_{HH}\mathbf {H}_{t-1}+\mathbf {B}_H), \end{aligned}$$
(7)
$$\begin{aligned} \mathbf {Y}_t= & {} \mathbf {W}_{YH}\mathbf {H}_t+\mathbf {B}_Y \end{aligned}$$
(8)

The matrices \(\mathbf {W}_{HY}\), \(\mathbf {W}_{HH}\), \(\mathbf {W}_{YH}\), \(\mathbf {B}_H\) and \(\mathbf {B}_Y\) are the parameters (weights and biases) of the network, typically learned by gradient descent algorithms such as backpropagation through time [17]. A sigmoid activation function \(\sigma \) allows the modeling of nonlinear dependencies, while the recurrent connections allow the modeling of long-term temporal dependencies. Research on RNNs has recently been boosted by the advent of general-purpose graphics processing units (GPGPU) and by improved designs of the memory cell (Long Short-Term Memory cells [20]). These have allowed much more efficient RNN implementations and effective training over multiple layers (deep RNNs). RNN architectures have reached state-of-the-art performance for volatility forecasting, either as part of LSTM-GARCH hybrid models [21, 33] or as standalone models [26].
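A compact example of such a recurrent forecaster, written with the keras R package (the experiments in Sect. 4 use the kerasR interface instead), is sketched below; the 10 hidden units follow the setting of Sect. 4, while the optimizer, number of epochs and batch size are illustrative.

```r
# LSTM forecaster sketch in the spirit of Eqs. 7-8. Assumptions: X_train has
# shape (samples, m, n) with lagged windows of the n series, Y_next has shape
# (samples, n) with the values one step ahead, and X_last is the latest (1, m, n) window.
library(keras)

model <- keras_model_sequential() %>%
  layer_lstm(units = 10, input_shape = c(m, n)) %>%   # recurrent hidden state H_t
  layer_dense(units = n)                              # linear read-out (Eq. 8)

model %>% compile(loss = "mse", optimizer = "adam")
model %>% fit(X_train, Y_next, epochs = 50, batch_size = 16, verbose = 0)

Y_hat <- model %>% predict(X_last)                    # one-step-ahead forecast of the n series
```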

3.2 Datasets Description

CAC40: The available data consists of 1645 data points for each of the 40 time series composing the French stock market index CAC40, from 02/01/2012 to 08/06/2018 (approximately 6 years and 5 months), in OHLC (Opening, High, Low, Closing) format.

Cryptocurrencies: The available data comes from the Kaggle dataset “Every Cryptocurrency Daily Market Price”Footnote 1, consisting of 785,024 observations of 1644 different cryptotokens from 28/04/2013 to 06/06/2018. However, the number of tokens with available data decreases as we move back in time, since most tokens have only been traded recently: the further we go into the past, the fewer values we have for our analysis, as depicted in Fig. 2. For these reasons, we restricted our analysis to the period from 28/01/2017 to 06/06/2018, for which we have complete OHLC data for 291 tokens.

Fig. 2. Number of available data points for the cryptocurrencies dataset as a function of time.

3.3 Volatility Proxies

The available OHLC data is composed of several quantities of interest, each on a daily time scale: \(P^{(o)}_t,P^{(c)}_t,P^{(h)}_t,P^{(l)}_t\), respectively the stock price at the opening and closing of the trading day, and the maximum and minimum price within the trading day. In the absence of detailed information about the price movements within a given trading day, stock volatility is not directly observable [30]. To cope with this problem, several different measures (also called proxies) have been proposed in the econometrics literature [16, 19, 24, 27] to capture this information. However, there is no consensus in the scientific literature on which volatility proxy should be employed for a given purpose. For an empirical analysis of the use of volatility proxies in the case of univariate forecasting, the interested reader can find more details in [8].

Volatility as Variance. The first family of proxies corresponds to the natural definition of volatility [27], that is, a rolling standard deviation of a given stock’s continuously compounded returns over a past time window of size w:

$$\begin{aligned} \sigma ^{SD,w}_t = \sqrt{\frac{1}{w-1} \sum _{i=0}^{w-1} (r_{t-i} - \bar{r}_w)^2} \end{aligned}$$
(9)

where

$$\begin{aligned} r_t = \ln \left( \frac{P^{(c)}_t}{P^{(c)}_{t-1}} \right) \end{aligned}$$
(10)

represents the daily continuously compounded return for day t, computed from the closing prices \(P^{(c)}_t\), and \(\bar{r}_w\) represents the average of the returns over the window \(\{t-w+1,\dots ,t\}\). In this formulation, w represents the degree of smoothing that is applied to the original time series. We will consider here \(w \in \{5,10,21\}\), representing respectively one week, two weeks and one month of trading.
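A direct R implementation of this proxy is sketched below; the function operates on a single vector of closing prices, and sd already applies the 1/(w-1) normalization of Eq. 9.

```r
# Rolling-window volatility proxy of Eq. 9 for one series of closing prices P_c.
sigma_sd <- function(P_c, w) {
  r <- diff(log(P_c))                        # continuously compounded returns (Eq. 10)
  sapply(seq_along(r), function(t) {
    if (t < w) return(NA)                    # not enough history for a full window
    sd(r[(t - w + 1):t])                     # rolling standard deviation
  })
}

vol_weekly <- sigma_sd(P_c, w = 5)           # w = 5: one trading week
```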

Volatility as a Proxy of the Coarse-Grained Intraday Information. The second family of proxies that we will consider, denoted \(\sigma ^{i}_t\), was analytically derived in [16] by incorporating supplementary information (i.e. the opening, maximum and minimum price of a given trading day) in order to improve the quality of the estimation. Among all the proxies defined there, we will focus on:

$$\begin{aligned} \sigma ^{0}_t = \left[ \ln \left( \frac{P^{(c)}_{t+1}}{P^{(c)}_{t}} \right) \right] ^2 = r_t^2 \end{aligned}$$
(11)
$$\begin{aligned} u = \ln \left( \frac{P^{(h)}_t}{P^{(o)}_t} \right)&d = \ln \left( \frac{P^{(l)}_t}{P^{(o)}_t} \right)&c = \ln \left( \frac{P^{(c)}_t}{P^{(o)}_t} \right) \end{aligned}$$
(12)

where u is the normalized high price, d is the normalized low price and c is the normalized closing price.

$$\begin{aligned} \sigma ^{4}_t = 0.511 (u-d)^2 - 0.019[c(u+d) - 2ud] - 0.383c^2 \end{aligned}$$
(13)
$$\begin{aligned} \sigma ^{6}_t = \underbrace{\frac{a}{f} \cdot \left[ \ln \left( \frac{P^{(o)}_{t+1}}{P^{(c)}_{t}} \right) \right] ^2}_{\text {Nightly volatility}} + \underbrace{\frac{1-a}{1-f} \cdot \sigma ^{4}_t}_{\text {Intraday volatility}} \end{aligned}$$
(14)

The value of \(f \in [0,1]\) represents the fraction of the trading day during which the market is closed. In the case of the CAC40, we have \(f > 1-f\), since trading only takes place for roughly one third of the day. Here, a is a weighting parameter whose optimal value, according to [16], is 0.17, regardless of the value of f.
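For illustration, the three proxies can be computed from daily OHLC vectors as in the following sketch; the vectors Po, Ph, Pl, Pc are assumed to be aligned series of equal length, and f and a follow the values discussed above.

```r
# Proxies of Eqs. 11, 13 and 14 for one asset. The forward-looking terms of
# Eqs. 11 and 14 leave the last element undefined (NA).
sigma_0 <- c(diff(log(Pc))^2, NA)                          # squared close-to-close return

u <- log(Ph / Po); d <- log(Pl / Po); cc <- log(Pc / Po)   # normalized high, low, close
sigma_4 <- 0.511 * (u - d)^2 - 0.019 * (cc * (u + d) - 2 * u * d) - 0.383 * cc^2

f <- 2/3; a <- 0.17                                        # CAC40: market closed ~2/3 of the day
night   <- c(log(Po[-1] / Pc[-length(Pc)])^2, NA)          # overnight close-to-open term
sigma_6 <- (a / f) * night + ((1 - a) / (1 - f)) * sigma_4
```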

After a preprocessing phase of the datasets, involving the removal of missing values and the computation of the proxies for each time series, the data is restructured into a multivariate time series matrix \(\mathbf {Y}\) having N (number of observations) rows and n (number of variables/time series) columns. For each proxy, this matrix is such that each row \(\mathbf {Y}_t\) represents an n-dimensional vector containing the value of the given proxy for each of the n variables at time t, and the scalar value \(Y_t[j]\) represents the value of the jth (\(j=1,\dots ,n\)) variable at time t.
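For instance, assuming a named list ohlc_list of per-asset OHLC data frames (with columns Po, Ph, Pl, Pc, aligned on the same dates), the \(\sigma ^4\) proxy matrix \(\mathbf {Y}\) can be assembled column by column as follows.

```r
# Build the N x n proxy matrix Y: one column per asset, one row per trading day.
Y <- sapply(ohlc_list, function(x) {
  u <- log(x$Ph / x$Po); d <- log(x$Pl / x$Po); cc <- log(x$Pc / x$Po)
  0.511 * (u - d)^2 - 0.019 * (cc * (u + d) - 2 * u * d) - 0.383 * cc^2
})
```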

4 Experimental Results

The experimental study assessed and compared the methods previously discussed in the article. The methods are listed below together with the software used for the experiments. Note that, for the sake of assessment, we set the lag \(m=2\) and the maximum number of latent factors to \(q=3\) for all methods, unless specified otherwise.

1. NAIVE: univariate baseline method using the last observed value for each time series as prediction for the following H steps.

2. UNI: univariate multi-step-ahead Direct forecasting of each individual series (Eq. 6) with a feature selection process based on correlation.

3. PLS: partial-least-squares forecasting (Sect. 3.1) implemented by the function mvr of the R package pls. The optimal values for the size of the input space and the number of principal components q are determined through an out-of-sample criterion.

4. RNN: recurrent neural network implemented by the keras_predict function of kerasRFootnote 2, an R interface to the keras Deep Learning libraryFootnote 3 for Theano. The network is a fully-connected RNN with 10 hidden units. Since an automated setting of the number of units would not have been feasible due to excessive computational time, this number has been set on the basis of trial and error over a small number of synthetic series.

5. LSTM: as RNN, the model is a fully-connected recurrent network with 10 hidden units, implemented using kerasR. It differs from RNN in that it employs LSTM cells [20] in the hidden layer instead of regular neurons.

6. DFM: linear Dynamic Factor Model where PCA is used for factor estimation, the number of factors is set to q and the forecasting of the factors is carried out with a VAR method implemented by the estBlackBox function of the R package dse. The batch PCA is computed using the base R eigen function.

7. DFML\(_{PCA}\): Dynamic Factor Machine Learner where PCA is used for factor estimation, the number of factors is set to q and the forecasting of each factor is carried out in a univariate manner using a local learning predictor (lazy learning [5]) and a multi-step-ahead Direct strategy.

8. DFML\(_{A}\): it differs from DFML\(_{PCA}\) by the use of an autoencoder instead of PCA in the process of factor estimation.

9. DFML’\(_{PCA}\): it differs from DFML\(_{PCA}\) by the automatic selection strategy (described in [4]): the number of factors (in the range [1, q]), the multi-step-ahead strategy (among Direct, Iterated and MIMO) and the lag m are selected by an out-of-sample strategy carried out on the training set.

4.1 Results Discussion

For each multivariate dataset we performed time series cross-validation following a rolling origin strategy [29]. The size of the training set is 2N/3 and a sequence of 50 different test sets of length H is considered.

For each test set, all methods are assessed in terms of the average Normalized Mean Squared Error:

$$ {\text {NMSE}}=\frac{\sum _{j=1}^n \text{ NMSE }[j] }{n}$$

where

$$ {\text {NMSE}}[j]=\frac{ \sum _{h=1}^H (Y_{T+h}[j]-\hat{Y}_{T+h}[j])^2}{ V[j] H} $$

V[j] is the variance of the series Y[j] and \(T+1\) is the starting index of the continuation set.
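A sketch of how these scores can be computed is given below; the nmse function follows the definition above, while the rolling-origin loop is only indicative, since the exact windowing of the 50 test sets follows the setup of [29] described in the text. The forecaster function is a hypothetical placeholder for any of the methods listed above.

```r
# Average NMSE for one continuation set. Y_true and Y_hat are H x n matrices,
# V a vector of per-series variances estimated on the training data.
nmse <- function(Y_true, Y_hat, V) {
  per_series <- colSums((Y_true - Y_hat)^2) / (V * nrow(Y_true))
  mean(per_series)
}

# Indicative rolling-origin evaluation: 50 origins starting after 2N/3 observations.
origins <- floor(2 * nrow(Y) / 3) + seq_len(50) - 1
scores  <- sapply(origins, function(T0) {
  train <- Y[1:T0, , drop = FALSE]
  test  <- Y[(T0 + 1):(T0 + H), , drop = FALSE]
  nmse(test, forecaster(train, H), apply(train, 2, var))   # forecaster(): placeholder
})
mean(scores)
```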

When dealing with high dimensionality (\(n=291\)) coupled with a relatively low number of observations (\(N=495\)), as in the case of the cryptocurrencies dataset (Table 1), and using the \(\sigma ^i_t\) family of proxies, the DFML clearly outperforms all competing methods, even without hyperparameter optimisation. It should also be noted that some methods tested in the original DFML paper [4] (i.e. VAR, DSE, SSA) could not be tested due to numerical problems related to the limited number of available observations. The performance of DFML is mitigated when using proxies from the \(\sigma ^{SD,w}_t\) family: as the smoothing provided by the window size parameter w increases, the performance of the Naive method improves, even for forecasting horizons up to 20 steps ahead. In both cases, a linear dimensionality reduction technique without optimization (DFM, DFML\(_{PCA}\)) is shown to improve the performance of the forecaster, compared to nonlinear (DFML\(_{A}\)) and optimized (DFML’\(_{PCA}\)) ones.

A similar ranking among the methods is observed for the CAC40 dataset (Table 2), characterized by a lower dimensionality (\(n=40\)) but a higher number of points (\(N=1641\)). Here we observe a generally higher average NMSE, indicating a more difficult forecasting problem. For the \(\sigma ^i_t\) family, PLS and DFM appear as competitive alternatives to the DFML, especially for longer horizons (\(h>15\)). As in the previous case, for the \(\sigma ^{SD,w}_t\) family of proxies, the performance of the DFML family is affected by the value of the smoothing factor w: the higher the smoothing factor, the less effective the DFML becomes for shorter horizons, with the Naive method becoming the best one, although the DFML still maintains good forecasting accuracy for longer horizons.

Fig. 3. Total computational time (model training + forecast) of the tested methods on the CAC40 - \(\sigma ^4\) (a) (\(n=40\)) and cryptocurrencies - \(\sigma ^4\) (b) (\(n=291\)) dataset-proxy combinations.

Table 1. Cryptocurrencies - volatility time series: NMSE (averaged over all the continuation sets) of the different forecasting methods. The bold notation is used to highlight all techniques which are not significantly worse (pv = 0.05) than the one with the lowest NMSE score.
Table 2. CAC40 - volatility time series: NMSE (averaged over all the continuation sets) of the different forecasting methods. The bold notation is used to highlight all techniques which are not significantly worse (pv = 0.05) than the one with the lowest NMSE score.

In addition to forecasting accuracy, we also analyzed the total computational time required to produce a forecast, obtained by summing the time required to train the considered model and the time needed to generate a forecast. Figure 3a shows that, for low dimensionalities (\(n=40\)), the total computational time of the different techniques is comparable and independent of the forecasting horizon, except for the optimized DFML’\(_{PCA}\), where the comparison of different forecasting strategies requires a computational time proportional to the length of the forecasting horizon. On the other hand, for higher dimensionalities (\(n=291\)), the computational time required to train multiple univariate models (UNI), neural models (RNN and LSTM) and PLS increases considerably with both dimensionality and forecasting horizon, while DFML models, thanks to the dimensionality reduction component, maintain a reduced computational time regardless of the forecasting horizon.

5 Conclusion and Future Work

The empirical analysis shows that DFML is able to produce accurate volatility forecasts, especially in the case of high-dimensional noisy series (i.e. the cryptocurrencies dataset) with non-smoothed volatility proxies \(\sigma ^{i}\), by summarizing the intrinsic market correlations well in a reduced number of factors. However, the presence of a smoothing factor (as in the \(\sigma ^{SD,w}\) family of proxies) is shown to worsen the performance of the DFML methods. Moreover, we have shown that, thanks to the dimensionality reduction component, DFML methods can produce multi-step-ahead forecasts with the same accuracy as competing methods at a greatly reduced computational cost. In order to further improve this framework, we foresee different possible extensions. On the one hand, we believe that the use of additional volatility proxies, together with an automated variable selection process, could further improve the forecasting performance. On the other hand, the use of incremental dimensionality reduction techniques could further improve the computational efficiency of the method at the expense of a small reduction in forecasting accuracy.