1 Introduction

Forecasting financial market conditions is crucial not only for the institutions and individuals involved but also for the economic well-being of entire nations. On the other hand, financial markets are an inexhaustible source of noisy and incomplete information in the form of financial time series. The number of dimensions, many of which have hidden correlations, and the amount of data to be analysed make the delivery of models with sufficient predictive quality very challenging. In this context, a correct forecast of market direction was shown to be sufficient to generate profitable trading strategies [10].

Attempts to employ data discovery and soft computing methods to deliver predictive models have a long track record, including a growing set of reported work based on Rough Sets [13]. Rough Sets and Fuzzy Rough Sets models [14] deliver a way to infer knowledge from noisy and incomplete data and to automatically select significant data features (reducts). This forms a good basis to cope with the challenges of financial time series, such as data inconsistency and the number of dimensions. Rough Sets were shown to be applicable to any time scale, down to intraday trading [9]. However, the recent advances in the field of fuzzy rough sets [14, 16] have not yet been verified in the area of financial time series forecasting, although the flexible definition of a similarity relation and the ability to work directly with numeric variables make fuzzy rough sets an interesting candidate. Furthermore, related experiments using soft computing models often reported good classification accuracy, but financial backtesting and the actual financial performance of the evaluated models were seldom shown. This prompted some reports in which several soft computing models were tested and deemed to deliver performance at best on par with statistical methods and a simple buy and hold strategy [7].

Consequently, this work attempts to contribute new research data with regard to the feasibility of applying rough sets and fuzzy rough sets models to the task of financial time series forecasting. Both the classification accuracy and the financial performance of the examined models were tested using large real-life data sets of several well-known market-neutral indices. This work also extends the research described in [12] by considering Fuzzy Rough Sets.

This paper is organized as follows: Sect. 2 describes the experiment setup, used data, preprocessing and prediction performance analysis framework. Section 3 presents and discusses the experiment results. Section 4 concludes the paper.

2 Experiment Setup

In this study, Rough Sets (RS), Variable Precision Rough Sets (VPRS) and Fuzzy Rough Sets [14] combined with the k-nearest neighbour algorithm [16] were used to predict the direction of movements of the US S&P500, German DAX and Hong Kong Hang Seng (HSI) stock market indices. The implementation of the positive region based fuzzy-rough nearest neighbour algorithm [16] provided in [15] was used, with two definitions of the lower/upper set approximation based on:

  1. VQRS: Vaguely Quantified Rough Sets [3].

  2. Lukasiewicz triangular norm operators, with tolerance relation [5]:

$$\begin{aligned} R_a(x,y)=1-\frac{|a(x)-a(y)|}{|a_{max}-a_{min}|} \end{aligned}$$
(1)
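
As an illustration of Eq. 1, the following minimal sketch computes the pairwise tolerance values for a single numeric attribute. It is not the implementation from [15]; the function name `tolerance_relation` and the sample values are hypothetical, and the handling of a constant attribute (all pairs fully similar) is an assumption.

```python
def tolerance_relation(values):
    """Pairwise R_a(x, y) = 1 - |a(x) - a(y)| / |a_max - a_min| for one attribute a."""
    value_range = abs(max(values) - min(values))
    if value_range == 0:
        # Assumption: a constant attribute makes all samples fully similar.
        return [[1.0 for _ in values] for _ in values]
    return [[1.0 - abs(x - y) / value_range for y in values] for x in values]


# Hypothetical attribute values (e.g. an oscillator reading) for three samples.
attribute_values = [30.0, 50.0, 70.0]
R = tolerance_relation(attribute_values)
for row in R:
    print(row)
```

Identical values yield a similarity of 1, and the two extreme values of the attribute yield 0.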

Predictive models based on Variable Precision Rough Sets [17] were used as the reference. A detailed review of rough sets, fuzzy rough sets, their extensions and their applications in finance is provided in [4, 8, 13].

All the models were applied to time series data samples of the used stock market indices in order to generate movement predictions using a walk-forward rolling time window cross-validation procedure [6]. Subsequently, using a backtesting procedure applied to Exchange Traded Funds (ETFs) associated with respective indices, the financial profit and loss of a sample market timing strategy and the buy and hold benchmark were compared. The experiment environment was built using R package RoughSets [15], ROSETTA Rough Sets system [11] and an SQL database.

2.1 Input Data

Historical daily Open, Low, High and Close prices and Volume (OHLC&V) time series of the S&P500, DAX and HSI stock market indices were used to train the models and generate index movement predictions. Additional input was provided by daily closing values of the S&P 500 near-term volatility index VIX, which was used to gauge market confidence and covered the periods given below. Data was divided into training, calibration and test samples, where the training sample was set to 3333 trading days and the testing period was fixed between 29th of January 2010 and 31st of December 2013 for all indices. This caused a slight difference in the calendar coverage and total sample lengths due to the differing holiday calendars of the used indices, i.e. for S&P500 from 1st of August 1996 to 30th of December 2013, DAX from 4th of September 1996 to 27th of December 2013, and HSI from 20th of November 1996 to 30th of December 2013. The associated ETFs, i.e. SPDR S&P500, iShares Core DAX UCITS, and Tracker Fund of Hong Kong, designed to track the performance of the respective indices, were subsequently used to backtest the financial performance of predictions generated for the test sample. The data was adjusted for dividends, splits and mergers. All the used indices and ETF data covered the same time period. Figure 1 displays the relative performance of the used index time series over the applied time period.

The input data included 16 conditional variables and one decision variable. The following time series attributes were used as conditional variables: Open, Close, Volume, VIX Close. The following derived technical indicators were used as conditional variables: Acceleration (\(n=5\)), Average Directional Index (\(n=50\)), Commodity Channel Index (\(n=20\)), Chaikin Oscillator (\(n=3,m=10\)), Exponential Moving Average (\(n=50\)), MACD (\(n=12, m=26\)), Momentum (\(n=5\)), Price Oscillator (\(n=10, m=5\)), RSI (\(n=14\)), Price Rate of Change (\(n=1\)), Williams A/D, Williams %R. See [1] for a detailed description of the indicators.

Fig. 1. Cumulated return of used indices over the applied time period (Color figure online)

Simple daily return \(R_i\) of the given index for a future period i served as the basis for the decision variable \(d_i\) (prediction target) defined to be the change direction of the daily simple return \(R_i\) as follows:

$$\begin{aligned} d_i= \left\{ \begin{array}{rl} down &{} \text{ if } R_i \in (-\infty ,0) \\ up &{} \text{ otherwise } \end{array} \right. . \end{aligned}$$
(2)

where:

  • \(d_i\) - the decision variable for i-th sample,

  • \(R_i=\left( \frac{p_i}{p_{i-1}} -1 \right) \times 100\) - simple daily return of the Close price \(p_i\) for i-th period (trading day).

The discrete decision variable \(d_i\) was used for both the rough sets and the fuzzy rough sets classifier models.
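
The definition of \(R_i\) and Eq. 2 can be sketched as follows. This is a minimal illustration only; the function names and the sample Close prices are hypothetical. Note that, as in Eq. 2, a return of exactly zero is labelled as up.

```python
def simple_returns(close):
    """R_i = (p_i / p_{i-1} - 1) * 100 for every period i >= 1."""
    return [(close[i] / close[i - 1] - 1.0) * 100.0 for i in range(1, len(close))]


def direction(daily_return):
    """Eq. 2: 'down' for a strictly negative return, 'up' otherwise."""
    return "down" if daily_return < 0 else "up"


# Hypothetical Close prices for four consecutive trading days.
close = [100.0, 102.0, 102.0, 96.9]
returns = simple_returns(close)
labels = [direction(r) for r in returns]
print(labels)
```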

2.2 Market Timing Strategy

The financial performance of the tested models was verified using a long-only market timing strategy defined as follows:

$$\begin{aligned} \text {if }d_i= \left\{ \begin{array}{rl} down &{}\text { then }sell\text { at }Open_{i+1} \\ up &{}\text { then }buy\text { at }Open_{i+1} \\ otherwise &{}\text { then no action} \end{array} \right. \end{aligned}$$
(3)

where:

  • \(d_i\) - decision variable for i-th sample,

  • \(Open_{i+1}\) - the open price at the next period (\(i+1\)-th sample).

Aside from handling the up and down forecasts, the strategy accommodated the case where a prediction model generated no prediction for the forecasted period. In this case the strategy generated no trading signal, resulting in no action for the period. The strategy was defined as long-only, i.e. no short position was allowed. The buy and hold strategy was used as the benchmark. For the purpose of financial performance comparison, the ability to buy and sell fractional shares was enabled. This allowed the available equity (initial equity was set to 10’000) to be fully utilized in both the buy and hold and the market timing strategies. The transaction costs (commissions, slippage, etc.) were ignored in this experiment, as the focus was on the prediction performance.
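
A minimal sketch of the strategy in Eq. 3 is given below, assuming that a buy signal received while already invested and a sell signal received while in cash are ignored, and that the final position is valued at a supplied closing price. The function name, the price series and these conventions are illustrative assumptions, not the backtesting engine used in the experiment.

```python
def backtest_long_only(signals, opens, final_price, initial_equity=10_000.0):
    """signals[i] is the forecast made on day i ('up', 'down' or None).

    The resulting order is filled at opens[i + 1]. Fractional shares are
    allowed and transaction costs are ignored.
    """
    cash, shares = initial_equity, 0.0
    for i, signal in enumerate(signals):
        if i + 1 >= len(opens):
            break  # no next open available to fill the order
        price = opens[i + 1]
        if signal == "up" and shares == 0.0:
            shares, cash = cash / price, 0.0   # invest all available equity
        elif signal == "down" and shares > 0.0:
            cash, shares = shares * price, 0.0  # close the long position
        # any other value (e.g. None): no trading signal, no action
    return cash + shares * final_price


# Hypothetical open prices and forecasts.
opens = [100.0, 100.0, 110.0, 105.0, 120.0]
signals = ["up", None, "down", "up"]
final_equity = backtest_long_only(signals, opens, final_price=120.0)
print(final_equity)
```

In this toy run the position is opened at 100, closed at 105 and reopened at 120, so the equity grows only during the first holding period.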

2.3 Setup and Testing of Classifier Models

All the models were tested with each of the selected indices (see Fig. 1) using the walk-forward method with a rolling time window [6]. The walk-forward method derives from standard cross-validation but also observes the time order of the time series and so prevents sample and look-ahead bias. For each iteration of the walk-forward cross-validation, consecutively aligned training, calibration and testing samples were selected. The training sample was fixed at 3333 trading days. Multiple combinations of calibration and testing samples were defined, drawing their lengths from the sets {21,42,63} and {3,5,21} trading days, respectively.
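
The window layout can be sketched as follows. The sketch assumes that the window is rolled forward by the length of the test sample, so that consecutive test samples do not overlap; the step size is not stated above and is therefore an assumption, as is the function name.

```python
def walk_forward_windows(n_samples, train_len, calib_len, test_len):
    """Yield (train, calibration, test) index ranges in time order.

    Assumption: the window rolls forward by test_len on each iteration.
    """
    start = 0
    while start + train_len + calib_len + test_len <= n_samples:
        train_end = start + train_len
        calib_end = train_end + calib_len
        yield (
            range(start, train_end),
            range(train_end, calib_end),
            range(calib_end, calib_end + test_len),
        )
        start += test_len


# 3333 training days, 21 calibration days, 3 test days, two iterations of data.
windows = list(walk_forward_windows(3360, train_len=3333, calib_len=21, test_len=3))
for train, calib, test in windows:
    print(train, calib, test)
```

Each test sample lies strictly after its calibration sample, which in turn lies strictly after the training sample, so no future observation is available when a model is built.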

For each combination of training-calibration-test sample set in the walk-forward cycle, predictive models were created using the training sample. The parameter-driven models, i.e. the fuzzy rough positive region based nearest neighbour model with VQRS approximation (\(FPOSNN_{VQRS}\)) and the VPRS based model, were then tuned using a grid search on the respective calibration sample, so that the optimal model parameter set (i.e. the one resulting in the highest classification accuracy/lowest error) for the given calibration sample and iteration of the walk-forward procedure was found. For the reference VPRS model the tuned parameter was the VPRS \(\beta \in \{0.0,0.25,0.49\}\). For the \(FPOSNN_{VQRS}\) model, the optimal boundaries \((\alpha ,\beta )\) of the quantifier "most" were searched in the set \((\alpha ,\beta )_{most}\in \{(0.2,1),(0.4,1),(0.6,1)\}\). The quantifier "some" had fixed boundary parameters set at \((\alpha ,\beta )_{some}=(0.1,0.6)\). The FPOSNN model using the Lukasiewicz \(\mathcal {T}\)-norm operators (\(FPOSNN_{\mathcal {T}_{L}}\)) did not require specific tuning. The FPOSNN models used a fixed number of neighbours \(k=10\), in the middle of the range suggested in [16].
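
The calibration step can be illustrated with a minimal grid search sketch. The callable `predict_with` stands in for a model that was built on the training sample and is applied to the calibration sample with a given parameter value; it, the toy predictions and the tie-breaking rule (the first best value wins) are hypothetical.

```python
def accuracy(predicted, actual):
    """Share of calibration samples whose predicted direction matches the actual one."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def tune_on_calibration(param_grid, predict_with, calib_labels):
    """Return the parameter value with the highest calibration accuracy."""
    best_param, best_acc = None, -1.0
    for param in param_grid:
        acc = accuracy(predict_with(param), calib_labels)
        if acc > best_acc:  # strict comparison: the first best value wins on ties
            best_param, best_acc = param, acc
    return best_param, best_acc


# Toy stand-in for a VPRS model: predictions on the calibration sample per beta.
calib_labels = [1, -1, 1, 1]
toy_predictions = {
    0.0: [1, 1, 1, 1],
    0.25: [1, -1, 1, 1],
    0.49: [-1, -1, 1, 1],
}
best_beta, best_acc = tune_on_calibration(
    [0.0, 0.25, 0.49], lambda beta: toy_predictions[beta], calib_labels
)
print(best_beta, best_acc)
```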

For VPRS-based analysis the input variables were preprocessed as follows:

  1. A subset of technical indicators and the VIX Close values, for which a common consensus on the interpretation of their values exists, were discretized using manually defined intervals.

  2. The remaining conditional variables were first normalized using the z-score method and subsequently discretized using the equal binning algorithm with 3 bins. The mean and standard deviation of the learning sample as well as its equal binning cuts were used to normalize and discretize the validation and testing samples.
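
The second preprocessing step can be sketched as follows. The sketch assumes equal-width binning and the population standard deviation; neither choice is specified above, and the function names and sample values are hypothetical. The key point is that the mean, the standard deviation and the cuts are estimated on the learning sample only and then reused unchanged for later samples.

```python
from statistics import mean, pstdev


def fit_preprocessor(train_values, n_bins=3):
    """Estimate z-score parameters and equal-width cuts on the learning sample."""
    mu = mean(train_values)
    sigma = pstdev(train_values) or 1.0  # guard against a constant attribute
    z = [(v - mu) / sigma for v in train_values]
    lo, hi = min(z), max(z)
    width = (hi - lo) / n_bins
    cuts = [lo + width * k for k in range(1, n_bins)]
    return mu, sigma, cuts


def transform(values, mu, sigma, cuts):
    """Normalize with the learning-sample statistics, then map to a bin index."""
    return [sum(1 for c in cuts if (v - mu) / sigma > c) for v in values]


train = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
mu, sigma, cuts = fit_preprocessor(train)
train_bins = transform(train, mu, sigma, cuts)
test_bins = transform([0.0, 4.0, 10.0], mu, sigma, cuts)
print(cuts, train_bins, test_bins)
```

Values outside the range seen in the learning sample fall into the lowest or highest bin.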

For the fuzzy rough sets-based processing all conditional variables were normalized, as in the case of the rough sets model. Fuzzy rough sets do not require discretization of conditional variables. The decision variable d (Eq. 2) was encoded as 1 for an up movement, and \(-1\) for a down movement. The VPRS model set the decision variable d to 0 in case the ruleset was unable to predict its value (i.e. no matching rule was found, as per Eq. 3).

3 Experiment Results

The walk-forward procedure was executed using rough sets and fuzzy rough sets models generated for all combinations of test sample lengths and model-specific parameters as defined in Sect. 2.3. Both the predictive and the financial performance were tested, with the results described below.

Table 1. Prediction performance of tested models

3.1 Prediction Performance

Table 1 presents the total classification performance of the generated models. The prediction models based on \(FPOSNN_{VQRS}\) delivered the highest classification accuracy for the S&P500 and DAX indices, i.e. 54.21 % (calibration sample/forecast period=21/3 trading days) and 52.93 % (calibration sample/forecast period=63/21 trading days), respectively. The VPRS rule based model delivered the best accuracy for HSI with 52.43 % vs. the worst performing \(FPOSNN_{VQRS}\) model with a classification accuracy of 49.80 % for the calibration sample/forecast period=63/21 trading days. The FPOSNN models consistently outperformed the VPRS rule based model for all tested combinations of calibration and forecast periods for the S&P index. For the S&P and DAX time series the \(FPOSNN_{VQRS}\) model delivered the best average accuracy of 53.11 % and 52.14 %, respectively, vs. the respective average performance of the VPRS rule based model of 51.54 % and 50.90 %. For the HSI index the \(FPOSNN_{VQRS}\) and \(FPOSNN_{\mathcal {T}_{L}}\) models delivered an average classification performance of 50.17 % and 50.43 %, vs. 51.17 % delivered by the VPRS model. The length of the calibration and forecast periods had an impact on the classification accuracy of all tested models. The use and increasing length of the calibration sample had a positive impact only on the performance of the VPRS model when predicting movements of the HSI index. For the other indices, increasing the length of the calibration sample had no or a negative impact on the classification accuracy of the tested models. This indicates that the time distance between the classified sample and the ruleset or the nearest neighbour base sample (i.e. the training sample) was more important than the grid search based tuning of model parameters. The latter required a calibration sample, thus increasing the time distance between the training and test samples.
One can conclude that consideration of time distance as a factor in the classification process would most likely improve it [12]. It is also worth noting that both DAX and S&P500 had a strong upward trend in more than half of the testing period, whereas HSI growth was relatively moderate with a sideways trend in the same time period (see Fig. 2). Thus, the DAX and S&P500 time series supported trend following prediction models, and by extension the nearest neighbour algorithm, which takes decisions based on a relatively small number of k samples. Prediction of sideways movements (so-called whipsaws), as was the case for the HSI time series, is recognised as challenging in the field of technical analysis. In this case the rule based algorithm was relatively better. The classification accuracy alone is, however, not sufficient to deliver a profitable predictive model. A decisive factor is the timing of the correct prediction, which directly impacts the financial performance.

Fig. 2. Cumulated return time series (calibration sample of 63 trading days) (Color figure online)

3.2 Financial Performance

The profit and loss backtesting using the market timing strategy outlined in Sect. 2.2 as well as the buy and hold benchmark strategy were executed for all combinations of the training, calibration and test sample lengths. The trading simulation used tracking ETFs associated with the underlying indices as the traded equity instrument. The trading costs and slippage were ignored. Figure 2 displays the cumulated return time series of the tested models vs. the buy and hold strategy for test runs using the calibration period of 63 trading days, whereas Table 2 displays the related financial performance summary. The S&P500 and DAX indices were in a strong upward trend for the second half of the testing period, reflected in the total return of 85.39 % and 72.69 % of the respective tracking ETFs (and the associated buy and hold strategy). Consequently, beating these indices was expected to be difficult. This was confirmed for the majority of the calibration/test sample length combinations, where the predictive models underperformed the market, although still generating positive returns above 30 % (DAX) and 40 % (S&P500). An exception was the \(FPOSNN_{VQRS}\) model, which consistently beat the S&P500 performance when using the calibration period of 21 days and delivered the peak return of 99.88 % using the 21/21 combination of calibration/test sample lengths. A similarly exceptional performance in the case of the DAX benchmark was delivered by the VPRS model, with 90.13 % and 78.68 % return using the 42/3 and 42/5 trading days combinations of calibration/test, respectively. In the case of the S&P500 index, the financial return decreased as the length of the calibration period increased, mirroring the observation made with regard to the classification accuracy. For the DAX index, this trend was not present, with the VPRS model performing best on the calibration sample of 42 days.
It should be noted that DAX movements were more volatile than those of S&P500, although both indices shared the same strong upward trend in the second half of the testing period. This may explain the lack of the clear dependencies visible in the case of the S&P500 index, i.e. the US VIX volatility index might not be able to fully reflect the volatility of the German DAX index. Thus, it may be necessary to define more localized input variables for the tested models, so that the inherent capability of Rough Sets to select reducts is applied for each underlying market. The HSI index reflects the above observations, amplified by the fact that it had a relatively weaker upward trend with multiple whipsaws, resulting in a total return of the tracking ETF of only 24.36 % over the testing period. Consequently, the correct prediction of the index movement played a larger role, which can be seen in the excellent performance of the \(FPOSNN_{VQRS}\) and VPRS models, which outperformed the Hong Kong market for all combinations of tested lengths of calibration and test samples. Both underlying models, namely VPRS and VQRS, are closely related, aiming at the reduction of classification noise caused by outliers. On the other hand, the \(FPOSNN_{\mathcal {T}_{L}}\) model underperformed the market, which would indicate that this model is actually trend following and so unable to work in a non-trending market. Even though the tested models mostly underperformed the DAX and S&P500 markets due to the dependencies described above, the total return alone gives a biased view of the performance without considering the associated risk. In particular, the information ratio and maximum drawdown [2] showed the level of risk associated with the permanent exposure required by the buy and hold strategy. The FPOSNN based models had risk similar to that of DAX and S&P but performed well vs. the HSI index, with less than 50 % of the drawdown incurred by the index.
The VPRS rule based model also delivered performance slightly lower than the DAX and S&P500 indices but with a much lower level of risk (drawdown). Like the \(FPOSNN_{VQRS}\) model, it outperformed the HSI index with half the risk, reflected in the 50 % lower drawdown and positive information ratio.
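
For reference, the drawdown measure discussed above can be sketched as the largest peak-to-trough decline of an equity curve, expressed as a fraction of the running peak. This is a common definition and is assumed here; the exact formula from [2] used in the experiment may differ, and the equity values below are hypothetical.

```python
def max_drawdown(equity_curve):
    """Largest decline from a running peak, as a fraction of that peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)                    # highest equity seen so far
        worst = max(worst, (peak - value) / peak)  # decline relative to that peak
    return worst


# Hypothetical equity curve starting from the initial equity of 10'000.
equity = [10000.0, 12000.0, 9000.0, 11000.0, 13000.0, 11700.0]
mdd = max_drawdown(equity)
print(mdd)
```

A strategy that stays out of the market during a decline keeps its equity flat over that period, which is how a market timing strategy can show a lower drawdown than the permanently invested benchmark.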

Table 2. Financial performance of tested models on a holdout sample

4 Conclusions

Variable Precision Rough Sets and Fuzzy Rough Sets were used to generate binary classifiers applied to real-life market data of multiple stock indices. Both the classification accuracy and the ability to support profitable market timing strategies were evaluated. The experimental results showed that Fuzzy Rough Sets based models were able to outperform the VPRS based model in terms of classification accuracy in the majority of experiments. In terms of financial performance, the VPRS and \(FPOSNN_{VQRS}\) models were able to outperform the buy and hold strategy applied to the HSI index. Considering the simplicity of the used strategy and the strong upward trend of S&P500 and DAX, the models were robust when applied to these time series. The application of Rough Sets models to portfolios representing a specific strategy can be simulated by tracking certain indices or exchange traded funds and is planned to be included in future research. In general, the effectiveness of the delivered trading signals can be further improved by considering the time distance in classification algorithms and a wider set of input variables.