1 Introduction

The notion of a factor in factor investing originates from the arbitrage pricing theory proposed by Ross [10], which holds that the expected return of a financial asset can be modeled as a function of various macroeconomic factors or theoretical market indexes. Researchers have since tried to use specific factors to model stock returns. The three-factor model [4] was an early example, modeling the excess return of a stock with factors such as book value and earnings. Further research verified that a series of factors can be used to explain stock returns; these factors can be summarised into three main categories: macroeconomic, statistical, and fundamental. In the risk model developed by the Barra team at MSCI, factor returns are estimated through cross-sectional regression [8]. In the Fama-French approach, factor portfolios are built according to target factors to construct factor returns [1, 4]. Similarly, the Smart Beta Indexes from MSCI [2, 3] are compiled according to target factors to reflect the style and performance of specific factors under different market conditions. In practice, multiple factors usually need to be integrated. A common way to integrate factors is a linearly weighted sum, where the weight of each factor is calculated by solving an optimization problem with a subjectively defined target [3]. In recent years, non-linear methods such as Support Vector Machines, Logistic Regression, Random Forests, Neural Networks and deep learning have been widely used in financial time series modeling, yet most existing works focus on stock price prediction. They learn model parameters by fitting training samples and presume that the training set and the test set follow identical distributions in the feature space [9, 13, 14, 15]. Only a few works address cross-section modeling and feature integration [5, 6].

In this work, we introduce neural networks into the task of cross-section factor integration, extracting factors according to the definitions from Barra [8]. We use the Smart Beta Index methodology to compile factor indexes that reflect the performance and style of these factors on the Chinese market. Experimental results show that the index compiled from factors integrated by neural networks achieves better profitability and stability.

2 Factors and Factor Indexes

Changes in a stock's price are not only the result of historical market behavior; they are also affected by information from multiple sources, such as the macroeconomy and the financial situation of the corresponding listed company. Indicators can be selected and defined to capture this information for use in investment practice, and they are called factors. Factors are extracted from three main sources: technical indicators from market data, fundamental indicators from financial statements, and macroeconomic indicators.

In market practice, stocks are ranked and selected according to scores calculated from one or multiple factors. Factors that have proven robust over a long time period are summarized by the Barra risk model. Table 1 presents the definitions of these factors. The original indicators are extracted from the market data of stocks and the financial statements of their corresponding listed companies. Factors are usually sampled at monthly frequency.

Table 1. BARRA style factors

To reflect the performance of factors in market practice, factor indexes are compiled according to the methodologies proposed by MSCI. At the beginning of each quarter, the component stocks of the benchmark CSI 800 are sorted by factor score, and the top 100 are selected as components of the factor index and weighted according to their market value. For single-factor indexes, component stocks are sorted by the single target factor. For multi-factor indexes, the weights of component stocks are calculated by solving an optimization problem whose objective is to maximize the exposure to multiple target factors:

$$\begin{aligned} \max \quad &\sum \limits _{k=1}^{K}\sum \limits _{i=1}^{n} \omega _i X_{ik}^{\text{target}}\\ \text{s.t.}\quad &\sum \limits _{i=1}^{n} \omega _i X_{ik}^{\text{non-target}} \ge \sum \limits _{i=1}^{n} \omega _i^{\text{benchmark}} X_{ik}^{\text{non-target}}-0.25\cdot \mathrm{std}(X_{k}^{\text{non-target}}),\quad k=1,2,\ldots ,\tilde{K},\\ &\sum \limits _{i=1}^{n} \omega _i X_{ik}^{\text{non-target}} \le \sum \limits _{i=1}^{n} \omega _i^{\text{benchmark}} X_{ik}^{\text{non-target}}+0.25\cdot \mathrm{std}(X_{k}^{\text{non-target}}),\quad k=1,2,\ldots ,\tilde{K},\\ &\max (0,\omega _i^{\text{benchmark}}-2\%)\le \omega _i \le \max (10\,\omega _i^{\text{benchmark}},\omega _i^{\text{benchmark}}+2\%),\quad i=1,2,\ldots ,n. \end{aligned}$$

Here \(\omega _i\) is the index weight of stock i, \(\omega _i^{\text{benchmark}}\) is its weight in the benchmark, \(X_{ik}\) is the score of stock i on factor k, n is the number of stocks, and K and \(\tilde{K}\) are the numbers of target and non-target factors.
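As a hedged illustration of the constraints above, the following Python sketch computes the per-stock weight bounds and checks whether a candidate weight vector stays inside the ±0.25 standard-deviation exposure band. The function names, the list-of-columns data layout, and the use of the population standard deviation for std(·) are assumptions made for illustration; the source does not specify them, and no solver is included.

```python
from statistics import pstdev


def weight_bounds(w_bench, abs_band=0.02, mult=10.0):
    """Per-stock bounds: max(0, w_b - 2%) <= w <= max(10 * w_b, w_b + 2%)."""
    return [(max(0.0, wb - abs_band), max(mult * wb, wb + abs_band))
            for wb in w_bench]


def exposure(weights, factor_column):
    """Portfolio exposure sum_i w_i * X_ik for a single factor k."""
    return sum(w * x for w, x in zip(weights, factor_column))


def objective(weights, target):
    """Total exposure to all target factors (the quantity being maximized)."""
    return sum(exposure(weights, column) for column in target)


def is_feasible(weights, w_bench, non_target, band=0.25, tol=1e-12):
    """Check the weight bounds and the non-target exposure band.

    non_target holds one list of per-stock scores for each non-target factor.
    Assumption: std(.) is taken as the population standard deviation.
    """
    for w, (low, high) in zip(weights, weight_bounds(w_bench)):
        if w < low - tol or w > high + tol:
            return False
    for column in non_target:
        slack = band * pstdev(column)
        gap = exposure(weights, column) - exposure(w_bench, column)
        if abs(gap) > slack + tol:
            return False
    return True
```

Because the objective and all constraints are linear in the weights, a generic linear programming solver could search for the feasible weights with the largest `objective` value; that step is left out of the sketch.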

Following this methodology and the MSCI documentation, we compile single-factor indexes and multi-factor indexes targeting Momentum, Size, Value, and Dividend. Figure 1 shows the back-test results of the factor indexes from 2010 to 2017. Factors exhibit different styles under different market conditions. The profitability and risk of each factor index are evaluated by the indicators listed in Table 2, from which we can see that the factor indexes reach higher returns and Sharpe ratios than the benchmark, which verifies the effectiveness of these factors on the Chinese market. However, subjectively setting the optimization objective for factor integration may lead to unsatisfactory profitability and risk, since factors perform differently under different market conditions.
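For readers who want to reproduce such an evaluation, the sketch below derives annualized return, annualized volatility, and Sharpe ratio from a series of index net values. It is an illustration only: the function name, the monthly sampling default, the zero risk-free rate, and the exact annualization conventions are assumptions, and the indicators actually reported in Table 2 may be defined differently.

```python
import math
from statistics import stdev


def performance(net_values, periods_per_year=12, risk_free=0.0):
    """Summarize a net-value series (assumed evenly spaced in time).

    Returns annualized return, annualized volatility, and Sharpe ratio.
    Needs at least three net values so that the sample standard
    deviation of the period returns is defined.
    """
    returns = [b / a - 1.0 for a, b in zip(net_values, net_values[1:])]
    years = len(returns) / periods_per_year
    # Geometric annualization of the total growth over the period.
    annual_return = (net_values[-1] / net_values[0]) ** (1.0 / years) - 1.0
    # Scale the per-period standard deviation by sqrt(periods per year).
    annual_vol = stdev(returns) * math.sqrt(periods_per_year)
    sharpe = ((annual_return - risk_free) / annual_vol
              if annual_vol > 0 else float("nan"))
    return {"annual_return": annual_return,
            "annual_volatility": annual_vol,
            "sharpe": sharpe}
```

Calling `performance` on both a factor index and the benchmark over the same dates gives directly comparable figures.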

Fig. 1. Smart Beta factor indexes based on CSI 800.

Table 2. Smart Beta Index simulation results based on CSI 800

3 Neural Networks for Factor Integration

Deep learning has been explored for stock price prediction [7, 11, 12], where deep neural networks are designed to extract features from time series samples. Portfolio construction is another kind of market practice, one that provides cross-section samples. In this work, we introduce the Multi-layer Perceptron (MLP) to handle cross-section factors. Traditional machine learning models and linear regression are also applied in the experiment for comparison.
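To make the idea of an MLP over cross-section factors concrete, here is a minimal pure-Python sketch that maps a stock's factor vector to a single integrated score and is fitted to future returns. Everything about it is an assumption for illustration: the single tanh hidden layer, its width, the squared-error loss, full-batch gradient descent, and the class name `TinyMLP`. The architecture and training setup actually used in the experiments are not described in the text.

```python
import math
import random


class TinyMLP:
    """One hidden tanh layer: factor vector -> scalar integrated score."""

    def __init__(self, n_inputs, n_hidden=8, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_inputs)
        self.w1 = [[rng.uniform(-scale, scale) for _ in range(n_inputs)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sum(w * v for w, v in zip(self.w2, h)) + self.b2

    def mse(self, xs, ys):
        return sum((self.predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def fit(self, xs, ys, epochs=200, lr=0.05):
        """Full-batch gradient descent on the mean squared error."""
        n = len(xs)
        for _ in range(epochs):
            g_w1 = [[0.0] * len(row) for row in self.w1]
            g_b1 = [0.0] * len(self.b1)
            g_w2 = [0.0] * len(self.w2)
            g_b2 = 0.0
            for x, y in zip(xs, ys):
                h = self._hidden(x)
                pred = sum(w * v for w, v in zip(self.w2, h)) + self.b2
                err = 2.0 * (pred - y) / n  # d(MSE)/d(pred) for this sample
                g_b2 += err
                for j, hj in enumerate(h):
                    g_w2[j] += err * hj
                    # Backpropagate through tanh: tanh'(z) = 1 - tanh(z)^2.
                    back = err * self.w2[j] * (1.0 - hj * hj)
                    g_b1[j] += back
                    for k, xk in enumerate(x):
                        g_w1[j][k] += back * xk
            # Apply all updates only after the gradients are accumulated.
            self.b2 -= lr * g_b2
            for j in range(len(self.w2)):
                self.w2[j] -= lr * g_w2[j]
                self.b1[j] -= lr * g_b1[j]
                for k in range(len(self.w1[j])):
                    self.w1[j][k] -= lr * g_w1[j][k]
        return self
```

The sketch assumes factor scores have been standardized beforehand; in practice a numerical library would replace the explicit loops.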

We use the factors of each component stock of the CSI 800 index from 2008 to 2017 for the experiment. Models are trained at the start of every year using monthly samples \(\{\chi _{t}^i,y_t^i\}\) from the previous 3 years, where \(\chi _t^i\) denotes the factors of stock i listed in Table 1 and \(y_t^i\) denotes the return of stock i from t to \(t+1\). At the start of each month, the factors of each stock are integrated by the model trained at the start of that year, stocks are sorted according to the integrated factor, and the top 100 stocks are used for index compilation, weighted according to their market value.
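The yearly retraining and monthly rebalancing described above can be sketched as follows. The helper names `walk_forward_schedule` and `compile_index`, the `(year, month)` tuples, and the dictionary inputs are hypothetical choices for illustration. The sketch also ignores whether the label of the last training month is already observable at the retraining date, which a strict back-test would need to handle.

```python
def walk_forward_schedule(first_year, last_year, train_years=3):
    """Yield (train_months, apply_months) for each yearly retraining date.

    Months are (year, month) tuples. A model fitted at the start of year Y
    uses the monthly samples of years Y-train_years .. Y-1 and is then
    applied to every month of year Y.
    """
    for year in range(first_year + train_years, last_year + 1):
        train_months = [(y, m)
                        for y in range(year - train_years, year)
                        for m in range(1, 13)]
        apply_months = [(year, m) for m in range(1, 13)]
        yield train_months, apply_months


def compile_index(scores, market_values, top_n=100):
    """Select the top_n stocks by integrated score, weighted by market value.

    scores and market_values are dicts keyed by stock identifier.
    Returns a dict of index weights that sum to 1.
    """
    selected = sorted(scores, key=scores.get, reverse=True)[:top_n]
    total = sum(market_values[s] for s in selected)
    return {s: market_values[s] / total for s in selected}
```

With data from 2008 to 2017 and a three-year window, this schedule produces its first model at the start of 2011.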

Fig. 2. Model integrated factor indexes based on CSI 800.

The indexes compiled from integrated factors are shown in Fig. 2, from which we can see that the net value of most model-based integrated factor indexes outperforms the benchmark during most of the back-test period. We further evaluate each index with the same performance indicators, listed in Table 3. From these results we can conclude that: (1) Factors integrated by neural networks and linear regression show better profitability and stability than the multi-factor index, which implies that model-based integration can potentially mine the relationship between the factors of stocks and their future performance. The neural network and linear regression based indexes show higher returns than the multi-factor index, while the volatility of the multi-factor index is higher, which means higher risk. Their higher Sharpe ratios likewise imply higher stability. (2) Neural networks perform better than linear regression, which suggests that the non-linear relationship between factors can be used to enhance the performance of integrated factors.

Table 3. Integrated factor indexes simulation results based on CSI 800

4 Conclusion

Factor indexes reflect the performance of factors for factor investing, so that robust factors can be filtered. The filtered factors then need to be integrated. Our work introduces deep neural networks and other supervised models to integrate factors under the supervision of future returns, and indexes are compiled from the integrated factors to evaluate their performance. Experimental results show that supervised integration by a model enhances the effectiveness of integrated factors compared with integration by optimization with a subjectively defined objective. The neural network proves more effective, since it is able to mine deep non-linear relationships between factors and the future performance of stock prices.