1 Introduction

Since the global financial crisis, there has been a dramatic change in the use of central bank communications as a policy instrument [1, 2]. Central banks communicate qualitative information to the financial market through statements, minutes, speeches, and published reports [3]. Communication is an important tool that a central bank can use to avert a crisis, by providing investors with its assessment of the risks and the measures it views as necessary to reduce those risks within the economy [1, 4]. Previous studies suggest that effective central bank communications can mitigate and potentially prevent a financial crisis; ineffective communications may exacerbate one [1, 5]. In [4], the Swedish central bank, the Riksbank, is criticized because its communications were “not clear or strong enough” leading up to the global financial crisis, such that the bank’s information went “unnoticed” [1]. In this paper, we design an automated system that predicts the impact of central bank communications on interest rate expectations as derived via financial market patterns. For the purposes of this study, we analyze economic sentiment, as expressed in the ‘Monetary Policy Committee Minutes’ [2] published by the Bank of England, which detail its monthly interest rate decisions.

Financial markets scrutinize central bank communications for “clues and shades of meaning about its assessment of the economy and the direction of where economic policy may be heading” [1]. As a prediction task, the measurement and evaluation of sentiment is challenging due to the complexities and subtleties of interpreting bank communications [1]. The formation of economic policy is a balancing act between achieving high economic growth and financial stability, while targeting low inflation [2]. The relative importance of these objectives is dynamic, and varies depending on prevailing economic conditions [2]. For example, under benign economic conditions, high inflation may be construed by financial market investors as a negative signal for the direction of future interest rates. During the financial crisis of 2007–2009, by contrast, high inflation was considered to be a positive signal because it effectively lowered real interest rates [6]. This motivates a need for fine-grained sentiment analysis, to automatically detect economic aspects and predict central bank sentiment expressed towards these aspects [7]. Such an approach would provide investors with an automated system to decipher the complexities and interactions of economic aspects, to interpret the consequences of these interactions for the future path of interest rates, and to incorporate the information into their investment decisions. For a central bank, such a system would provide it with the ability to predict the impact of its economic policies on the financial markets. The resulting ‘price discovery’ process [2] may promote a more efficient functioning of financial markets.

Our approach consists of four phases. First, the system detects salient references to economic aspects associated with economic growth, prices, interest rates and bank lending and employs a multinomial Naive Bayesian model to classify sentences within central bank documents. Economic aspects are identified in a pre-processing step, that employs a link analysis using the TextRank algorithm [8, 9] applied to background knowledge obtained from Wikipedia. The second phase measures sentiment expressed for the economic aspects, using a count of terms from the General Inquirer dictionary [17]. The third phase employs Latent Dirichlet Allocation (LDA) to infer intensifiers/diminishers that may change the meaning of the economic aspects and economic sentiment [7, 10]. Specifically, the model categorizes whether the magnitude of the economic aspects has ‘intensified’ or ‘diminished’ over time [11, 12]. We refer to the resulting topic clusters as directional topic clusters. Finally, an ensemble tree combines the model components to predict the impact of the communications on financial market interest rates over the following day.

The rest of this paper is structured as follows. Section 2 draws on literature from the field of macroeconomics and discusses the implications for sentiment analysis and keyword detection. Section 3 models the individual components of the system. Section 4 outlines the corpus of central bank communications, provides an evaluation of the model components and then discusses the results. Section 5 concludes and suggests avenues for future research.

2 Related Work

2.1 Background: Central Bank Research

Since the financial crisis, several central banks have identified communications, particularly ‘enhanced forward guidance’, as an important policy instrument within their economic toolkit [1, 2]. Effective communications enhance a central bank’s public transparency, accountability and credibility [13], which in turn aids its ability to implement economic policies [14]. To date, there has been little research into text mining of central bank communications. In [14], the impact of different types of communications (press releases, speeches, interviews, and news conferences) is analyzed to determine which media sources impact interest rate expectations. The analysis does not, however, classify the language used in the documents. In [3], a term counting approach is adopted to analyze the sentiment contained within the meeting minutes of the US central bank (the Federal Reserve). In [3, 15] Latent Semantic Analysis is employed to analyze the sentiment contained within the Bank of Canada’s minutes. The intention of this study is to design a fine-grained sentiment analysis approach to analyze the impact of central bank communications on financial market investors. To our knowledge, this remains an unexplored avenue of research.

2.2 Background: Sentiment Analysis

Traditionally, fine-grained sentiment analysis has been researched for the classification of online user reviews of products and movies [16]. Readers are often not only interested in the general sentiment towards an aspect but also a detailed opinion analysis for each of these aspects [7]. Evaluation is conducted by comparing model classifications versus ratings provided by users. The evaluation of economic sentiment is arguably a harder task, due to the lack of a clearly defined outcome to assess model performance. For example, which economic variable should a model’s predictions be evaluated against? The relative importance of the aspects (e.g. economic growth/inflation/interest rates) is subjective, may vary over time, and the measurement of the aspects is only known with significant time delay.

The traditional approach to text-mining within the field of finance is to count terms using the General Inquirer dictionary [17, 18]. The dictionary classifies words according to multiple categories, including 1,915 positive words and 2,291 negative words. The General Inquirer was developed for psychology and sociology research and while it is used for text mining within the field of finance, little research has been conducted as to its suitability within finance [19]. Aspects that are frequently mentioned in central bank communications, such as the terms ‘employment’, ‘unemployment’ and ‘growth’, are not classified by the General Inquirer dictionary. Adjectives are often needed before investors can interpret the patterns in the economy to form their interest rate expectations [3]. Furthermore, the terms ‘inflation’ and ‘low’ are classified as negative by the dictionary, yet ‘low inflation’ is a positive characteristic and indeed achieving this is a central bank’s core objective [2]. The terms ‘fall’ and ‘decline’ are classified as negative terms in the General Inquirer dictionary, yet the opposite terms ‘rise’ and ‘increase’ are not classified at all.

2.3 Background: Keyword Detection

Graph-based algorithms have received much attention [8] as an approach to keyphrase extraction and are considered to be state-of-the-art unsupervised methods [20]. In a graph representation of a document, nodes are words or phrases, and edges represent co-occurrence or semantic relations. The underlying assumption is that all words in the text have some relationship to all other words in the text. Such an approach is statistical, because it links all co-occurring terms without considering their meaning or function in text. Centrality is often used to estimate the importance of a word in a document [22], and is a way of deciding on the importance of a vertex within a graph that takes into account global information recursively computed from the entire graph, rather than relying only on local vertex-specific information [23]. The main advantage of such a representation is that selected terms are independent of their language [21].

3 Model to Predict Changes in Investors’ Expectations

In this section we describe the four phases of the system. First, the system detects salient references to economic aspects and employs a multinomial Naive Bayesian model to classify sentences within documents. The second phase measures sentiment expressed for the economic aspects, using a count of terms from the General Inquirer dictionary. The third phase employs an LDA model and categorizes whether the magnitude of the economic aspects has ‘intensified’ or ‘diminished’ [11, 12]. Finally, an ensemble tree combines the model components to predict the impact of the communications on financial market interest rates over the following day.

3.1 Aspect Detection

In [3] it is shown that tf-idf weighting selects infrequent terms that relate to major news events or economic shocks. By contrast, our approach is intended to detect the common economic themes that are discussed in central bank communications and are more likely to influence investors’ interest rate expectations on a day-to-day basis [2]. To determine salient references, we employ a link analysis approach that detects the most frequently mentioned terms within two Wikipedia pages on Central Banking and Inflation. TextRank [8], a ranking algorithm based on the concept of eigenvector centrality, is employed to compute the importance of the nodes in the graph. Each vertex corresponds to a word. A weight, wij, is assigned to the edge connecting the two vertices, vi and vj. The goal is to compute the score of each vertex, which reflects its importance, and use the word types that correspond to the highest scored vertices to form keywords for the text [23]. The score for vi, S(vi), is initialized with a default value and is computed iteratively until convergence using the recursive formula shown in Eq. (1).

$$ S(v_{i} ) = (1 - d) + d \times \sum\limits_{{v_{j} \in Adj(v_{i} )}} {\frac{{w_{ji} }}{{\sum\nolimits_{{v_{k} \in Adj(v_{j} )}} {w_{jk} } }}S(v_{j} )} $$
(1)

where Adj(vi) denotes vi’s neighbors and d is the damping factor set to 0.85 [8]. Figure 1 displays the resulting clustering of terms. The size of each node is directly proportional to the TextRank score of the respective economic aspect.

Fig. 1. Link analysis of frequently occurring terms. Different node colors reflect different communities identified using the Clauset-Newman-Moore algorithm.
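The recursion in Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in the study: the edge-list input format, the function name `textrank`, the convergence tolerance and the toy co-occurrence weights are all assumptions, and the graph is treated as undirected with strictly positive weights.

```python
from collections import defaultdict


def textrank(edges, d=0.85, tol=1e-6, max_iter=200):
    """Iterate Eq. (1) on an undirected, positively weighted word graph.

    edges: iterable of (word_i, word_j, weight) tuples (hypothetical format).
    Returns a dict mapping each word to its score S(v).
    """
    # Symmetric adjacency map: adj[u][v] holds the edge weight w_uv.
    adj = defaultdict(dict)
    for u, v, w in edges:
        if u == v:
            continue  # ignore self-loops
        adj[u][v] = adj[u].get(v, 0.0) + w
        adj[v][u] = adj[v].get(u, 0.0) + w

    # Total edge weight leaving each vertex (the denominator in Eq. (1)).
    out_weight = {v: sum(nbrs.values()) for v, nbrs in adj.items()}

    scores = {v: 1.0 for v in adj}  # default initial value (an assumption)
    for _ in range(max_iter):
        new_scores = {}
        for vi, nbrs in adj.items():
            total = sum(w_ji / out_weight[vj] * scores[vj]
                        for vj, w_ji in nbrs.items())
            new_scores[vi] = (1 - d) + d * total
        converged = max(abs(new_scores[v] - scores[v]) for v in adj) < tol
        scores = new_scores
        if converged:
            break
    return scores


# Toy co-occurrence counts, invented purely for illustration.
toy_edges = [
    ("inflation", "prices", 3.0),
    ("inflation", "money", 1.0),
    ("prices", "markets", 1.0),
    ("interest", "rates", 2.0),
    ("rates", "policy", 1.0),
    ("inflation", "rates", 1.0),
]
for word, score in sorted(textrank(toy_edges).items(), key=lambda kv: -kv[1]):
    print(f"{word:10s} {score:.3f}")
```

With this unnormalized form and an initial score of 1 per vertex, the scores always sum to the number of vertices, which gives a quick sanity check on any implementation.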

A greedy algorithm is employed to detect communities of terms within the network [31]. The algorithm detects four communities which we label as economic aspects. The economic growth aspect detects the frequency of the terms: ‘demand’, ‘goods’, ‘services’, ‘investment’. The prices aspect detects the terms: ‘inflation’, ‘prices’, ‘money’, ‘markets’, ‘currency’. The interest rate aspect detects the occurrence of: ‘interest’, ‘rates’, ‘policy’ and a bank lending aspect detects the terms: ‘banks’, ‘lending’ and ‘assets’. It is not surprising to see these terms appear in the link analysis given a central bank’s remit is to maintain price and financial stability. The choice of terms is consistent with the text mining research of [3] which identifies ‘growth’, ‘price’, ‘rate’, and ‘econom’ as the most frequently occurring terms for US central bank communications. Using the four economic aspects, the system next employs a multinomial Naive Bayesian model [24] to categorize sentences within each document. The resulting categorization labels form the basis upon which fine-grained sentiment analysis is applied.
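As an illustration of the sentence categorization step, the sketch below implements a small multinomial Naive Bayes classifier in plain Python. The text does not describe how the classifier is trained, so the training data here is an assumption: each aspect is seeded with a single pseudo-sentence built from its community terms. The class name, the Laplace smoothing constant and the decision to skip out-of-vocabulary tokens are likewise illustrative choices.

```python
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes with Laplace smoothing (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # smoothing constant (assumed value)

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # documents per class
        self.word_counts = defaultdict(Counter)  # term counts per class
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best_label, best_logp = None, -math.inf
        for label, n_c in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            logp = math.log(n_c / n_docs)  # log prior
            for tok in tokens:
                if tok not in self.vocab:
                    continue  # unseen tokens carry no evidence here
                count = self.word_counts[label][tok]
                logp += math.log((count + self.alpha)
                                 / (total + self.alpha * vocab_size))
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Seed "sentences" built from the aspect terms listed above (an assumption).
train_docs = [
    ["demand", "goods", "services", "investment"],
    ["inflation", "prices", "money", "markets", "currency"],
    ["interest", "rates", "policy"],
    ["banks", "lending", "assets"],
]
train_labels = ["growth", "prices", "interest_rates", "bank_lending"]

clf = MultinomialNB().fit(train_docs, train_labels)
print(clf.predict(["inflation", "prices", "rose"]))
print(clf.predict(["banks", "lending"]))
```

In practice the classifier would be fitted on many labelled sentences rather than four seed lists; the seed lists only keep the example self-contained.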

3.2 Polarity Detection

In the second phase, the model computes a measure of economic sentiment associated with each of the four economic aspects. We measure polarity as (P − N)/(P + N), where P and N are the numbers of positive and negative terms identified using the General Inquirer dictionary [17]. In line with [16], our goal is not to show that a term counting method can perform as well as a Machine Learning method, but to provide a baseline methodology to measure central bank sentiment and to draw attention to the limitations of the approach that is widely adopted by text mining studies in the field of finance as indicated in Sect. 2.2. The sentiment metrics that are associated with the economic aspects (economic growth, prices, interest rates and bank lending) are labelled \( {\text{Tone}}_{\text{growth}} \), \( {\text{Tone}}_{\text{prices}} \), \( {\text{Tone}}_{{{\text{interest}}\_{\text{rates}}}} \) and \( {\text{Tone}}_{{{\text{bank}}\_{\text{lending}}}} \) respectively. A fifth sentiment metric, \( {\text{Tone}}_{\text{overall}} \), is computed to measure the polarity associated with the overall document, without conditioning upon the economic aspects. The five sentiment metrics are included as separate components within the ensemble tree.
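The polarity ratio can be sketched as follows. The word sets are small stand-ins, not the General Inquirer lists: the negative set uses only the four terms that Sect. 2.2 reports as negative, and the positive set is invented for illustration. The function name `tone` and the choice to return 0 when no dictionary term occurs are assumptions.

```python
# Illustrative stand-ins for the dictionary categories (not the real lists).
POSITIVE = {"strong", "improve", "gain"}           # hypothetical entries
NEGATIVE = {"fall", "decline", "low", "inflation"}  # terms cited in Sect. 2.2


def tone(tokens, positive=POSITIVE, negative=NEGATIVE):
    """Return (P - N) / (P + N) for a list of lower-cased tokens."""
    p = sum(1 for t in tokens if t in positive)
    n = sum(1 for t in tokens if t in negative)
    if p + n == 0:
        return 0.0  # assumption: treat text with no dictionary terms as neutral
    return (p - n) / (p + n)


print(tone(["strong", "improve", "fall"]))  # two positive terms, one negative
print(tone(["low", "inflation"]))           # both terms counted as negative
```

The second call scores the phrase 'low inflation' as fully negative, which is the dictionary limitation discussed in Sect. 2.2.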

3.3 Detection of LDA Directional Topic Clusters

Next we extend the baseline term-counting method by taking intensifiers and diminishers into account [11, 12]. These are terms that change the degree of the expressed sentiment in a document (see Sect. 2.2). In the case of central bank communications, the terms describe how economic aspects have changed over time. We employ an implementation of LDA [10], and represent each document as a probability distribution over latent topics, where each topic is modeled by a probability distribution of words. In [7], LDA is found to capture the global topics in documents, to the extent that topics do not represent ratable aspects associated with individual documents, but define clusterings of the documents into specific types. For the purposes of training the LDA model, we consider each sentence within each central bank communication to be a separate document. This increases the sample size of the dataset (see Sect. 4.1) and is intended to improve the robustness of the LDA model for statistical inference. We implement standard settings for LDA hyper-parameters, α = 50/K and β = .01, where the number of topics K is set to 20 [25]. We manually annotate two of the topic clusters that capture ‘directional’ information [1] and appear to act as intensifiers/diminishers of meaning. We label the clusters directional topic clusters. Table 1 identifies the top terms associated with the two clusters. Representative words are the highest probability document terms for each topic cluster.

Table 1. Representative document terms associated with the directional topic clusters

Next, for each central bank communication, the LDA model infers the probabilities associated with the ‘intensifier’ and ‘diminisher’ clusters within each of the four economic aspects detected by the Naive Bayesian classifier. The output of the model is a vector of eight topic probabilities that proxy the central bank’s assessment that the economic aspects are intensifying/diminishing. We label the model the directional LDA model and the respective probability vectors: \( {\text{Topic}}_{{{\text{growth}}_{ \uparrow } }} \), \( {\text{Topic}}_{{{\text{prices}}_{ \uparrow } }} \), \( {\text{Topic}}_{{{\text{interest}}\_{\text{rates}}_{ \uparrow } }} \) and \( {\text{Topic}}_{{{\text{bank}}\_{\text{lending}}_{ \uparrow } }} \) if the economic aspects are increasing and \( {\text{Topic}}_{{{\text{growth}}_{ \downarrow } }} \), \( {\text{Topic}}_{{{\text{prices}}_{ \downarrow } }} \), \( {\text{Topic}}_{{{\text{interest}}\_{\text{rates}}_{ \downarrow } }} \) and \( {\text{Topic}}_{{{\text{bank}}\_{\text{lending}}_{ \downarrow } }} \) if the economic aspects are decreasing. We include the topic probabilities as components within the ensemble tree.

4 Experiments

In this section we discuss the corpus of central bank communications and describe the investor patterns data used to evaluate the impact of the central bank communications on investors’ interest rate expectations. We then outline the evaluation of the ensemble classification tree, present the results and provide a discussion.

4.1 Data

We choose to analyze the interest rate minutes of the Bank of England. As cited in [3], central bank minutes are closely watched by investors to gauge the future direction of economic policies. Similar datasets for the US and Canadian central banks’ minutes are examined in [3, 15]. The Bank of England announces the level of UK interest rates on the first Thursday of every month. The details that underpin this decision are only provided two weeks later and are published in the Bank of England’s ‘Monetary Policy Committee Minutes’. The communications are interesting to analyze because changes in investors’ expectations on the day of the central bank communication may be attributed to the qualitative information contained within the meeting minutes rather than the interest rate decision announced two weeks before. Minutes typically include summaries of committee members’ views on economic conditions and discuss the rationale for their interest rate decisions [26]. The central bank’s minutes are, on average, 12 pages long (including a header page), and contain around 55 bullet points, typically with 5 sentences in each bullet. The documents are available from 1997, the year when Parliament voted to give the Bank of England operational independence from the UK government. We retrieve all meeting minutes available between July 1997 and March 2014 to create a corpus that consists of 199 documents. For the purposes of aspect detection and to train the LDA model, we remove the header page and define a document as an individual sentence within each of the meeting minutes. This expands the corpus to a collection of 53,195 documents.

To evaluate the ensemble tree’s predictions we utilize information obtained from financial market patterns. Interest rate futures contracts are financial instruments that enable investors to insure against or speculate on uncertainty about the future level of interest rates [27]. Changes in the price of the futures contracts therefore reflect changes in investors’ views on the future direction in central bank interest rates. Investors’ interest rate expectations for the following three, six and twelve months are derived and published daily by the Bank of England. We utilize investors’ twelve month ahead forecasts. This data series has the greatest data coverage compared to the three and six month series. Furthermore, the twelve month forecast horizon is consistent with the time horizon over which the Bank of England conducts its economic policies [2]. To isolate the effect of the central bank communication on investors’ expectations, we compute the percentage change in the interest rate futures contract, as measured from the close of business on the day of the communication announcement until the close of business one day after. This narrow time window helps to minimize the influence on investors’ interest rate expectations from other financial market factors that may occur at the same time [28].
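The one-day window can be made concrete with a short sketch. It assumes the expectations series is available as a mapping from dates to closing values and interprets 'one day after' as the next observation in that series (for example, the next trading day); the function name, the data layout and the values are hypothetical.

```python
from bisect import bisect_right
from datetime import date


def one_day_pct_change(series, release_day):
    """Percentage change from the close on release_day to the next close.

    series: dict mapping datetime.date -> closing value (hypothetical layout).
    """
    if release_day not in series:
        raise KeyError(f"no observation on {release_day}")
    days = sorted(series)
    idx = bisect_right(days, release_day)  # position of the next observation
    if idx >= len(days):
        raise ValueError("no observation after the release day")
    before = series[release_day]
    after = series[days[idx]]
    return 100.0 * (after - before) / before


# Invented closing values, for illustration only.
closes = {
    date(2014, 3, 19): 2.00,
    date(2014, 3, 20): 2.05,
    date(2014, 3, 21): 1.90,
}
print(round(one_day_pct_change(closes, date(2014, 3, 19)), 2))
```

Looking up the next available observation, rather than adding one calendar day, keeps the window well defined when the following day is a weekend or holiday.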

4.2 Experiment Setup

We design the evaluation in stages in order to enhance our understanding of the system components. For a baseline, we evaluate the system’s predictions by using only the tone of the overall document (see Sect. 3.2). The approach does not take into account individual economic aspects or diminishers/intensifiers [11, 12]. We label the model naïve tone. This approach is consistent with the methodology typically adopted by financial literature [18]. Next we compare the outcomes of an ensemble model that combines the tone associated with each of the economic aspects: economic growth, prices, interest rates and bank lending (see Sect. 3.2). We label this the economic aspects model. A third model compares the outcomes from an ensemble model that combines the intensifiers/diminishers associated with the four economic aspects (see Sect. 3.3). We label this the directional LDA model. Finally, we combine the components in a single ensemble tree and refer to the system as the joint aspect-polarity model.

Learning and prediction are performed using an ensemble tree. The goal of ensemble methods is to combine the predictions of several models built with a given learning algorithm in order to improve generalizability and robustness over a single model. We use the Random Forest algorithm [30] that employs a diverse set of classifiers by introducing randomness into the classifier construction. Experiments were validated using five-fold cross validation in which the dataset is broken into five equal-sized sets; the classifier is trained on four sets and tested on the remaining set. The process is repeated five times and we calculate the average across folds. For evaluation, we select Mean Absolute Error (MAE), Root Mean Squared Error (RMSE) and Spearman’s rho (ρ). We include Spearman’s rho because prediction may also be considered a ranking task. The formulae are displayed in Eq. (2) below.

$$ MAE = \frac{1}{n}\sum\limits_{i = 1}^{n} {|O_{i} - E_{i} |} \,\,,\,\,\,RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {(O_{i} - E_{i} )^{2} } } \,\,,\,\,\,\rho = 1 - \frac{{6\sum\nolimits_{i = 1}^{n} {\left( {{\text{rank}}(O_{i} ) - {\text{rank}}(E_{i} )} \right)^{2} } }}{{n(n^{2} - 1)}} $$
(2)

where Ei is the model’s predicted value, Oi is the realized value, and n is the number of observations; for ρ, the differences are taken between the ranks of the realized and predicted values. MAE measures the average magnitude of the forecast errors without considering direction; RMSE squares the errors before averaging and therefore gives a relatively high weight to large errors. A smaller value of MAE or RMSE indicates a more accurate prediction. Spearman’s rho is a non-parametric measure [29] of the degree of monotonic association between the predicted and realized values and is bounded between −1 and +1. A positive Spearman’s rho indicates that the model has some predictive ability; a negative value indicates a poor model fit.
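The evaluation metrics and the fold construction can be sketched in plain Python. This is an illustrative sketch, not the code used in the experiments: the rank-difference form of Spearman's rho is exact only when there are no tied values, the folds are assumed to be contiguous and unshuffled, and the sample values are invented.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between realized and predicted values."""
    return sum(abs(o - e) for o, e in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean squared error: square, average, then take the root."""
    mse = sum((o - e) ** 2 for o, e in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse)


def ranks(values):
    """Rank values from 1 (smallest) to n; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman_rho(observed, predicted):
    """Spearman's rho from squared rank differences (exact without ties)."""
    n = len(observed)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(observed), ranks(predicted)))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Hypothetical realized and predicted one-day changes (not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.5, 2.0, 5.0]
print(mae(observed, predicted), rmse(observed, predicted),
      spearman_rho(observed, predicted))
print([len(fold) for fold in k_fold_indices(199)])
```

Each fold in turn serves as the test set while the remaining four are used for training, and the three metrics are then averaged over the five test folds.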

4.3 Experiment Results

The evaluation metrics from the model components are shown in Table 2.

Table 2. Evaluation of the model components

The naïve tone model, which proxies the approach commonly adopted by text mining studies in the field of finance, shows the worst performance. It exhibits the highest MAE and RMSE. The rank correlation of the model’s forecasts with realized changes in investors’ interest rate expectations is negative and highly statistically significant, implying that documents that are predicted to have a positive/negative impact on investors’ interest rate expectations result in the reverse outcome. The economic aspects and directional LDA models exhibit successively lower MAE and RMSE, suggesting a slight improvement in the model fit. Finally, the joint aspect-polarity model, which includes all model components in the ensemble tree, displays the lowest MAE and RMSE. The mildly positive Spearman’s rho is consistent with previous forecasting studies within the field of finance. As cited in [19], many factors influence the financial markets; a low, positive correlation provides sufficient comfort in the model’s predictive power.

5 Discussion

One interpretation of the experiment results is that multiple aspects are needed to improve the accuracy of the prediction system. The existence of a positive Spearman’s rho for the joint model versus a negative Spearman’s rho for the naïve tone and economic aspects models may be indicative of a non-linear relationship between the components that is only evident when the models are combined rather than considered in isolation. One of the strengths of a regression tree is that it does not assume a functional form, allowing it to detect interactions between model components. To aid our understanding of prediction in the joint model, Fig. 2 displays the decision tree results for one of the folds. The values in the grey boxes provide the predicted percentage change in investors’ interest rate expectations associated with the sentiment contained within the central bank communication. A positive value indicates that the impact is expected to lead to an increase in investors’ interest rate expectations, while a negative value indicates an expected decrease in interest rate expectations.

Fig. 2. Example decision tree from one of the folds

The regression tree identifies the interaction between the directional topic clusters and Tone measures. The primary decision in the decision tree is central bank sentiment towards economic growth. The right hand path indicates that if a central bank communication emphasizes positive economic growth and discusses interest rate increases, investors’ expectations of future interest rates are predicted to rise by 3%. The left hand path indicates that if the central bank’s tone towards economic growth is low, the communication discusses declining bank lending, and the tone towards interest rates is negative, investors are predicted to lower their expectations of future interest rates by 4%.

6 Conclusion

The goal of central bank communication is to make messages as clear, simple and understandable as possible to a wide range of audiences [1]. In this study, we focus on one specific audience, namely financial market investors. The outcome of our study may feed the design of a system that can predict the impact of central bank communication on the formation of investors’ interest rate expectations. The results of the joint aspect-polarity model suggest that investors may benefit by incorporating a measure of central bank sentiment to forecast interest rates.

In this study we evaluate model performance using prices from financial market instruments. The market price of an interest rate contract implicitly measures the average investor’s interest rate expectations [27]. It is also possible to compute an ‘implied probability distribution’ of those expectations [27]. In future work we plan to evaluate a range of metrics, including the dispersion of the expectations as a proxy of investor uncertainty. Since the 2007–2009 financial crisis, central banks have broadened the range of their communication, including the use of social media, live broadcasts, podcasts and blogs, to deliver their messages [1]. In future research, we will integrate a wider range of central bank communications into our study. We also intend to examine alternative approaches to select economic aspects, including dynamic approaches to detect salient terms as central bank communications change over time.