1 Introduction

Chinese implicit discourse relation recognition has drawn increasing attention because it is crucial for Chinese discourse understanding. Recently, the Chinese Discourse Treebank (CDTB) was released [1]. Although the Chinese discourse corpora share a similar annotation framework with the Penn Discourse Treebank (PDTB) for English, the statistical differences are obvious and significant. First, connectives occur much less frequently in Chinese than in English [2]. Second, the relation distribution in Chinese is more unbalanced than that in English. Third, the relation annotation for Chinese implicit cases is more semantic, due to the essential characteristics of the language [3]. This evidence indicates that the implicit discourse relation recognition task for Chinese differs from that for English.

Unfortunately, there is little existing work on the Chinese discourse relation problem [4, 7], so our work is mainly inspired by studies on English. Conventional approaches to identifying English discourse relations rely on handcrafted features extracted from the two arguments, including word pairs [8], VerbNet classes [10], Brown clusters [24], production rules [15] and dependency rules [9]. These features indeed capture some correlation with the discourse relation and achieve considerable performance in explicit cases. However, implicit discourse relation recognition is much harder, due to the absence of connectives.Footnote 1 Moreover, these hand-crafted features usually suffer from the data sparsity problem [19] and are weak at capturing the deep semantic features of discourse [22].

To tackle this problem, deep learning methods have been introduced to this area. They learn dense real-valued vector representations of the arguments, which capture the semantics to some extent and alleviate the data sparsity problem at the same time. Recently, a variety of neural network architectures have been explored for this task, such as convolutional neural networks [32], recursive networks [22], feed-forward networks [26], recurrent networks [25], attentional networks [23] and hybrid feature models [5, 6]. These studies show that deep learning can achieve comparable or even better performance than conventional approaches with complicated hand-crafted features.

More recently, there has been growing interest in memory augmented neural architectures. The advantage of an external memory is that it captures and preserves information useful for the task: the core idea is to keep this information in independent memory slots, and to trigger and retrieve the relevant slots to support inference. This design has proven effective in many works, including the Neural Turing Machine [17], memory networks [28], dynamic memory networks [21], matching networks [29], etc.

Therefore, in this paper, we propose a memory augmented attention model (MAAM) to handle the Chinese implicit discourse relation recognition task. It represents the arguments with an attention-based neural network, retrieves relation inference support information from an external memory, and then combines the representation with this memory support information to complete the classification.

More specifically, the procedure of our model can be divided into five steps: (1) A general encoder module transforms the input arguments from word sequences into dense vectors. (2) An attention module scores the importance of each word based on the given context, and the weighted sum of the words is used as the argument representation. (3) An external memory produces an output based on this argument representation. (4) A memory gate combines the memory output with the attention representation to generate a refined representation of the arguments. (5) Finally, we stack a feed-forward network as the classification layer to predict the discourse relation. Extensive experiments and analysis show that our proposed method achieves new state-of-the-art results on the Chinese Discourse Treebank (CDTB).

2 Memory Augmented Attention Model

In this section, we first give an overview of the modules that build up memory augmented attention model (MAAM). We then introduce each module in detail and give intuitions about its formulation. A high-level illustration of the MAAM is shown in Fig. 1.

Fig. 1. The basic framework of our model, including (1) General Encoder Module, (2) Content-based Attention Module, (3) External Memory Module, (4) Memory Gate and (5) Classification Module.

As shown in Fig. 1, our framework consists of five modules: (1) general encoder module; (2) content-based attention module; (3) external memory module; (4) memory gate; (5) classification module.

The General Encoder Module encodes the word sequences of the two arguments into distributed vector representations. It is implemented with a bidirectional recurrent neural network.

The Attention Module is proposed to capture the importance (attention) of each word in the two arguments. We score the weight of each word in an argument based on its inner context and generate a weighted sum as the argument representation.

The External Memory Module consists of a fixed number of memory slots. The external memory computes a match score between the argument representation and each memory slot, yielding a probability distribution over the slots, and then generates a weighted sum of the slots as the memory output.

The Memory Gate is a learnable controller component that computes a convex combination of the original argument representation and the memory output to generate a refined representation.

The Classification Module stacks on the refined representation of the arguments and outputs the final discourse relation. We implement this module with a two-layer feed-forward network which can capture the interaction between two arguments implicitly.

2.1 General Encoder Module

In implicit discourse relation recognition, the input is the word sequences of the two arguments Arg1 and Arg2. We choose a recurrent neural network [16] to encode the arguments. Word embeddings are given as input to the recurrent network. At each time step t, the network updates its hidden state \(h_{t}= RNN(x_{t},h_{t-1})\), where \(x_{t}\) is the embedding vector of the t-th word of the input argument. In our model, we use a gated recurrent unit (GRU) to replace the normal RNN unit [12]. The GRU is a variant of the RNN which works much better than the original one and suffers less from the vanishing gradient problem by introducing a gate structure similar to that of the Long Short-Term Memory (LSTM) [18]. Assume each time step t has an input \(x_{t}\) and a hidden state \(h_{t}\). The GRU is formulated as follows:

$$\begin{aligned} z_{t}&= \sigma (W_{z}x_{t}+U_{z}h_{t-1}+b_{z}) \end{aligned}$$
(1)
$$\begin{aligned} r_{t}&= \sigma (W_{r}x_{t}+U_{r}h_{t-1}+b_{r}) \end{aligned}$$
(2)
$$\begin{aligned} \tilde{h_{t}}&= tanh(Wx_{t}+r_{t}\circ Uh_{t-1}+b_{h}) \end{aligned}$$
(3)
$$\begin{aligned} h_{t}&= z_{t}\circ h_{t-1}+(1-z_{t})\circ \tilde{h_{t}} \end{aligned}$$
(4)

In brief, the simple version of the GRU is \(h_{t} = GRU(x_{t},h_{t-1})\). The RNN and its variant described above read an input sequence x in order, from the first word to the last. However, we expect the representation of each word to summarize not only the preceding words but also the following words. Thus, we propose to use a bidirectional RNN [27]. A Bi-RNN consists of a forward and a backward RNN: the forward RNN reads the input sequence from left to right, while the backward RNN reads the sequence in the reverse order.

$$\begin{aligned} \overrightarrow{h_{t}} = \overrightarrow{GRU}(x_{t},\overrightarrow{h_{t-1}}) \end{aligned}$$
(5)
$$\begin{aligned} \overleftarrow{h_{t}} = \overleftarrow{GRU}(x_{t},\overleftarrow{h_{t-1}}) \end{aligned}$$
(6)

We obtain the representation of each word by concatenating the two hidden state sequences generated by the forward and backward RNNs.

$$\begin{aligned} h_{t} = [\overrightarrow{h_{t}};\overleftarrow{h_{t}}] \end{aligned}$$
(7)

In this way, the representation \(h_{t}\) of each word contains the summary of both the preceding words and the following words.
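For concreteness, a minimal PyTorch sketch of this bidirectional GRU encoder is given below; the layer sizes, variable names and the use of nn.GRU are illustrative assumptions rather than our exact implementation.

    import torch
    import torch.nn as nn

    class BiGRUEncoder(nn.Module):
        def __init__(self, vocab_size, emb_dim=300, hidden_dim=128):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)
            # bidirectional=True runs a forward and a backward GRU and
            # concatenates their hidden states at each step, as in Eq. (7)
            self.gru = nn.GRU(emb_dim, hidden_dim, batch_first=True,
                              bidirectional=True)

        def forward(self, word_ids):
            # word_ids: (batch, seq_len) -> h: (batch, seq_len, 2 * hidden_dim)
            x = self.embedding(word_ids)
            h, _ = self.gru(x)
            return h

    # Each argument is encoded independently.
    encoder = BiGRUEncoder(vocab_size=10000)
    arg1_ids = torch.randint(0, 10000, (2, 20))   # a batch of 2 arguments, 20 words each
    h_arg1 = encoder(arg1_ids)                    # (2, 20, 256)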

2.2 Attention Module

After obtaining the representations of the arguments by treating each word equally in the general encoder module, we now apply the content-based attention module to score the importance of each word in the arguments. We evaluate the weight of each word based only on its inner context. The motivation is that, since the connective is absent in implicit samples, we can utilize the context of the arguments to generate an appropriate representation. Obviously, the contribution of each word in the context is not the same, and it is natural to capture the correlation between the context-dependent word features and the discourse relation using an attention mechanism. In our case, we use a multilayer perceptron to implement the attention module:

$$\begin{aligned} e_{t}=u_{a}^{T}tanh(W_{a}h_{t}+b_{a}) \end{aligned}$$
(8)

Notice that \(h_{t}\) is generated by the general encoder module. The weight of each word representation \(h_{t}\) is computed using the softmax function:

$$\begin{aligned} a_{t}=\frac{exp(e_{t})}{\sum _{j=1}^{T}exp(e_{j})} \end{aligned}$$
(9)

We then compute the vector \(v_{Arg1}\) as the weighted sum of the word representations of Arg1:

$$\begin{aligned} v_{Arg1}=\sum _{t=1}^{T}a_{t}h_{t} \end{aligned}$$
(10)

We generate the vector of Arg2 in the same way. Then we directly concatenate the two vectors as the representation of the arguments:

$$\begin{aligned} v_{Args} = [v_{Arg1};v_{Arg2}] \end{aligned}$$
(11)
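A minimal sketch of this content-based attention (Eqs. (8)-(11)) in PyTorch is shown below; the attention dimension and names are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ContentAttention(nn.Module):
        def __init__(self, dim, att_dim=100):
            super().__init__()
            self.W_a = nn.Linear(dim, att_dim)            # W_a h_t + b_a
            self.u_a = nn.Linear(att_dim, 1, bias=False)  # u_a^T tanh(.)

        def forward(self, h):
            e = self.u_a(torch.tanh(self.W_a(h)))   # scores e_t, Eq. (8); (batch, seq_len, 1)
            a = torch.softmax(e, dim=1)             # weights a_t, Eq. (9)
            return (a * h).sum(dim=1)               # weighted sum, Eq. (10)

    attend = ContentAttention(dim=256)
    v_arg1 = attend(torch.randn(2, 20, 256))        # (2, 256)
    v_arg2 = attend(torch.randn(2, 25, 256))
    v_args = torch.cat([v_arg1, v_arg2], dim=-1)    # Eq. (11): (2, 512)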

2.3 External Memory Module

Once we have the semantic representation of the arguments, we use it to interact with our augmented memory. The external memory consists of memory slots, which are activated by particular patterns of the arguments and generate a corresponding output as the response. This memory output is used in the following step to refine the original argument representation. Concretely, we first compute the similarity score between \(v_{Args}\) and each memory slot \(m_{i}\) and produce a normalized weight \(w_{i}\) using the similarity measure \(K[\cdot ,\cdot ]\). In order to sharpen the focus, a sharpening factor \(\beta \) is applied.

$$\begin{aligned} w_{i}\leftarrow \frac{exp(\beta K[v_{Args},m_{i}])}{\sum _{j}exp(\beta K[v_{Args},m_{j}])} \end{aligned}$$
(12)

In our case, we use the cosine similarity as our metric.

$$\begin{aligned} K[u,v] = \frac{u\cdot v}{\left\| u \right\| \cdot \left\| v \right\| } \end{aligned}$$
(13)

Then, we generate the output from memory according to the weights.

$$\begin{aligned} m = \sum _{i} w_{i}m_{i} \end{aligned}$$
(14)

The memory design is mainly inspired by the Neural Turing Machine [17]. The memory captures the common patterns of the discourse relation distribution during training. For example, when an input relation sample accesses the external memory, the memory responds with an output vector that contains the information most related to the similar samples it has seen before. Intuitively, samples with similar representations usually belong to the same discourse relation. In summary, the memory implicitly holds discourse relation clustering information for the subsequent classification. The external memory component is randomly initialized and optimized during training.
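A minimal sketch of this memory read operation (Eqs. (12)-(14)) is given below; the slot count, dimensions and the explicit cosine normalization are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ExternalMemory(nn.Module):
        def __init__(self, n_slots=20, dim=512, beta=1.0):
            super().__init__()
            # memory slots are learned parameters, randomly initialized
            self.slots = nn.Parameter(0.01 * torch.randn(n_slots, dim))
            self.beta = beta

        def forward(self, v_args):
            v_norm = F.normalize(v_args, dim=-1)        # (batch, dim)
            m_norm = F.normalize(self.slots, dim=-1)    # (n_slots, dim)
            sim = v_norm @ m_norm.t()                   # cosine similarity K, Eq. (13)
            w = torch.softmax(self.beta * sim, dim=-1)  # sharpened addressing weights, Eq. (12)
            m = w @ self.slots                          # weighted read, Eq. (14)
            return m, w                                 # w is the "activation" inspected in Sect. 3.4

    memory = ExternalMemory(n_slots=20, dim=512)
    m, w = memory(torch.randn(2, 512))                  # m: (2, 512), w: (2, 20)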

2.4 Memory Gate

Once we have access to the output information m from the memory, we can use it, along with the original argument representation \(v_{Args}\), to generate the refined representation \(\widetilde{v}\). We propose an interpolation strategy to combine these two vectors and employ a sigmoid function, called the memory gate, to control the final output.

$$\begin{aligned} \alpha = \sigma (W_{g}[v_{Args};m] + b_{g}) \end{aligned}$$
(15)

where \(\sigma \) is a sigmoid function. We then compute a convex combination of the memory output and the original argument representation:

$$\begin{aligned} \widetilde{v} = \alpha \cdot v_{Args} + (1-\alpha ) \cdot m \end{aligned}$$
(16)

The memory gate is a learnable neural layer. The idea behind it is that, although the memory can return potentially useful clustering structure information, its contribution should not be fixed; we therefore build a gate mechanism to control the memory output and mix it with the original argument representation.
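A minimal sketch of the gate (Eqs. (15)-(16)) is shown below; it uses an element-wise gate vector, although a scalar gate is an equally valid reading of Eq. (15), and the names are illustrative.

    import torch
    import torch.nn as nn

    class MemoryGate(nn.Module):
        def __init__(self, dim=512):
            super().__init__()
            self.gate = nn.Linear(2 * dim, dim)   # W_g [v_args; m] + b_g

        def forward(self, v_args, m):
            alpha = torch.sigmoid(self.gate(torch.cat([v_args, m], dim=-1)))  # Eq. (15)
            return alpha * v_args + (1 - alpha) * m                           # Eq. (16)

    gate = MemoryGate(dim=512)
    v_refined = gate(torch.randn(2, 512), torch.randn(2, 512))                # (2, 512)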

2.5 Classification Module

Given the refined representation vector \(\widetilde{v}\) of the arguments, we implement the classification module using a two-layer feed-forward network which is followed by a standard softmax layer.

$$\begin{aligned} \tilde{y} =softmax(tanh(W_{c}\widetilde{v}+b_{c})) \end{aligned}$$
(17)

where \(\tilde{y}\) is the predicted label. During training, we optimize the network parameters by minimizing the cross-entropy loss between the true and predicted labels.
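A minimal sketch of this classification module is given below. Eq. (17) abbreviates it as a single affine layer followed by softmax; here we sketch the two-layer reading described in the text, and the hidden size, dropout rate and the ten CDTB relation classes are assumptions.

    import torch
    import torch.nn as nn

    class Classifier(nn.Module):
        def __init__(self, dim=512, hidden=128, n_classes=10, dropout=0.5):
            super().__init__()
            self.ffn = nn.Sequential(
                nn.Linear(dim, hidden),
                nn.Tanh(),
                nn.Dropout(dropout),          # dropout before the softmax layer (see Sect. 3.2)
                nn.Linear(hidden, n_classes)  # logits; softmax is applied afterwards
            )

        def forward(self, v_refined):
            return self.ffn(v_refined)

    clf = Classifier()
    logits = clf(torch.randn(2, 512))          # (2, 10)
    probs = torch.softmax(logits, dim=-1)      # \tilde{y} of Eq. (17)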

3 Experiments

3.1 Corpora

We evaluate our model on the Chinese Discourse Treebank (CDTB) [1,2,3, 25], which was published as the standard corpus of the CoNLL 2016 shared task. In our work, we experiment on the ten relations in this corpus, following the setup suggested by the shared task. We directly adopt the standard training set, development set, test set and blind test set. We also use the word embeddings provided by the CoNLL 2016 shared task.

3.2 Training Details

To train our model, the objective function is defined as the cross-entropy loss between the outputs of the softmax layer and the ground-truth class labels. We use the Adadelta algorithm to optimize the whole network. To avoid over-fitting, dropout is applied to the layer before the softmax.
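A minimal sketch of such a training loop is given below; model, train_loader and the number of epochs are placeholders, not our exact settings.

    import torch
    import torch.nn as nn

    def train(model, train_loader, n_epochs=10):
        optimizer = torch.optim.Adadelta(model.parameters())
        criterion = nn.CrossEntropyLoss()      # softmax + negative log-likelihood
        model.train()                          # enables the dropout layer
        for epoch in range(n_epochs):
            for arg1, arg2, label in train_loader:
                optimizer.zero_grad()
                logits = model(arg1, arg2)     # forward pass through the five modules
                loss = criterion(logits, label)
                loss.backward()
                optimizer.step()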

3.3 Experimental Results

To exhibit the effectiveness of our model, our experimental results consist of three parts: baselines, MAAM variants and MAAMs.

Baselines: We adopt two baselines for our experiments: one is “Conjunction” and the other is “Focused RNN”, which achieved the best result in the CoNLL 2016 shared task.

We implement the first baseline, “Conjunction”, which simply labels every test sample as “Conjunction”. Due to the imbalance of the corpus (see Table 1), this baseline is very strong: according to the CoNLL report by Xue et al. [25], many participating systems could not beat it.

The “Focused RNN” is proposed by Weiss and Bajec [31] and is implemented with a focused recurrent neural network that can selectively react to different contexts. Its result is taken directly from the CoNLL 2016 report.

MAAM variants: Since there are few published results on CDTB, it is necessary to evaluate variants of our model. These variants help to reveal the contribution of each module, since each variant differs only slightly from our final model. The details of the MAAM variants are given below.

MAAM+0memslot+no Encoder: This variant uses no encoder module at all; it applies the same attention layer directly to the word embedding sequences of the arguments. This model explores the effectiveness of embedding features that lack context-dependent information.

MAAM+0memslot+GRU Encoder: This system uses only a single (unidirectional) GRU as the encoder; it is used to assess the effectiveness of the bidirectional encoder.

MAAM+0memslot+Mean (no Attention): Instead of using the attention mechanism, this system directly represents each argument as the mean of all hidden states of the Bi-GRU, treating every word in the argument equally.

We can see from Table 2 that the proposed MAAM is better than all the variants. It is obvious that both the context and the attention are beneficial for the distributed argument representation in discourse relation recognition.

Table 1. The experimental results on the CoNLL 2016 shared task

MAAMs: Now we compare our memory augmented attention model (MAAM) with other approaches in the closed track. Our memory models (containing different numbers of slots: 1, 20, 50, 100 and 150) all outperform the two baselines, and the one with 20 slots achieves the best result, which is the new state-of-the-art on CDTB. We also observe an interesting phenomenon in our memory models: as the number of memory slots grows, the performance first improves (from 0 to 20 slots) but then gradually decreases (from 20 to 50, 100 and 150 slots). We speculate that under-fitting (the lack of adequate training samples) is the main reason. Compared with MAAM+0memslot, all memory settings obtain better results, demonstrating the effectiveness of the proposed external memory component.

3.4 Discussion and Analysis

The experimental results demonstrate the superiority of our memory augmented attention model. In this section, we discuss the behavior of the external memory and the attention module in the network.

Fig. 2. Memory activations for different relation samples. The horizontal axis corresponds to the 10 memory slots; the vertical axis corresponds to different implicit discourse relation samples (Conjunction-Conj; Expansion-Exp; EntRel-EntR). Each row shows the activations of the memory slots for one input sample; a deeper color indicates a higher score.

Memory Analysis: The results show that the external memory component is significantly helpful for performance. In order to understand how the memory component works, we examine a memory component with 10 memory slots in Fig. 2. As mentioned above, a memory slot is triggered when relevant input arguments query the memory component: the memory computes a score for each slot based on the input arguments, and we call these scores activations. We feed 13 argument pairs belonging to different discourse relations into the memory component. The activations of the 10 memory slots triggered by the different relation samples are shown in Fig. 2; a deeper color means the slot receives a higher activation, and each row exhibits the activations of the memory slots for one input relation sample. As we can see, arguments belonging to the same relation always trigger the same slots (locations) in the memory component. For instance, the “EntRel” samples always focus on the 2nd slot (in the horizontal direction) and the “Conjunction” samples trigger the 8th slot.

Fig. 3. t-SNE visualization of the Chinese discourse relation distribution. Note the clustering of each relation; “Expansion” is shown in blue. Conjunction-0; Expansion-1; EntRel-2; AltLex-3; Causation-4; Contrast-5; Purpose-6; Conditional-7; Temporal-8; Progression-9. As we can see, the “Conjunction” relation acts as a background for the rest of the relations.

Representation Analysis: In order to understand the discourse relation distribution (representation) in our model, we show a t-SNE visualization of the Chinese implicit discourse relation samples in Fig. 3 (using the feature space of the classification module). As we can see, the “Conjunction” relation samples mostly act as a background for every other relation. This may be caused by the definition of “Conjunction”.Footnote 2 Meanwhile, the other relation samples are hard to distinguish from the “Conjunction” samples. This also indicates that Chinese implicit relation recognition is a difficult task.
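For reference, a sketch of such a t-SNE visualization is shown below, assuming features holds the refined representations fed to the classification module and labels the gold relation ids; both arrays here are random placeholders.

    import numpy as np
    from sklearn.manifold import TSNE
    import matplotlib.pyplot as plt

    features = np.random.randn(500, 512)        # placeholder for the refined representations
    labels = np.random.randint(0, 10, 500)      # placeholder relation ids (0-9, as in Fig. 3)

    coords = TSNE(n_components=2, random_state=0).fit_transform(features)
    sc = plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10", s=8)
    plt.colorbar(sc, label="relation id")
    plt.show()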

Attention Analysis: Our attention module scores each word based on its inner context. It captures the correlation between the content and the discourse relation, unlike independent word embeddings, which cannot access the surrounding context. In Fig. 4, a “Causation” example extracted from the corpus shows that our model pays more attention to content words than to function words. We annotate the alignment between the Chinese relation sample and its English translation. The attention module focuses on “international; steady; expansion” in Arg1 and “for China’s export; provides; international environment” in Arg2, which can roughly be considered a simple summarization of the two arguments. This example demonstrates the effect of the proposed attention module. The attention results also make us wonder whether words should be scored differently when dealing with different relations.

Fig. 4. Attention for a “Causation” sample. The attention module focuses on “international, steady, expansion” in Arg1 and “for China’s export, provides, international environment” in Arg2.

Discussion: Another issue we observed is the ambiguity and data imbalance of Chinese implicit discourse relations. Compared with English, Chinese contains far fewer explicit connectives, which is the main source of the Chinese implicit relation recognition problem. Therefore, many relation samples are hard to distinguish from “Conjunction” unless the relation is quite obvious to the annotator. Our approach is actually based on the assumption that every relation has a prototype sample, so we hoped that our memory component could capture each discourse relation prototype and identify it in unseen samples. However, we did not observe positive results to support this assumption.

4 Related Work

Implicit discourse relation recognition has been a hot topic in recent years. However, most of the approaches focus on English. There are mainly two directions related to our study: (1) English implicit discourse relation recognition using neural networks, and (2) memory augmented networks.

Conventional implicit relation recognition approaches rely on various hand-crafted features [8, 11, 24]; these surface features usually suffer from the sparsity problem. Neural network based approaches were then proposed. To alleviate the feature sparsity problem, Ji and Eisenstein [19] first transformed the surface features of the arguments into low-dimensional distributed representations to boost performance. A discourse document usually covers units of different scales, from words and sentences to paragraphs. To model this kind of structure, Li [22] and Ji [20] both introduced recursive networks to represent arguments and facilitate discourse parsing. Treating discourse relation recognition as a text classification problem, Liu et al. [23] propose a convolutional neural network (CNN) to detect sequence features in the arguments and predict the relation. Rutherford et al. [25] conduct experiments to explore the effectiveness of feed-forward and recurrent neural networks. Liu and Li [23] use an attention mechanism to refine the representation of the arguments by reweighting the importance of different parts of each argument. Braud and Denis [13, 14] utilize word representations to improve implicit discourse relation classification; their method investigates the correlation between word embeddings and discourse relations.

The memory model is inspired by recently proposed memory augmented networks. The Neural Turing Machine (NTM) [17] builds an external memory component to explicitly preserve various subsequence patterns, which makes the NTM more effective at learning from training samples. Another type of memory augmented network is the memory network [28], which differs from the NTM and works more like a cache for particular data: it saves sentences in memory to support multi-step question answering inference. More recently, the matching network was proposed by Vinyals et al. [29]; its memory component caches common patterns of the representations and corresponding labels of the training samples, and it predicts a label by matching the input sample against the memory cache and generating the weighted sum of labels (under the matching distribution) as the final output. Since such a memory can capture particular patterns of samples and be optimized during training, we extend it in our framework to maintain the information crucial for Chinese implicit relation recognition. The experimental results verify the efficacy of the proposed memory component, and the memory augmented model achieves the best performance on CDTB.

5 Conclusion

In this paper, we have proposed a memory augmented attention model for Chinese implicit discourse relation recognition. The attention network is employed to learn the semantic representations of the two arguments Arg1 and Arg2, and the memory network is introduced to capture the underlying clustering structure of the samples. Extensive experiments show that our proposed method achieves new state-of-the-art results on CDTB.