A Neural Framework for Joint Prediction on Intent Identification and Slot Filling

Shan, Jiawei; Xu, Huayun; Gong, Zeyang; Su, Hanchen; Han, Xu; Li, Binyang

doi:10.1007/978-3-030-23407-2_2


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11518)

Included in the following conference series:

International Conference on Cognitive Computing


Abstract

In task-oriented dialog systems, understanding users’ queries (expressed in natural language) is the process of parsing them and converting them into a structure that a machine can handle. This understanding usually consists of two parts: intent identification and slot filling. To address this problem, we propose a neural framework, named SI-LSTM, that combines the two tasks and integrates a CRF into an LSTM network: the slot information is extracted by the CRF, and the intent is identified by the LSTM. In our approach, the slot information is used to determine the intent, while the intent type is used to correct slot filling errors. On the dataset provided by NLPCC 2018, SI-LSTM achieved an accuracy of 90.71% on intent identification, slot filling and error correction.


1 Introduction

In task-oriented dialog systems, understanding users’ queries (expressed in natural language) is the process of parsing them and converting them into a structure that a machine can handle. This understanding usually consists of two parts: intent identification and slot filling. The textual strings fed into a dialog system are mostly transcripts of spoken language produced by ASR (Automatic Speech Recognition) and are therefore subject to recognition errors.

Intent identification recognizes the behavioral goal of a query or sentence presented by a user, such as playing music or booking tickets. Slot filling aims to extract from the conversation the semantic slot information related to a specific intent. Take playing music as an example: when a user intends to play music, the singer and the song are the slots that must be extracted and filled. Intent identification is thus a classification problem, while slot filling is a sequence labeling problem. In spoken language understanding for task-oriented dialog systems, both tasks aim to convert human natural language expressions into structured information. NLPCC 2018 [22] provides an annotated dataset covering three domains: music, navigation and phone call.
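To make the two problem types concrete, the sketch below (an illustration, not the authors' code) turns a sequence of BIO labels, as a sequence labeler would emit them, into a structured (intent, slots) result. The BIO tagging scheme, the slot names `singer` and `song`, and the function name `bio_to_slots` are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def bio_to_slots(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Collect (slot_type, value) pairs from BIO sequence labels.

    'B-x' opens a slot of type x, 'I-x' continues it, and 'O' marks a token
    outside any slot. An 'I-x' that does not continue an open x slot is
    treated as opening a new one.
    """
    slots: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Reads the enclosing variables at call time.
        if current_type is not None:
            slots.append((current_type, "".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == current_type:
            current_tokens.append(token)  # continue the open slot
            continue
        flush()  # close any open slot before a new slot or an 'O'
        if tag.startswith(("B-", "I-")):
            current_type, current_tokens = tag[2:], [token]
        else:
            current_type, current_tokens = None, []
    flush()
    return slots


# Hypothetical query "播放周杰伦的稻香" (play Jay Chou's "Dao Xiang").
tokens = list("播放周杰伦的稻香")
tags = ["O", "O", "B-singer", "I-singer", "I-singer", "O", "B-song", "I-song"]
parse = {"intent": "music.play", "slots": bio_to_slots(tokens, tags)}
```

Here the intent label comes from the classifier and the slot list from the sequence labeler, which is the division of labor between the two tasks.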

There has been much research on these two problems. Traditional approaches to intent identification include rule matching and machine learning algorithms, which proved effective in some specific domains. However, as corpora grow dramatically, it becomes difficult to extract features that generalize across domains. Deep learning techniques were therefore applied, including convolutional neural networks (CNN) [19] and recurrent neural networks (RNN) [20], especially long short-term memory (LSTM) [21]. The slot filling task has followed a similar path [14,15,16].

We aim to build a model that efficiently handles both tasks of spoken language understanding in task-oriented dialog systems. We propose a neural framework, named SI-LSTM, for intent identification and slot filling. SI-LSTM integrates a CRF into an LSTM network: the CRF segments the words into entity spans and generates the sequence labels for slot filling, and the LSTM captures the semantics of each sentence for intent identification. SI-LSTM handles the two tasks jointly: the slot information is used to determine the intent, while the intent type is used to correct slot filling errors.

On the dataset provided by NLPCC 2018, SI-LSTM improves the accuracy of each task and achieved an accuracy of 90.71% on intent identification, slot filling and error correction.

2 Related Works

This paper addresses two main tasks: intent identification and slot filling. The intent is identified first, and the slots corresponding to that intent are then filled. Intent identification is usually treated as a classification problem, in which each query is assigned to an intent category, while slot filling is regarded as a sequence labeling problem. In this section, we review current studies on these two tasks.

2.1 Intent Identification

Previous research on intent identification falls mainly into three categories: rule-based methods, machine learning methods and deep learning methods.

[1] presented a rule-based method that designs different templates for this problem. For each intent category, a thesaurus was constructed to help identify intents in the specific domain. More specifically, based on previously collected records, a list of candidate intents is produced according to a probability distribution. However, this method requires extensive domain knowledge to construct all the rules and is rather time-consuming.

[2, 3] proposed machine learning approaches for intent identification, using classifiers such as Support Vector Machines (SVM), Naive Bayes and Decision Trees (DT) [3]. Most of these approaches focused on either feature construction or model selection, and they proved effective in improving identification accuracy. However, as corpora have become increasingly diverse, it is difficult to find uniform features that cover all intents.

More recently, researchers have used deep learning to classify intents, since deep models represent the features of each sentence well. For instance, CNN [4], RNN [20] and LSTM [5] have been widely applied, and user profiles have also been used to guide intent identification [6]. These neural network models have been shown to work better on intent identification and to speed up training.

2.2 Slot Filling

Since our model also targets slot filling, we likewise review related work on slot filling in three categories.

First, rule-based methods were proposed, in which slot filling is accomplished by matching linguistic rules. Despite their high accuracy, they require much time and abundant domain knowledge. Later, many statistical models were applied to the problem and proved efficient and effective; Hidden Markov Models (HMM) [7], context-free grammars (CFG) [8] and Conditional Random Fields (CRF) [9] are widely used for sequence tagging. More recently, researchers have applied RNNs [10] to slot filling, with advantages including faster training, a flexible architecture and strong performance. More importantly, these neural network models can be combined with sequence tagging models such as HMM and CRF to further improve performance.

Our work is inspired by the studies above. We design SI-LSTM for intent identification and slot filling, as described in the following section.

3 Joint Prediction on Intent Identification and Slot Filling

In task-oriented dialog systems, comprehending users’ queries is the process of parsing them and converting them into structures that a machine can handle. This comprehension usually consists of two parts: intent identification and slot filling.

There is a strong correlation between intent identification and slot filling, and each intent maps to a limited set of slot types, especially in the dataset provided by NLPCC [22]. For example, slot categories such as song, language and toplist appear only in sentences with the intent music.play, not in those with navigation.navigation. The correlations between intents and slots are shown in Table 1.

Table 1. The samples of intent type and the slot information.


In addition, the dataset can be seen as a stream of user queries ordered by time stamp. The stream is split into segments according to the time gaps between queries, and each segment is called a ‘session’. Queries within a session are not independent: each is correlated with the preceding ones. For example, given the input text “张三 (Zhang San)” (a singer’s name), the current intent is recognized as phone_call.make_a_phone_call, with the slot information “<contact_name> 张三 <contact_name>”, only if the latest intent was phone_call.make_a_phone_call. Otherwise, the intent is classified as OTHERS and the slot information is empty.
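A minimal sketch of the session segmentation described above, assuming each query carries a numeric time stamp in seconds. The paper does not state the gap threshold, so `max_gap` is a hypothetical parameter, and `split_sessions` is an illustrative name.

```python
from typing import List, Optional, Tuple


def split_sessions(queries: List[Tuple[float, str]], max_gap: float) -> List[List[str]]:
    """Split a time-ordered query stream into sessions.

    A new session starts at the first query and whenever the gap between
    consecutive time stamps exceeds max_gap (hypothetical threshold).
    """
    sessions: List[List[str]] = []
    last_ts: Optional[float] = None
    for ts, text in queries:
        if last_ts is None or ts - last_ts > max_gap:
            sessions.append([])  # gap too large: open a new session
        sessions[-1].append(text)
        last_ts = ts
    return sessions


stream = [(0.0, "打电话给张三"), (5.0, "张三"), (300.0, "播放稻香")]
sessions = split_sessions(stream, max_gap=60.0)
```

With a 60-second threshold the bare name “张三” stays in the same session as the preceding call request, which is what lets the earlier intent serve as context.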

Based on these observations, we design a neural framework, named SI-LSTM, that tackles both tasks at the same time. Its structure is shown in Fig. 1. SI-LSTM is a four-layer neural framework consisting of a CRF layer, a CNN layer, an LSTM layer and a fully connected layer. First, each word of a sentence is fed into the CRF layer, which generates the sequence labels; the slot filling results are obtained from this layer. The output of the CRF layer, together with the vector representation of the sentence, is then fed into the CNN layer to extract rich semantic features. On top of that, the LSTM layer exploits the word order and temporal information of the text. Finally, the fully connected layer outputs the intent prediction. Since the context within a session is taken into account, we also maintain a memory cell that stores the latest user intent, and SI-LSTM consults this memory cell when producing the final classification.
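The memory cell can be pictured as a small piece of state that is reset at each session boundary and consulted when a query carries no intent cue of its own, as in the 张三 example above. The sketch below is a hypothetical, rule-based stand-in: the `FOLLOW_UP_SLOTS` table, the `bare_slot` flag and the rule that a bare slot value inherits the stored intent are assumptions, not the paper's learned mechanism.

```python
from typing import Dict, Optional, Set

# Hypothetical table: intents whose follow-up query may be a bare slot value,
# and the slot types such a follow-up may contain.
FOLLOW_UP_SLOTS: Dict[str, Set[str]] = {
    "phone_call.make_a_phone_call": {"contact_name"},
}


class IntentMemory:
    """Stores the latest intent of the current session."""

    def __init__(self) -> None:
        self.latest: Optional[str] = None

    def reset(self) -> None:
        """Forget the context at a session boundary."""
        self.latest = None

    def decide(self, predicted: str, slot_types: Set[str], bare_slot: bool) -> str:
        """Return the final intent for the current query.

        predicted:  intent proposed by the classifier
        slot_types: slot types tagged in the query
        bare_slot:  True if the query consists only of a slot value
        """
        latest = self.latest
        if bare_slot:
            allowed = FOLLOW_UP_SLOTS.get(latest, set()) if latest else set()
            # Inherit the stored intent only if the slots fit that intent.
            final = latest if latest and slot_types and slot_types <= allowed else "OTHERS"
        else:
            final = predicted
        self.latest = final  # the memory always holds the most recent intent
        return final
```

For example, after a query classified as phone_call.make_a_phone_call, a bare “张三” tagged as contact_name inherits that intent; after `reset()`, the same query falls back to OTHERS.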

3.1 Conditional Random Field Layer

In our Conditional Random Field (CRF) layer, we utilize the classic CRF [11] model for sequential labeling, which attempts to model the conditional probability distribution P(Y|C). Our model first uses CRF for named entity recognition and slot filling.

Given a sentence, the CRF layer first segments the words into entity spans and then classifies each entity by type, such as person, organization or location. To avoid the label bias problem, i.e., a bias toward states with few successor states, our CRF layer relaxes the strong independence assumptions made in other models.

Given the observation sequence $ C $, the model, which assumes a first-order Markov chain over the labels, predicts the hidden label sequence $ Y $ that represents the attributes of the entities.

Conditional distribution is computed by Eq. (1):

$$ P\left( Y|C \right) = \frac{1}{Z(C)}\exp\left( \sum\nolimits_{i} \sum\nolimits_{j} \lambda_{j} f_{j} \left( y_{i - 1}, y_{i}, C, i \right) \right) $$

(1)

where $ Z(C) $ is the normalization factor, obtained by summing the exponentiated score over all possible label sequences, $ \lambda_{j} $ is the weight of the $ j $-th feature learned from the training data, and $ f_{j} $ is the $ j $-th feature function.
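To make Eq. (1) concrete, the sketch below scores a label sequence with weighted feature functions and normalizes by the partition function, computed here by brute-force enumeration of all label sequences. The two-label set (`O` for outside, `S` for slot), the three feature functions and their weights are toy assumptions; practical CRFs compute the normalizer with the forward algorithm instead.

```python
import itertools
import math
from typing import Callable, List, Sequence

# f_j(y_{i-1}, y_i, C, i) -> feature value
FeatureFn = Callable[[str, str, Sequence[str], int], float]
START = "<START>"  # hypothetical label standing in for y_0


def score(labels: Sequence[str], obs: Sequence[str],
          features: List[FeatureFn], weights: List[float]) -> float:
    """The exponent of Eq. (1): sum over positions i and features j."""
    total, prev = 0.0, START
    for i, y in enumerate(labels):
        total += sum(lam * f(prev, y, obs, i) for f, lam in zip(features, weights))
        prev = y
    return total


def crf_probability(labels: Sequence[str], obs: Sequence[str], label_set: Sequence[str],
                    features: List[FeatureFn], weights: List[float]) -> float:
    """P(Y|C), with the normalizer computed by enumerating every label sequence.

    Enumeration costs |label_set| ** len(obs) score evaluations: toy inputs only.
    """
    z = sum(math.exp(score(y, obs, features, weights))
            for y in itertools.product(label_set, repeat=len(obs)))
    return math.exp(score(labels, obs, features, weights)) / z


# Toy features (assumptions): capitalised tokens look like slot words,
# lowercase tokens look like outside words, and slots tend to continue.
features: List[FeatureFn] = [
    lambda prev, y, obs, i: float(y == "S" and obs[i][0].isupper()),
    lambda prev, y, obs, i: float(y == "O" and obs[i][0].islower()),
    lambda prev, y, obs, i: float(prev == "S" and y == "S"),
]
weights = [2.0, 1.5, 1.0]
obs = ["call", "Zhang", "San"]
best = max(itertools.product("OS", repeat=len(obs)),
           key=lambda y: crf_probability(y, obs, "OS", features, weights))
```

The most probable labeling is ("O", "S", "S"), i.e. "Zhang San" is tagged as one slot, and the probabilities of all eight labelings sum to 1.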

The output of the CRF layer is fed into the CNN layer, so that both the sequence labels and the semantic information are preserved.

3.2 CNN Layer

Inspired by the strong feature extraction capability of CNNs [12], we also use a CNN layer to extract features from the text. In this layer, we also combine the slot categories with the input for intent identification for further processing.

Without loss of generality, each sentence $ s $ can be formulated as a word sequence $ s = \{ w_{1}, w_{2}, \ldots, w_{L} \} $, where $ L $ denotes the length of $ s $. The objective of intent identification is to produce a label $ y_{i} $ for each sentence $ s_{i} $, where $ y_{i} $ is one of the intent types. In the CNN layer, the sentence is represented by concatenating the vectors $ {\text{x}}_{i} $ of all its words, as shown in Eq. (2), where $ n = L $.

$$ {\text{x}}_{{1:{\text{n}}}} = {\text{x}}_{1} \oplus {\text{x}}_{2} \oplus {\text{x}}_{3} \oplus \ldots \oplus {\text{x}}_{\text{n}} $$

(2)

The convolutional layer captures local information between neighboring words with a sliding window. A convolution filter $ \omega \in {\mathbb{R}}^{hk} $ spans a window of $ h $ words, where $ k $ is the dimension of a word vector, so each window covers the words from the $ i $-th to the $ (i + h - 1) $-th. Each window yields a feature $ f_{i} $ by Eq. (3), where $ f $ is a nonlinear activation function and $ b $ is a bias term.

$$ f_{i} = f\left( \omega \cdot x_{i:i + h - 1} + b \right) $$

(3)

The convolution kernel sequentially convolves all the windows in the sentence to get a feature map $ F \in {\mathbb{R}}^{n - h + 1} $ shown in Eq. (4).

$$ F = \left[ {f_{1} , f_{2} , {{ \ldots , }}f_{n - h + 1} } \right] $$

(4)

Max pooling then takes the maximum value of each feature map as the final feature, retaining only the most representative feature.
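The window arithmetic of Eqs. (2) to (4) and the pooling step can be sketched in plain Python for a single filter. The default activation `tanh` is an assumption (the paper leaves $ f $ unspecified), and a real model would learn many filters with a deep learning library rather than loop in Python.

```python
import math
from typing import Callable, List, Sequence


def feature_map(word_vecs: Sequence[Sequence[float]], w: Sequence[float], b: float, h: int,
                act: Callable[[float], float] = math.tanh) -> List[float]:
    """Eqs. (2)-(4): apply one filter w in R^{hk} to every h-word window."""
    k = len(word_vecs[0])
    if len(w) != h * k:
        raise ValueError("filter must have h * k weights")
    x = [v for vec in word_vecs for v in vec]  # Eq. (2): concatenation x_{1:n}
    n = len(word_vecs)
    fmap = []
    for i in range(n - h + 1):  # n - h + 1 windows, as in Eq. (4)
        window = x[i * k:(i + h) * k]  # x_{i:i+h-1}
        fmap.append(act(sum(wj * xj for wj, xj in zip(w, window)) + b))  # Eq. (3)
    return fmap


def max_pool(fmap: Sequence[float]) -> float:
    """Keep only the strongest response of the feature map."""
    return max(fmap)


# Three 2-dimensional word vectors, window h = 2, identity activation.
fm = feature_map([[1, 0], [0, 1], [1, 1]], [1, 1, 1, 1], 0.0, 2, act=lambda z: z)
```

With these toy inputs the feature map is [2.0, 3.0] (length $ n - h + 1 = 2 $) and max pooling keeps 3.0.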

The Softmax function outputs the conditional probability distribution over intents in Eq. (5), from which the most likely intent is determined by Eq. (6).

$$ P_{\theta } \left( {y_{j} |h_{{s_{i} }} } \right) = softmax(h_{{s_{i} }} \omega + b) $$

(5)

$$ Y_{pred} = \arg\max_{j} P_{\theta } \left( y_{j} | h_{s_{i}} \right) $$

(6)
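Eqs. (5) and (6) amount to a linear layer followed by softmax and argmax. In the sketch below, the weight layout (one row per dimension of $ h_{s_i} $, one column per intent) and the example intent list are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_intent(h: Sequence[float], W: Sequence[Sequence[float]], b: Sequence[float],
                   intents: Sequence[str]) -> Tuple[str, List[float]]:
    """Eq. (5): softmax(h W + b); Eq. (6): pick the argmax intent."""
    logits = [sum(h[r] * W[r][j] for r in range(len(h))) + b[j]
              for j in range(len(intents))]
    probs = softmax(logits)
    best = max(range(len(intents)), key=lambda j: probs[j])
    return intents[best], probs


intents = ["music.play", "navigation.navigation", "OTHERS"]
label, probs = predict_intent([1.0, 0.5], [[0.2, 1.0, -0.5], [0.1, 0.3, 0.0]],
                              [0.0, 0.0, 0.0], intents)
```

The logits here are 0.25, 1.15 and -0.5, so the predicted intent is navigation.navigation.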

Using the classic CNN model, this layer converts the input into a fixed-length global vector containing the most representative features, which also speeds up training.

3.3 Long Short-Term Memory Layer

The LSTM layer encodes the vectors produced by the CNN layer into a fixed-length vector representation. In our task, a dialogue is hierarchical sequential data: each sentence is a sequence of words, and each session is a sequence of sentences.

The LSTM layer consists of repeated cells, each of which receives the hidden state $ h_{t - 1} $ from the previous time step and the current input $ x_{t} $. Each cell has an input gate $ i_{t} $, a forget gate $ f_{t} $ and an output gate $ o_{t} $, together with a candidate state $ q_{t} $ and a cell state $ c_{t} $. Each cell computes Eqs. (7) to (12), where $ W^{*} $ and $ U^{*} $ are the input and recurrent weight matrices, $ b^{*} $ are bias vectors, $ \sigma $ is the sigmoid function and $ \odot $ denotes element-wise multiplication:

$$ i_{t} = \sigma \left( W^{i} x_{t} + U^{i} h_{t - 1} + b^{i} \right) $$

(7)

$$ f_{t} = \sigma \left( W^{f} x_{t} + U^{f} h_{t - 1} + b^{f} \right) $$

(8)

$$ q_{t} = \tanh \left( W^{q} x_{t} + U^{q} h_{t - 1} + b^{q} \right) $$

(9)

$$ o_{t} = \sigma \left( W^{o} x_{t} + U^{o} h_{t - 1} + b^{o} \right) $$

(10)

$$ c_{t} = f_{t} \odot c_{t - 1} + i_{t} \odot q_{t} $$

(11)

$$ h_{t} = o_{t} \odot tanh\left( {c_{t} } \right) $$

(12)
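The cell update of Eqs. (7) to (12) can be written out for one time step as follows. This plain-Python sketch uses separate input weights $ W $ and recurrent weights $ U $ per gate (the standard LSTM formulation); the parameter layout and the helper `encode`, which keeps only the last hidden state as described in the text, are illustrative assumptions.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vec = List[float]
Mat = List[List[float]]


def matvec(m: Mat, v: Sequence[float]) -> Vec:
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x: Sequence[float], h_prev: Sequence[float], c_prev: Sequence[float],
              params: Dict[str, Tuple[Mat, Mat, Vec]]) -> Tuple[Vec, Vec]:
    """One step of Eqs. (7)-(12); params maps 'i', 'f', 'q', 'o' to (W, U, b)."""
    def gate(name: str, act: Callable[[float], float]) -> Vec:
        W, U, b = params[name]
        return [act(wx + uh + bb) for wx, uh, bb in zip(matvec(W, x), matvec(U, h_prev), b)]

    i = gate("i", sigmoid)    # Eq. (7)
    f = gate("f", sigmoid)    # Eq. (8)
    q = gate("q", math.tanh)  # Eq. (9)
    o = gate("o", sigmoid)    # Eq. (10)
    c = [ft * ct + it * qt for ft, ct, it, qt in zip(f, c_prev, i, q)]  # Eq. (11)
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]                    # Eq. (12)
    return h, c


def encode(inputs: Sequence[Sequence[float]], params: Dict[str, Tuple[Mat, Mat, Vec]],
           hidden: int) -> Vec:
    """Run the cell over the input vectors; the last h_t represents the sentence."""
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    return h
```

With all weights and biases set to zero, every gate outputs 0.5 and the candidate is 0, so the cell simply halves $ c_{t-1} $; this makes a convenient sanity check.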

Each vector of the feature combination layer is obtained by sequentially concatenating the $ i $-th elements of the convolutional feature maps, as shown in Eq. (13).

$$ C_{i} = F_{1}^{i} \oplus F_{2}^{i} \oplus F_{3}^{i} \oplus \ldots \oplus F_{n - h + 1}^{i} $$

(13)

This yields $ n - h + 1 $ combination vectors, arranged in convolution order so that the word order of the text is preserved. The vectors are then fed sequentially into the LSTM cells. The output of the last hidden state is taken as the final sentence representation, and the Softmax layer outputs the predicted intent.
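Reading Eq. (13) as the surrounding text describes it, with several convolution filters each producing a feature map of length $ n - h + 1 $, building the combination vectors amounts to transposing the feature maps: vector $ i $ collects the $ i $-th element of every map. The multi-filter setting and the function name are assumptions in this sketch.

```python
from typing import List, Sequence


def combine_feature_maps(feature_maps: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn m feature maps of length n-h+1 into n-h+1 combination vectors.

    Combination vector i holds the i-th element of every feature map, so the
    vectors keep the left-to-right order of the convolution windows.
    """
    length = len(feature_maps[0])
    if any(len(fm) != length for fm in feature_maps):
        raise ValueError("all feature maps must have length n - h + 1")
    return [[fm[i] for fm in feature_maps] for i in range(length)]


# Two filters, each producing a feature map over three windows.
combined = combine_feature_maps([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

Each resulting vector (here [1.0, 4.0], [2.0, 5.0], [3.0, 6.0]) becomes one time step of LSTM input.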

4 Spelling Correction

During slot filling, Chinese typos in the slots must be handled, so spelling correction is required. Spelling correction has two requirements: recognizing typos and finding the correct forms. To this end, our method computes string similarity measures such as edit distance against the corresponding thesaurus, determines whether a slot is a typo, and finds the correct form for each typo.

4.1 Preprocessing

Since both the training and test corpora contain multiple languages, along with other characteristics that may affect the results, data preprocessing is necessary.

There are a few obstacles. First, since Arabic numerals and Chinese numerals are treated as completely different characters, conversion between them is necessary. For exact years such as ‘2002’, the conversion should follow the traditional Chinese expression.
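For the year case, the traditional Chinese reading pronounces each digit separately (2002 is read 二零零二). A minimal sketch of that conversion follows; using 零 rather than 〇 for zero, and the function name, are assumptions, and general (non-year) number conversion is not shown.

```python
CHINESE_DIGITS = "零一二三四五六七八九"


def year_to_chinese(year: str) -> str:
    """Read an Arabic-numeral year digit by digit, e.g. '2002' -> '二零零二'."""
    if not year.isascii() or not year.isdigit():
        raise ValueError(f"not an Arabic-numeral year: {year!r}")
    return "".join(CHINESE_DIGITS[int(d)] for d in year)
```

For example, `year_to_chinese("1998")` yields 一九九八.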

Second, multiple languages cause problems both when comparing the similarity of strings and when obtaining the Chinese Pinyin of each word. To reduce the negative influence of foreign languages on Pinyin comparison, each run of consecutive foreign characters in a slot is treated as a single Pinyin unit.
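The grouping rule can be sketched with a regular expression that emits one unit per Chinese character and one unit per run of non-Chinese characters; each Chinese unit would then be converted to Pinyin by a separate tool, which is not shown. Limiting "Chinese" to the CJK Unified Ideographs block (U+4E00 to U+9FFF) and splitting foreign runs at whitespace are assumptions.

```python
import re
from typing import List

# One CJK ideograph, or a maximal run of non-CJK, non-space characters.
_UNIT = re.compile(r"[\u4e00-\u9fff]|[^\u4e00-\u9fff\s]+")


def pinyin_units(slot: str) -> List[str]:
    """Split a slot into Pinyin comparison units.

    Each Chinese character is its own unit; consecutive foreign characters
    are kept together as a single unit.
    """
    return _UNIT.findall(slot)
```

For instance, `pinyin_units("周杰伦Jay")` gives ["周", "杰", "伦", "Jay"], so the English part counts as one unit when Pinyin strings are compared.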

4.2 Congruent Length in String Order

Our first measure of string similarity targets strings that are missing one or two characters. Since the lengths of the two strings are crucial for the similarity computation, we compute the similarity with the following equation when the congruent length ($ Cl $) is not less than a threshold $ k $.

$$ Sim = 0.8 + 0.01 *Cl\quad {\text{where }}\;Cl \ge k $$

(14)

The threshold $ k $ on the congruent length depends on the slot length and is defined as follows.

$$ k = \begin{cases} 2, & 2 \le slotlength \le 4 \\ slotlength - \left[ 0.4 \cdot slotlength \right], & 5 \le slotlength \le 7 \\ 6, & slotlength \ge 8 \end{cases} $$

(15)
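A sketch of Eqs. (14) and (15) follows. The paper does not define how the congruent length is computed; here it is assumed to be the length of the longest common subsequence (characters shared in the same order), and the brackets in Eq. (15) are assumed to denote the integer part. Slots shorter than two characters fall outside Eq. (15), so the sketch rejects them.

```python
from typing import Optional


def congruent_length(a: str, b: str) -> int:
    """Assumed reading of Cl: longest common subsequence length of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def threshold_k(slot_length: int) -> int:
    """Eq. (15), reading [.] as the integer part (assumption)."""
    if 2 <= slot_length <= 4:
        return 2
    if 5 <= slot_length <= 7:
        return slot_length - int(0.4 * slot_length)
    if slot_length >= 8:
        return 6
    raise ValueError("Eq. (15) is undefined for slots shorter than 2 characters")


def congruent_similarity(slot: str, entry: str) -> Optional[float]:
    """Eq. (14): 0.8 + 0.01 * Cl when Cl >= k, otherwise None (measure not applicable)."""
    cl = congruent_length(slot, entry)
    return 0.8 + 0.01 * cl if cl >= threshold_k(len(slot)) else None
```

For a hypothetical slot 稻香 compared with the thesaurus entry 稻花香, $ Cl = 2 \ge k = 2 $, giving a similarity of 0.82; a slot with no characters in common yields `None`.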

4.3 Edit Distance

We also use the edit distance [13] as a second measure of the similarity between two strings. The edit distance is the minimum number of single-character edits (substitutions, insertions and deletions) needed to transform one string into the other; the smaller the edit distance, the higher the similarity. The algorithm has a time complexity of $ {\text{O}}(m \cdot n) $ and a space complexity of $ {\text{O}}(m \cdot n) $, where $ m $ and $ n $ are the lengths of strings $ a $ and $ b $.
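The standard dynamic-programming computation of the Levenshtein distance [13], using the $ O(m \cdot n) $ table mentioned above, can be sketched as follows.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning a into b."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]  # (m+1) x (n+1) table
    for i in range(m + 1):
        d[i][0] = i  # delete all i characters of a
    for j in range(n + 1):
        d[0][j] = j  # insert all j characters of b
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[m][n]
```

For example, 稻花香 and 稻香 differ by one deletion, so their distance is 1.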

4.4 Spelling Correction

In this task, each input slot has two attributes: its content and its category. First, the content is matched against the thesaurus for that category. If the content appears in the thesaurus, it is not a spelling error. Otherwise, the algorithm estimates how likely it is to be a spelling error: the slot is compared with every string in the thesaurus, and the highest similarity is kept.

We use three measures of the similarity between two strings. The first is the congruent length in string order: if the corresponding thesaurus contains qualifying strings ($ Cl \ge k $), Eq. (14) gives the first similarity. The second is the edit distance ($ lev $) between the two strings, from which the similarity is computed as follows.

$$ sim = 1 - \frac{lev}{slotlength} $$

(17)

The third measure is the edit distance ($ lev $) between the Chinese Pinyin of the two strings. A parameter $ m $ weights this distance to account for the conversion from characters to Pinyin, and the similarity is computed by the following equation.

$$ sim = 1 - \frac{m*lev}{slotlength} $$

(18)

In our experiment, based on the performance on training dataset, we set m = 1.8.
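Eqs. (17) and (18) are simple to compute once the edit distances are known. As a worked example, consider a hypothetical slot 七里响 checked against the thesaurus entry 七里香: the characters differ in one position, but the toneless Pinyin strings are identical (qilixiang), so the Pinyin measure catches this homophone typo. Using toneless Pinyin and the example strings are assumptions.

```python
def char_similarity(lev: int, slot_length: int) -> float:
    """Eq. (17): similarity from the character-level edit distance."""
    return 1 - lev / slot_length


def pinyin_similarity(lev: int, slot_length: int, m: float = 1.8) -> float:
    """Eq. (18): similarity from the Pinyin edit distance, weighted by m."""
    return 1 - m * lev / slot_length


# 七里响 vs 七里香: character edit distance 1, Pinyin edit distance 0.
by_chars = char_similarity(1, 3)    # 1 - 1/3
by_pinyin = pinyin_similarity(0, 3)  # identical Pinyin
```

Here the character measure gives about 0.67 while the Pinyin measure gives 1.0; with $ m = 1.8 $, a single Pinyin edit on a three-character slot would instead give $ 1 - 1.8/3 = 0.4 $.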

The higher the similarity, the more likely the slot is a typo of the matched thesaurus entry; conversely, a low similarity means that the slot is not similar to any string in the thesaurus and is unlikely to be a typo. We take the largest of the three similarities as the overall score and set a threshold p = 0.55: when the score exceeds p, the slot is judged a spelling error and the corresponding thesaurus string is taken as the correction.
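The decision procedure of this subsection can be sketched as below. The function name, the representation of each similarity measure as a callable returning a score (or `None` when it does not apply), and the toy positional measure used in the example are assumptions; in practice the three measures would be those of Eqs. (14), (17) and (18), and the thesaurus would be the one for the slot's category.

```python
from typing import Callable, Iterable, List, Optional

# A similarity measure returns a score, or None when it does not apply
# (e.g. the congruent-length measure when Cl < k).
Measure = Callable[[str, str], Optional[float]]


def correct_slot(slot: str, thesaurus: Iterable[str], measures: List[Measure],
                 p: float = 0.55) -> str:
    """Return the correction for slot, or slot itself if it is not judged a typo."""
    entries = list(thesaurus)
    if slot in entries:
        return slot  # found in the category's thesaurus: not a spelling error
    best_entry: Optional[str] = None
    best_sim = float("-inf")
    for entry in entries:
        scores = [s for s in (m(slot, entry) for m in measures) if s is not None]
        if scores and max(scores) > best_sim:  # keep the largest measure overall
            best_entry, best_sim = entry, max(scores)
    return best_entry if best_entry is not None and best_sim > p else slot


def same_position_ratio(a: str, b: str) -> float:
    """Toy measure (assumption): share of positions holding the same character."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


songs = ["七里香", "稻香"]  # hypothetical music thesaurus
```

With the toy measure, 七里响 scores 2/3 against 七里香, above p, and is corrected; 晴天 matches nothing and is left unchanged.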

5 Experiment

In this section, we will report the performance of our proposed approach SI-LSTM based on the dataset provided by NLPCC 2018 [22].

5.1 Experiment Setup

NLPCC 2018 [22] provides a dialogue dataset covering three scenarios: music, navigation and telephone. The training set consists of 4,707 real annotated dialogue sessions containing 21,350 sentences and 11 intents, i.e., about 4.5 sentences per session on average (21,350/4,707 ≈ 4.54). The test set contains 1,177 dialogue sessions with 5,349 sentences, likewise about 4.5 sentences per session (5,349/1,177 ≈ 4.54). Information on the training and test sets is shown in Table 2.

Table 2. The description of training and test dataset.


To describe the dataset further, we list some statistics in Table 3. There are 11 intent types in total; apart from OTHERS, music.play contains the most sessions. Note that a thesaurus is also provided for some intent types, which helps with slot extraction and spelling correction.

Table 3. The statistics of the dataset.


During training, the training data were randomly split 9:1 into a training set (90%) and a validation set (10%), which were used to train our model and tune its parameters.
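A 9:1 random split can be sketched as follows. The paper does not say whether whole sessions or individual sentences were shuffled; splitting at the session level, as in the example, keeps each session's context intact. The fixed seed and the function name are also assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_9_1(items: Sequence[T], seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle items reproducibly and split them into 90% training and 10% validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.9)
    return shuffled[:cut], shuffled[cut:]


# Example: split the 4,707 training sessions by id (session-level split is an assumption).
session_ids = list(range(4707))
train_ids, val_ids = split_9_1(session_ids)
```

This yields 4,236 training sessions and 471 validation sessions with no overlap.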

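In addition to accuracy, the macro-averaged F1 score $ F1_{macro} $ is used as an evaluation metric, defined over the $ N $ intent categories $ c_{1}, \ldots, c_{N} $ as follows.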

$$ P_{macro} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\frac{{\# \;{\text{of queries correctly predicted as intent}}\;c_{i} }}{{\# \;{\text{of queries predicted as intent}}\;c_{i} }}} $$

$$ R_{macro} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\frac{{\# \;{\text{of queries correctly predicted as intent}}\;c_{i} }}{{\# \;{\text{of queries labelled as intent}}\;c_{i} }}} $$

$$ F1_{macro} = \frac{2}{{\frac{1}{{P_{macro} }} + \frac{1}{{R_{macro} }}}} $$
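The three definitions above can be checked with a short function. Following these equations, $ F1_{macro} $ is the harmonic mean of macro precision and macro recall, which can differ from averaging per-class F1 scores; treating a class with no predictions (or no gold labels) as contributing 0 is an assumption.

```python
from typing import Sequence, Tuple


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             intents: Sequence[str]) -> Tuple[float, float, float]:
    """Return (P_macro, R_macro, F1_macro) as defined by the equations above."""
    p_sum = r_sum = 0.0
    for c in intents:
        correct = sum(1 for g, y in zip(gold, pred) if g == c and y == c)
        predicted = sum(1 for y in pred if y == c)
        labelled = sum(1 for g in gold if g == c)
        p_sum += correct / predicted if predicted else 0.0
        r_sum += correct / labelled if labelled else 0.0
    p_macro = p_sum / len(intents)
    r_macro = r_sum / len(intents)
    f1 = 2 / (1 / p_macro + 1 / r_macro) if p_macro and r_macro else 0.0
    return p_macro, r_macro, f1


gold = ["a", "a", "b", "b"]
pred = ["a", "b", "b", "b"]
p_m, r_m, f1_m = macro_f1(gold, pred, ["a", "b"])
```

In this toy case $ P_{macro} = 5/6 $, $ R_{macro} = 3/4 $ and $ F1_{macro} = 15/19 \approx 0.789 $.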

5.2 Experimental Results

We address all three tasks, namely intent identification, slot filling and spelling correction, and report the results in this subsection. For comparison, other approaches such as SVM and LSTM are also implemented on the provided dataset.

Intent Identification.

SI-LSTM was implemented in Python with the open-source library Keras on a TensorFlow backend. We trained 50-dimensional word embeddings with word2vec on the whole dataset. The model was trained for 50 epochs, with parameters updated after each batch, and we kept the parameter values at which the model achieved the highest accuracy.

To better demonstrate the performance of our model, we also reimplemented several classic models for intent identification: SVM [17], FastText [18] and LSTM. SVM is a classic supervised machine learning algorithm, and LSTM is a representative deep learning model. FastText is a classifier developed by Facebook that provides a simple and efficient way to represent textual information. These three models are widely used in NLP research and serve as baselines. In addition, since the slot information is an important factor, we also integrated it into the three baseline models, obtaining S-SVM, S-FastText and S-LSTM.

The performance of each model on the test set is shown in Table 4.

Table 4. The comparison between different models.


Table 4 shows that our proposed model SI-LSTM achieved the best results on both metrics, which demonstrates its effectiveness. Although SVM is not a deep learning model, it achieved an accuracy of 90.43% and an F1_macro of 81.97%, comparable with the deep learning models. By contrast, FastText performed poorly on both accuracy and F1_macro. Table 4 also shows that, compared with the basic models, both accuracy and F1_macro improved when the slot information was added to the model.

In addition, combining CNN and LSTM greatly accelerated training. Moreover, if the smallest categories, which contain only a few sentences, are excluded, SI-LSTM performs even better.

We also list the accuracy and recall for each intent type in Table 5.

Table 5. The results of each type of intent.


Spelling Correction.

Since typos appear only in the slots, spelling correction is based on the result of slot filling. We focus on errors related to music and navigation; other domains, e.g., phone calls, are not considered. The experimental results are shown in Tables 6 and 7.

Table 6. The confusion matrix of spelling correction on both training set and test set.


Table 7. The result of spelling correction.


Intent Identification and Slot Filling Results.

We evaluated SVM, FastText, basic LSTM and SI-LSTM on intent identification together with slot filling (including spelling correction); the results are shown in Table 8.

Table 8. The comparison between different models on intent identification with slot filling.


As Table 8 shows, the model that jointly predicts slot filling and intent achieved the best accuracy.

5.3 Discussion

Based on the above experimental results, we summarize some characteristics of our approach and analyze its errors.

SI-LSTM achieved a high accuracy in intent identification but performed less well on F1_macro. Table 5 shows that the recall of our model is very low for the ‘phone_call.cancel’ intent, which lowers F1_macro even though this type is small (18 queries in the test set). In fact, many queries of type ‘phone_call.cancel’ only express an instruction to cancel or stop, without mentioning the object of the instruction. Classifying this intent therefore requires the preceding context, and it is difficult for the model to distinguish between ‘music.pause’, ‘navigation.cancel_navigation’ and ‘phone_call.cancel’. Given that there are only 22 training instances and 18 test instances of ‘phone_call.cancel’, the small amount of data strongly affects the final result of the model on this intent.

Moreover, in spelling correction, multiple languages may occur within a session. Because of limited resources, typo detection is sensitive to the size and quality of the thesauruses. Therefore, for slots not covered by the thesaurus, it is difficult for our model to distinguish the intent accurately.

6 Conclusions

In this paper, we propose a neural framework, named SI-LSTM, for intent identification and slot filling. SI-LSTM handles the two tasks jointly and integrates a CRF into an LSTM network: the slot information is extracted by the CRF, and the intent is identified by the LSTM. In our approach, the slot information is used to determine the intent, and the intent type is used to correct slot filling errors. On the dataset provided by NLPCC 2018, SI-LSTM achieved an accuracy of 90.71% on intent identification, slot filling and error correction.

References

[1] De, A., Kopparapu, S.K.: A Rule-Based Short Query Intent Identification System. Submitted 25 Mar 2015
[2] Song, X., Zheng, Y., Cao, H.: Research on driver’s lane change intention recognition based on HMM and SVM. J. Electron. Meas. Instrum. 30(1), 58–65 (2016)
[3] Worachartcheewan, A., Nantasenamat, C.: Identification of metabolic syndrome using decision tree analysis. Diabetes Res. Clin. Pract. 90(1), e15–e18 (2010)
[4] Turra, G., Arrigoni, S., Signoroni, A.: CNN-based identification of hyperspectral bacterial signatures for digital microbiology. In: Battiato, S., Gallo, G., Schettini, R., Stanco, F. (eds.) ICIAP 2017. LNCS, vol. 10485, pp. 500–510. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-68548-9_46
[5] Ren, J., et al.: Look, Listen and Learn - A Multimodal LSTM for Speaker Identification. Accessed 13 Feb 2016
[6] Rezaei, B.A., Roychowdhury, V., Ghate, S., Khajehnouri, N., Boscolo, R., Mracek, J.: Concept-level user intent profile extraction and applications. Accessed 03 June 2014
[7] Rabiner, L.R., Juang, B.H.: An introduction to HMMs. IEEE ASSP Mag. 3(1), 4–16 (1986)
[8] Wang, Y.-Y., Acero, A.: Combination of CFG and n-gram modeling in semantic grammar learning. In: European Conference on Speech Communication & Technology, pp. 2809–2812 (2003)
[9] Sun, X., Wang, H.: Intent determination and slot filling in question answering. J. Chin. Inf. Process. 31(6), 132–139 (2017)
[10] Pollack, J.B.: Recursive distributed representations. Artif. Intell. 46(1–2), 77–105 (1990)
[11] Lafferty, J., McCallum, A., Pereira, F.C.: Conditional random fields: probabilistic models for segmenting and labeling sequence data. In: Proceedings 18th International Conference on Machine Learning, pp. 282–289. Morgan Kaufmann (2001)
[12] Collobert, R., Weston, J.: A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning, ICML 2008, pp. 160–167. ACM, New York (2008)
[13] Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Dokl. 10(8), 707–710 (1966)
[14] Huang, Z., Xu, W., Yu, K.: Bidirectional LSTM-CRF Models for Sequence Tagging. Computer Science (2015)
[15] Mesnil, G., Dauphin, Y., Yao, K.: Using recurrent neural networks for slot filling in spoken language understanding. IEEE/ACM Trans. Audio Speech Lang. Process. 23(3), 530–539 (2015)
[16] Yao, K., Peng, B., Zweig, G., Yu, D., Li, X., Gao, F.: Recurrent conditional random fields for language understanding. In: ICASSP (2014)
[17] Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20, 273–297 (1995)
[18] Joulin, A., Grave, E., Bojanowski, P., Mikolov, T.: Bag of Tricks for Efficient Text Classification. arXiv preprint arXiv:1607.01759 (2016)
[19] LeCun, Y., Boser, B., Denker, J.S., et al.: Backpropagation applied to handwritten zip code recognition. Neural Comput. 1(4), 541–551 (1989)
[20] Graves, A., Mohamed, A., Hinton, G.: Speech Recognition with Deep Recurrent Neural Networks. arXiv (2013)
[21] Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997). https://doi.org/10.1162/neco.1997.9.8.1735. PMID 9377276
[22] Zhao, X., Cao, Y.: Overview of the NLPCC 2018 shared task: spoken language understanding in task-oriented dialog systems. In: Zhang, M., Ng, V., Zhao, D., Li, S., Zan, H. (eds.) NLPCC 2018. LNCS (LNAI), vol. 11109, pp. 468–478. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-99501-4_46


Acknowledgement

This work is partially funded by the National Natural Science Foundation of China (61602326, U1636103, 61672361, and U1536207) and the Fundamental Research Fund for the Central Universities (3262019T29 and 3262019T54).

Author information

Authors and Affiliations

School of Information Science and Technology, University of International Relations, Beijing, China
Jiawei Shan, Huayun Xu, Zeyang Gong, Hanchen Su & Binyang Li
College of Information Engineering, Capital Normal University, Beijing, China
Xu Han


Corresponding author

Correspondence to Binyang Li.

Editor information

Editors and Affiliations

Harbin Institute of Technology Shenzhen, Shenzhen, China
Ruifeng Xu
Ping An Technology (Shenzhen) Co., Ltd., Shenzhen, China
Jianzong Wang
Kingdee International Software Group Co., Ltd., Shenzhen, China
Liang-Jie Zhang


About this paper

Cite this paper

Shan, J., Xu, H., Gong, Z., Su, H., Han, X., Li, B. (2019). A Neural Framework for Joint Prediction on Intent Identification and Slot Filling. In: Xu, R., Wang, J., Zhang, LJ. (eds) Cognitive Computing – ICCC 2019. ICCC 2019. Lecture Notes in Computer Science, vol. 11518. Springer, Cham. https://doi.org/10.1007/978-3-030-23407-2_2


DOI: https://doi.org/10.1007/978-3-030-23407-2_2
Published: 19 June 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-23406-5
Online ISBN: 978-3-030-23407-2
eBook Packages: Computer Science, Computer Science (R0)
