Abstract
Part-of-speech (POS) tagging determines the grammatical attributes of each word and is a fundamental task in machine translation, speech recognition, information retrieval, and other fields. For Tibetan part-of-speech (TPOS) tagging, a tagging method is proposed based on a bidirectional long short-term memory network with a conditional random field (BiLSTM_CRF). First, using the designed TPOS tag set and a manually tagged corpus, word vectors are obtained by embedding Tibetan words and their corresponding TPOS tags with the continuous bag-of-words (CBOW) model. Second, the word vectors are fed into the BiLSTM_CRF model. To obtain the prediction score matrix, the model combines the past and future input features learned by the forward and backward long short-term memory (LSTM) networks, respectively, and applies a non-linear operation in the softmax layer. The prediction score matrix is then passed to the CRF model, which judges the threshold value and calculates the sequence score error. Finally, a Tibetan part-of-speech tagging model is obtained based on the BiLSTM_CRF model. The experimental results indicate that the accuracy of the TPOS tagging model based on the BiLSTM_CRF model can reach 92.7%.
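The abstract does not give the details of the CRF layer, so the following is only a minimal sketch of how a standard linear-chain CRF could score and decode a tag sequence from a prediction score matrix. The function names, the two-tag toy tag set, and all numeric values are illustrative assumptions, not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Assumption: emissions[t][k] is the BiLSTM score of tag k for word t,
# and transitions[i][j] is the CRF score of moving from tag i to tag j.
Matrix = Sequence[Sequence[float]]


def sequence_score(emissions: Matrix, transitions: Matrix, tags: Sequence[int]) -> float:
    """Sum of emission and transition scores along one tag sequence."""
    score = emissions[0][tags[0]]
    for t in range(1, len(tags)):
        score += transitions[tags[t - 1]][tags[t]] + emissions[t][tags[t]]
    return score


def viterbi_decode(emissions: Matrix, transitions: Matrix) -> Tuple[float, List[int]]:
    """Return the highest-scoring tag sequence and its score."""
    n_tags = len(transitions)
    best = list(emissions[0])          # best score of any path ending in each tag
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at position t.
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


# Toy example (hypothetical values): 3 words, 2 tags (0 = noun, 1 = verb).
toy_emissions = [[2.0, 0.5], [0.3, 1.5], [1.0, 1.2]]
toy_transitions = [[0.1, 0.8], [0.6, -0.5]]
toy_score, toy_path = viterbi_decode(toy_emissions, toy_transitions)
```

In training, a CRF layer of this kind typically compares the score of the gold tag sequence with the scores of all possible sequences; the sketch covers only the scoring and decoding steps.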
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Wang, L., Chen, Z., Yang, H. (2019). TPOS Tagging Method Based on BiLSTM_CRF Model. In: Cheng, X., Jing, W., Song, X., Lu, Z. (eds) Data Science. ICPCSEE 2019. Communications in Computer and Information Science, vol 1058. Springer, Singapore. https://doi.org/10.1007/978-981-15-0118-0_38
DOI: https://doi.org/10.1007/978-981-15-0118-0_38
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0117-3
Online ISBN: 978-981-15-0118-0
eBook Packages: Computer Science, Computer Science (R0)