Abstract
Paraphrase identification is the task of analyzing two text entities (sentences) and determining whether they express the same or a similar meaning. It is a Natural Language Processing (NLP) task in which a pair of sentences must be classified as a paraphrase or not. The approach chosen here is a deep learning model, a recurrent neural network with Long Short-Term Memory (LSTM) units, using word embedding features. Word embedding is a technique that captures the semantics of a word in a dense vector representation. The word embedding models used for feature extraction in Telugu are Word2Vec, GloVe and fastText. The extracted features are supplied to the embedding layer of the LSTM network in order to classify Telugu sentence pairs as paraphrases or not. The Telugu corpus was compiled manually from various Telugu newspapers. The sentences used to train the word embedding models were also gathered from Telugu newspapers. This is the first attempt at paraphrase identification in Telugu using a deep learning approach.
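As an illustration of how several word embedding models might feed a single embedding layer, the following minimal sketch builds one combined embedding matrix and encodes a sentence pair as padded index sequences. It rests on assumptions not stated in the abstract: the vectors are combined by concatenation, words missing from a model receive zero vectors, and the three dictionaries are tiny hypothetical stand-ins (with placeholder tokens) for trained Word2Vec, GloVe and fastText models. The function names are likewise illustrative.

```python
# Toy stand-ins for three pre-trained embedding models (hypothetical values).
# In practice each would be trained on sentences from Telugu newspapers.
word2vec = {"cat": [0.1, 0.2], "sat": [0.3, 0.4]}
glove = {"cat": [0.5, 0.6], "mat": [0.7, 0.8]}
fasttext = {"cat": [0.9, 1.0], "sat": [1.1, 1.2], "mat": [1.3, 1.4]}


def build_vocab(sentences):
    """Map each token to an integer id; id 0 is reserved for padding."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def combined_matrix(vocab, models, dims):
    """Concatenate each word's vectors from all models (assumed combination).

    A model that lacks the word contributes a zero vector of its own size.
    """
    rows = [[0.0] * sum(dims)]  # row 0 is the padding vector
    for token, _ in sorted(vocab.items(), key=lambda item: item[1]):
        row = []
        for model, dim in zip(models, dims):
            row.extend(model.get(token, [0.0] * dim))
        rows.append(row)
    return rows


def encode(sentence, vocab, max_len):
    """Turn a sentence into a fixed-length list of ids, padded with 0."""
    ids = [vocab.get(token, 0) for token in sentence.split()][:max_len]
    return ids + [0] * (max_len - len(ids))


pair = ("cat sat", "cat mat")  # placeholder sentence pair
vocab = build_vocab(pair)
matrix = combined_matrix(vocab, [word2vec, glove, fasttext], [2, 2, 2])
left = encode(pair[0], vocab, max_len=4)
right = encode(pair[1], vocab, max_len=4)
```

In a full model, `matrix` could serve as the initial weights of an embedding layer, and `left` and `right` would be the two inputs whose LSTM encodings are compared by the classifier.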
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Aravinda Reddy, D., Anand Kumar, M., Soman, K.P. (2019). LSTM Based Paraphrase Identification Using Combined Word Embedding Features. In: Wang, J., Reddy, G., Prasad, V., Reddy, V. (eds) Soft Computing and Signal Processing. Advances in Intelligent Systems and Computing, vol 898. Springer, Singapore. https://doi.org/10.1007/978-981-13-3393-4_40
DOI: https://doi.org/10.1007/978-981-13-3393-4_40
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3392-7
Online ISBN: 978-981-13-3393-4
eBook Packages: Intelligent Technologies and Robotics (R0)