LSTM Based Paraphrase Identification Using Combined Word Embedding Features

Aravinda Reddy, D.; Anand Kumar, M.; Soman, K. P.

doi:10.1007/978-981-13-3393-4_40

Conference paper
First Online: 14 February 2019

774 Accesses
14 Citations

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 898)

Abstract

Paraphrase identification is the task of analyzing two text entities (sentences) and determining whether they express the same meaning. It is a Natural Language Processing (NLP) task in which each sentence pair must be classified as a paraphrase or a non-paraphrase. The approach chosen here is a deep learning model, a Long Short-Term Memory (LSTM) recurrent neural network, with word embedding features. Word embeddings represent the semantics of a word as a dense vector. The word embedding models used for feature extraction in Telugu are Word2Vec, GloVe and fastText. The features extracted by these models are added to the embedding layer of the LSTM in order to classify Telugu sentence pairs as paraphrases or not. The Telugu paraphrase corpus was built manually from various Telugu newspapers, and the sentences used to build the word embedding models were also gathered from Telugu newspapers. This is the first attempt at paraphrase identification in Telugu using a deep learning approach.
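
The abstract describes feeding Word2Vec, GloVe and fastText features into the embedding layer of an LSTM that classifies sentence pairs. The plain-Python sketch below illustrates that pipeline under several assumptions the abstract does not state: the three embeddings are combined by concatenation, both sentences pass through one shared LSTM encoder, the pair is scored by a logistic layer over |h1 - h2| and the element-wise product h1 * h2, and all weights are random (untrained). The toy vectors and the names combined_vector, LSTMCell and paraphrase_probability are hypothetical illustrations, not the authors' code.

```python
import math
import random

# Hypothetical toy vectors standing in for pretrained Word2Vec, GloVe and
# fastText models (real models would be trained on Telugu newspaper text).
WORD2VEC = {"cat": [0.1, 0.3], "sat": [0.2, -0.1], "mat": [0.0, 0.4]}
GLOVE = {"cat": [0.5, -0.2], "sat": [0.1, 0.1], "mat": [0.3, 0.0]}
FASTTEXT = {"cat": [-0.1, 0.2], "sat": [0.4, 0.2], "mat": [0.2, -0.3]}


def combined_vector(word, tables, dims):
    """Concatenate the word's vector from every table (assumed combination);
    out-of-vocabulary words get a zero block for that table."""
    vec = []
    for table, dim in zip(tables, dims):
        vec.extend(table.get(word, [0.0] * dim))
    return vec


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LSTMCell:
    """Standard LSTM cell with input (i), forget (f), output (o) gates and
    candidate (g); weights are random, i.e. untrained."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W = {k: mat(hidden_dim, input_dim) for k in "ifog"}   # input weights
        self.U = {k: mat(hidden_dim, hidden_dim) for k in "ifog"}  # recurrent weights
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}           # biases
        self.hidden_dim = hidden_dim

    def step(self, x, h, c):
        def pre(k):  # W_k x + U_k h + b_k
            return [wx + uh + b for wx, uh, b in
                    zip(matvec(self.W[k], x), matvec(self.U[k], h), self.b[k])]

        i = [sigmoid(z) for z in pre("i")]
        f = [sigmoid(z) for z in pre("f")]
        o = [sigmoid(z) for z in pre("o")]
        g = [math.tanh(z) for z in pre("g")]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def encode(self, vectors):
        """Run the cell over a sentence and return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in vectors:
            h, c = self.step(x, h, c)
        return h


def paraphrase_probability(s1, s2, cell, w_out, b_out, tables, dims):
    h1 = cell.encode([combined_vector(t, tables, dims) for t in s1.split()])
    h2 = cell.encode([combined_vector(t, tables, dims) for t in s2.split()])
    # Assumed pair features: |h1 - h2| followed by h1 * h2
    feats = [abs(a - b) for a, b in zip(h1, h2)] + [a * b for a, b in zip(h1, h2)]
    return sigmoid(sum(w * f for w, f in zip(w_out, feats)) + b_out)


tables = [WORD2VEC, GLOVE, FASTTEXT]
dims = [2, 2, 2]
cell = LSTMCell(input_dim=sum(dims), hidden_dim=4, seed=0)
w_out = [0.5] * 8  # 2 * hidden_dim pair features
p = paraphrase_probability("cat sat mat", "mat sat cat", cell, w_out, 0.0, tables, dims)
print(f"P(paraphrase) = {p:.3f}")  # untrained weights, so the value is not meaningful
```

In a real system the concatenated vectors would populate the embedding matrix of a framework model (for example Keras or PyTorch), and the LSTM and output weights would be learned from the labelled Telugu sentence pairs.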

Author information

Authors and Affiliations

Amrita School of Engineering, Center for Computational Engineering and Networking (CEN), Coimbatore, Amrita Vishwa Vidyapeetham, Coimbatore, TN, India
D. Aravinda Reddy, M. Anand Kumar & K. P. Soman

Corresponding author

Correspondence to D. Aravinda Reddy.

Editor information

Editors and Affiliations

Department of Computer Science and Software Engineering, Monmouth University, West Long Branch, NJ, USA
Jiacun Wang
Department of Information Technology, National Institute of Technology Karnataka, Surathkal, Mangaluru, Karnataka, India
G. Ram Mohana Reddy
Department of Computer Science and Engineering, JNTUH College of Engineering Hyderabad, Hyderabad, Telangana, India
V. Kamakshi Prasad
Department of Electronics and Communication Engineering, Malla Reddy College of Engineering & Technology, Secunderabad, Telangana, India
V. Sivakumar Reddy

About this paper

Cite this paper

Aravinda Reddy, D., Anand Kumar, M., Soman, K.P. (2019). LSTM Based Paraphrase Identification Using Combined Word Embedding Features. In: Wang, J., Reddy, G., Prasad, V., Reddy, V. (eds) Soft Computing and Signal Processing. Advances in Intelligent Systems and Computing, vol 898. Springer, Singapore. https://doi.org/10.1007/978-981-13-3393-4_40

DOI: https://doi.org/10.1007/978-981-13-3393-4_40
Published: 14 February 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3392-7
Online ISBN: 978-981-13-3393-4
eBook Packages: Intelligent Technologies and Robotics (R0)