LSTM Based Paraphrase Identification Using Combined Word Embedding Features

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 898)

Abstract

Paraphrase identification is the task of analyzing two text entities (sentences) and determining whether they convey the same meaning. It is a Natural Language Processing (NLP) task in which a pair of sentences must be classified as paraphrase or non-paraphrase. The approach chosen here is a deep learning model, a Long Short-Term Memory (LSTM) recurrent neural network, with word embedding features. Word embedding represents the semantics of a word as a dense vector. The word embedding models used for feature extraction in Telugu are Word2Vec, GloVe and fastText. The extracted features are supplied to the embedding layer of the LSTM in order to classify Telugu sentence pairs as paraphrase or not. The Telugu corpus was created manually from various Telugu newspapers, and the sentences used to train the word embedding models were also gathered from Telugu newspapers. This is the first attempt at paraphrase identification in Telugu using a deep learning approach.
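The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the three embeddings are combined or how the sentence pair is fed to the LSTM, so concatenating the per-word vectors and encoding both sentences with one shared LSTM are assumptions made here. The toy lookup tables stand in for trained Word2Vec, GloVe and fastText models, the tokens `w1` to `w4` are placeholders, and all weights are random and untrained.

```python
import numpy as np

def build_combined_matrix(vocab, tables):
    """Concatenate each word's vectors from several embedding tables.

    vocab  : list of words; row i + 1 of the result belongs to vocab[i]
             (row 0 is reserved for padding and stays all zeros).
    tables : list of (dict word -> 1-D array, dimension) pairs.
    A word missing from a table gets zeros for that table's slice.
    """
    total_dim = sum(dim for _, dim in tables)
    matrix = np.zeros((len(vocab) + 1, total_dim))
    for row, word in enumerate(vocab, start=1):
        parts = [table.get(word, np.zeros(dim)) for table, dim in tables]
        matrix[row] = np.concatenate(parts)
    return matrix

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_encode(inputs, W, U, b):
    """Run one LSTM layer over inputs (steps x dim); return the last hidden state.

    W: (4*hidden, dim), U: (4*hidden, hidden), b: (4*hidden,)
    """
    hidden = U.shape[1]
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    for x in inputs:
        z = W @ x + U @ h + b
        i, f, o, g = np.split(z, 4)          # input, forget, output gates; candidate
        i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
        c = f * c + i * g                    # update cell state
        h = o * np.tanh(c)                   # update hidden state
    return h

rng = np.random.default_rng(0)
vocab = ["w1", "w2", "w3", "w4"]             # placeholder tokens (hypothetical)

def toy_table(words, dim):
    # Stand-in for a trained embedding model: random vectors per word.
    return ({w: rng.normal(size=dim) for w in words}, dim)

# Three toy tables of different widths; the second one lacks "w4".
tables = [toy_table(vocab, 4), toy_table(vocab[:3], 3), toy_table(vocab, 5)]
E = build_combined_matrix(vocab, tables)     # embedding-layer weights

hidden = 6
dim = E.shape[1]
W = 0.1 * rng.normal(size=(4 * hidden, dim))
U = 0.1 * rng.normal(size=(4 * hidden, hidden))
b = np.zeros(4 * hidden)
w_out = rng.normal(size=2 * hidden)          # untrained output weights

# A sentence pair as sequences of vocabulary indices (1-based, 0 = padding).
s1, s2 = [1, 2, 3], [1, 2, 4]
h1 = lstm_encode(E[s1], W, U, b)
h2 = lstm_encode(E[s2], W, U, b)
prob = sigmoid(np.concatenate([h1, h2]) @ w_out)  # score for "paraphrase"
```

With random weights the score carries no meaning; in a trained system the LSTM and output weights would be fitted on labeled sentence pairs. Reserving row 0 for padding mirrors the common convention of index-based embedding layers.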

Corresponding author

Correspondence to D. Aravinda Reddy.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this paper

Aravinda Reddy, D., Anand Kumar, M., Soman, K.P. (2019). LSTM Based Paraphrase Identification Using Combined Word Embedding Features. In: Wang, J., Reddy, G., Prasad, V., Reddy, V. (eds) Soft Computing and Signal Processing. Advances in Intelligent Systems and Computing, vol 898. Springer, Singapore. https://doi.org/10.1007/978-981-13-3393-4_40
