Paraphrase Identification in Telugu Using Machine Learning

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 750)

Abstract

Paraphrase identification is the task of determining whether two sentences convey similar meaning. We use count-based text representations, namely the term-document matrix and the term frequency-inverse document frequency (TF-IDF) matrix, together with the distributional representation methods of singular value decomposition (SVD) and non-negative matrix factorization (NMF), applied iteratively with different word-share and minimum document frequency values. The system learns features from these representations, and the learned features are used to measure phrase-wise similarity between two sentences. The features are then given to various machine learning classification algorithms, and cross-validation accuracy is obtained. The corpus for this task was created manually from different news domains. Because a parser was unavailable, only a subset of the collected corpus was used for this task. This is a first attempt at paraphrase identification in the Telugu language using this approach.
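As a rough illustration of the count-based step described in the abstract, the sketch below builds a TF-IDF matrix with a minimum document frequency cutoff and compares two sentences by cosine similarity. It is not the authors' implementation: the function names (`tokenize`, `tfidf_vectors`, `cosine`), the whitespace tokenizer, the smoothed IDF formula, and the English placeholder sentences are all assumptions made for the example, and the SVD, NMF, and classification stages are omitted.

```python
import math
from collections import Counter


def tokenize(sentence):
    # Assumption: plain whitespace tokenization; a real system would need
    # language-specific preprocessing for Telugu.
    return sentence.lower().split()


def tfidf_vectors(sentences, min_df=1):
    """Return (vocabulary, TF-IDF rows), one row per sentence.

    Terms that occur in fewer than `min_df` sentences are dropped.
    """
    docs = [Counter(tokenize(s)) for s in sentences]
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # each term counted once per sentence
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    n_docs = len(docs)
    # Assumption: smoothed IDF, ln((1 + N) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    rows = [[doc[t] * idf[t] for t in vocab] for doc in docs]
    return vocab, rows


def cosine(u, v):
    # Cosine similarity; defined as 0.0 when either vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical placeholder sentences (English stand-ins, not corpus data).
sentences = [
    "the minister opened the new bridge",
    "the new bridge was opened by the minister",
    "heavy rain flooded the city streets",
]

vocab, rows = tfidf_vectors(sentences, min_df=1)
print("paraphrase pair:", cosine(rows[0], rows[1]))
print("unrelated pair: ", cosine(rows[0], rows[2]))
```

In a pipeline like the one the abstract outlines, such similarity scores (or the reduced vectors from SVD or NMF) would serve as input features to a classifier; raising `min_df` shrinks the vocabulary by discarding rare terms.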

Author information

Corresponding author

Correspondence to D. Aravinda Reddy.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Aravinda Reddy, D., Anand Kumar, M., Soman, K.P. (2019). Paraphrase Identification in Telugu Using Machine Learning. In: Peter, J., Alavi, A., Javadi, B. (eds) Advances in Big Data and Cloud Computing. Advances in Intelligent Systems and Computing, vol 750. Springer, Singapore. https://doi.org/10.1007/978-981-13-1882-5_43
