Paraphrase Identification in Telugu Using Machine Learning

Aravinda Reddy, D.; Anand Kumar, M.; Soman, K. P.

doi:10.1007/978-981-13-1882-5_43


Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 750)


Abstract

Paraphrase identification is the task of determining whether two sentences convey a similar meaning. Here, we use count-based text representations, namely the term-document matrix and the term frequency-inverse document frequency (TF-IDF) matrix, together with the distributional representation methods singular value decomposition (SVD) and non-negative matrix factorization (NMF). These methods are applied iteratively with different word-share and minimum-document-frequency values. From these representations, the system learns features, which are then used to measure phrase-wise similarity between the two sentences. The features are fed to several machine learning classifiers, and cross-validation accuracy is measured for each. The corpus for this task was created manually from various news domains. Because a parser was not available, only a subset of the collected corpus was used. This is a first attempt at paraphrase identification in Telugu using this approach.
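The abstract builds its similarity features on count-based representations (term counts and TF-IDF weights, with rare terms filtered by a minimum document frequency). The following is a minimal, dependency-free sketch of that first stage, not the authors' implementation. It assumes whitespace tokenization, the smoothed IDF variant, and a single sentence-level cosine similarity per representation as the feature; the helper names (`tokenize`, `build_vocabulary`, `pair_features`, and so on) are hypothetical. It omits the SVD/NMF step and the phrase-wise comparison described in the abstract.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumption: plain whitespace tokenization; real Telugu text would need a proper tokenizer.
    return text.lower().split()


def build_vocabulary(docs: list[list[str]], min_df: int) -> list[str]:
    # Keep only terms that occur in at least `min_df` distinct documents.
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(term for term, count in df.items() if count >= min_df)


def count_matrix(docs: list[list[str]], vocab: list[str]) -> list[list[int]]:
    # One row per document, one column per vocabulary term, raw counts in the cells
    # (i.e. the transpose of a term-document matrix).
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def tfidf_matrix(counts: list[list[int]]) -> list[list[float]]:
    n_docs = len(counts)
    n_terms = len(counts[0]) if counts else 0
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1, one common variant
    # (the default formula in scikit-learn's TfidfVectorizer).
    idf = [math.log((1 + n_docs) / (1 + df[j])) + 1 for j in range(n_terms)]
    return [[row[j] * idf[j] for j in range(n_terms)] for row in counts]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    # An all-zero vector (e.g. every term filtered out by min_df) gets similarity 0.
    return dot / norm if norm else 0.0


def pair_features(sentences: list[str], i: int, j: int, min_df: int = 1) -> list[float]:
    # Fit the vocabulary and IDF on the whole sentence collection, then return
    # [count-based cosine, TF-IDF cosine] for the pair (sentences[i], sentences[j]).
    docs = [tokenize(s) for s in sentences]
    vocab = build_vocabulary(docs, min_df)
    counts = count_matrix(docs, vocab)
    weights = tfidf_matrix(counts)
    return [cosine(counts[i], counts[j]), cosine(weights[i], weights[j])]


if __name__ == "__main__":
    # Toy English sentences stand in for the (unavailable) Telugu corpus.
    corpus = ["the cat sat on the mat", "the cat sat on a mat", "stocks fell sharply today"]
    for min_df in (1, 2):
        print(min_df, pair_features(corpus, 0, 1, min_df), pair_features(corpus, 0, 2, min_df))
```

In a fuller pipeline, the count or TF-IDF matrix would typically be factorized (for example with scikit-learn's `TruncatedSVD` or `NMF`) before similarities are computed, and the resulting feature vectors passed to classifiers scored by k-fold cross-validation (for example with `sklearn.model_selection.cross_val_score`). Changing `min_df` controls which rare terms survive, which is the kind of parameter the abstract describes varying.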



Author information

Authors and Affiliations

Centre for Computational Engineering and Networking (CEN), Amrita Vishwa Vidyapeetham, Amrita School of Engineering, Coimbatore, India
D. Aravinda Reddy, M. Anand Kumar & K. P. Soman


Corresponding author

Correspondence to D. Aravinda Reddy.

Editor information

Editors and Affiliations

Department of Computer Sciences Technology, Karunya Institute of Technology & Sciences, Coimbatore, Tamil Nadu, India
J. Dinesh Peter
Department of Civil and Environmental Engineering, University of Missouri, Columbia, MO, USA
Amir H. Alavi
School of Computing, Engineering and Mathematics, University of Western Sydney, Sydney, NSW, Australia
Bahman Javadi


About this paper

Cite this paper

Aravinda Reddy, D., Anand Kumar, M., Soman, K.P. (2019). Paraphrase Identification in Telugu Using Machine Learning. In: Peter, J., Alavi, A., Javadi, B. (eds) Advances in Big Data and Cloud Computing. Advances in Intelligent Systems and Computing, vol 750. Springer, Singapore. https://doi.org/10.1007/978-981-13-1882-5_43


DOI: https://doi.org/10.1007/978-981-13-1882-5_43
Published: 12 December 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1881-8
Online ISBN: 978-981-13-1882-5
eBook Packages: Intelligent Technologies and Robotics (R0)
