Abstract
Paraphrase identification is the task of determining whether two sentences convey the same or similar meaning. In this work, we use count-based text representations, namely the term-document matrix and the term frequency-inverse document frequency (TF-IDF) matrix, together with the distributional representation methods of singular value decomposition (SVD) and non-negative matrix factorization (NMF), which are applied iteratively with different word-share and minimum document frequency values. These methods allow the system to learn features from the representations. The learned features are then used to measure phrase-wise similarity between two sentences. The features are passed to various machine learning classification algorithms, and cross-validation accuracy is obtained. The corpus for this task was created manually from different news domains. Because no parser was available, only a subset of the collected corpus was used for this task. This is a first attempt at paraphrase identification in the Telugu language using this approach.
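The count-based representation and similarity step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only a TF-IDF matrix with a minimum document frequency cut-off and a cosine similarity between two sentence vectors, and it omits SVD, NMF, phrase-wise chunking, and classification. The function names (`tfidf_vectors`, `cosine`), the whitespace tokenisation, the smoothed IDF formula, and the English placeholder sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(sentences, min_df=1):
    """Return (vocab, vectors): one TF-IDF vector per sentence.

    Hypothetical helper: tokenisation is plain whitespace splitting, and
    terms occurring in fewer than `min_df` sentences are dropped.
    """
    tokenised = [s.lower().split() for s in sentences]
    # Document frequency: number of sentences in which each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vocab = sorted(t for t, n in df.items() if n >= min_df)
    n_docs = len(sentences)
    # Smoothed IDF (an assumed variant) keeps every weight strictly positive.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)  # raw term counts for this sentence
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Placeholder sentences, not taken from the Telugu corpus.
corpus = ["the cat sat", "the cat sat", "a dog ran"]
vocab, vecs = tfidf_vectors(corpus, min_df=1)
same = cosine(vecs[0], vecs[1])       # identical sentences
different = cosine(vecs[0], vecs[2])  # no shared terms
```

In a fuller pipeline, a similarity score of this kind would be one feature among several handed to a classifier, and `min_df` is the sort of parameter the abstract describes varying across runs.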
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Aravinda Reddy, D., Anand Kumar, M., Soman, K.P. (2019). Paraphrase Identification in Telugu Using Machine Learning. In: Peter, J., Alavi, A., Javadi, B. (eds) Advances in Big Data and Cloud Computing. Advances in Intelligent Systems and Computing, vol 750. Springer, Singapore. https://doi.org/10.1007/978-981-13-1882-5_43
DOI: https://doi.org/10.1007/978-981-13-1882-5_43
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-1881-8
Online ISBN: 978-981-13-1882-5
eBook Packages: Intelligent Technologies and Robotics (R0)