Representing Contexual Relations with Sanskrit Word Embeddings

Sharma, Ishank; Anand, Shrey; Goyal, Rinkaj; Misra, Sanjay

doi:10.1007/978-3-319-62407-5_18

Conference paper
First Online: 15 July 2017

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10409)

Abstract

Language processing of Sanskrit presents various challenges in the field of computational linguistics. The prosodic, orthographic and inflectional complexities encountered in Sanskrit texts make it difficult to apply linguistic analysis methods developed for western European languages. The inadequacy of contemporary computational approaches in the analysis of the Sanskrit language is vividly apparent. In this exposition, we focus on the challenge of learning syntactic and semantic similarities in the rich Sanskrit literature. We present a simple yet effective approach for representing Sanskrit words in a continuous vector space. We utilise word embeddings in similarity, compositionality and visualization tasks to test their efficacy. Experiments show that our method produces interpretable vector offsets exhibiting shared relationships.
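The "interpretable vector offsets" mentioned in the abstract can be illustrated with a small sketch. The following is not the authors' code: the transliterated words, the 3-dimensional vectors and the helper names `cosine` and `analogy` are all hypothetical, chosen only to show how an offset such as vec(b) - vec(a) + vec(c) might be used to look for a shared relationship.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical toy embeddings (assumption: not taken from the paper).
toy_vectors = {
    "rAjA":   [0.9, 0.8, 0.1],  # king
    "rAjJI":  [0.9, 0.1, 0.8],  # queen
    "puruSa": [0.1, 0.9, 0.1],  # man
    "strI":   [0.1, 0.1, 0.9],  # woman
    "nadI":   [0.2, 0.5, 0.3],  # river (distractor)
}

def analogy(a, b, c, vectors):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# "man is to king as woman is to ?"
result = analogy("puruSa", "rAjA", "strI", toy_vectors)
print(result)
```

With real embeddings the same offset arithmetic would be applied to the learned vectors; the toy values here are arranged so that the relationship is easy to see by inspection.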

Notes

1. A word represented as a dense vector.
2. https://code.google.com/archive/p/word2vec/
3. Period symbol in Sanskrit.
4. Sandhi splitting.
5. Sanskrit word for synonym. For example, (ArohaNa) can refer to words such as mount, climb, or ride, depending on the context.
6. https://radimrehurek.com/gensim/models/word2vec.html
7. We also applied the Skip-Gram model with the same settings, but the resulting word vectors were not of adequate quality.
8. https://github.com/lvdmaaten/bhtsne
9. From here on, we use TOPn to refer to the n most similar words.
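The TOPn notion from note 9 can be sketched as a nearest-neighbour lookup by cosine similarity. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `top_n`, the neighbour words and the toy vectors are hypothetical, and only (ArohaNa) comes from the notes above.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def top_n(word, vectors, n):
    """Return the n (word, similarity) pairs closest to `word`, best first."""
    query = vectors[word]
    scored = [(other, cosine(query, vec))
              for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Hypothetical toy embeddings (assumption: values invented for illustration).
toy_vectors = {
    "ArohaNa":    [0.8, 0.6, 0.0],
    "adhirohaNa": [0.7, 0.7, 0.1],
    "gamana":     [0.5, 0.2, 0.6],
    "jala":       [0.0, 0.1, 0.9],
}

# TOP2 neighbours of ArohaNa under the toy vectors.
for neighbour, score in top_n("ArohaNa", toy_vectors, 2):
    print(neighbour, round(score, 3))
```

In gensim, a comparable lookup is available through `most_similar` on the trained word vectors, with its `topn` argument playing the role of n.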

Author information

Authors and Affiliations

USICT, GGS Indraprastha University, New Delhi, India
Ishank Sharma, Shrey Anand & Rinkaj Goyal
Covenant University, Ota, Nigeria
Sanjay Misra

Corresponding author

Correspondence to Sanjay Misra.

Editor information

Editors and Affiliations

University of Perugia, Perugia, Italy
Osvaldo Gervasi
University of Basilicata, Potenza, Italy
Beniamino Murgante
Covenant University, Ota, Nigeria
Sanjay Misra
University of Trieste, Trieste, Italy
Giuseppe Borruso
Polytechnic University of Bari, Bari, Italy
Carmelo M. Torre
University of Minho, Braga, Portugal
Ana Maria A.C. Rocha
Monash University, Clayton, Victoria, Australia
David Taniar
Kyushu Sangyo University, Fukuoka, Japan
Bernady O. Apduhan
Saint Petersburg State University, Saint Petersburg, Russia
Elena Stankova
University of Trieste, Trieste, Italy
Alfredo Cuzzocrea

About this paper

Cite this paper

Sharma, I., Anand, S., Goyal, R., Misra, S. (2017). Representing Contexual Relations with Sanskrit Word Embeddings. In: Gervasi, O., et al. Computational Science and Its Applications – ICCSA 2017. ICCSA 2017. Lecture Notes in Computer Science, vol 10409. Springer, Cham. https://doi.org/10.1007/978-3-319-62407-5_18

DOI: https://doi.org/10.1007/978-3-319-62407-5_18
Published: 15 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-62406-8
Online ISBN: 978-3-319-62407-5
eBook Packages: Computer Science, Computer Science (R0)
