Complex Search Queries in the Corpus Management System

  • Damir Mukhamedshin
  • Olga Nevzorova
  • Aidar Khusainov
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10449)

Abstract

This article discusses the advanced features of the newly developed search engine of the “Tugan tel” corpus management system, whose corpus consists of texts written in the Tatar language. The new features include executing complex queries with arbitrary logical formulas for direct and reverse search, executing complex queries using a thesaurus or a list of word forms or lemmas, and extracting some types of named entities.

Complex queries make it possible to automatically extract and annotate semantic data from a corpus for linguistic applications. These options improve the search process and also make it possible to test the lexicon and collocations in the corpus.
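As an illustration only, the idea of a complex query built from a logical formula and widened with a thesaurus can be sketched as follows. This is a minimal sketch under stated assumptions, not the “Tugan tel” implementation: the toy corpus, the toy thesaurus, the English stand-in lemmas, and every function name (`lemma`, `any_of`, `all_of`, `negate`, `expand`, `search`) are hypothetical and do not come from the paper.

```python
from typing import Callable, Dict, FrozenSet, Iterable, List

# A query is a predicate over the set of lemmas found in one sentence.
Query = Callable[[FrozenSet[str]], bool]


def lemma(name: str) -> Query:
    """Match sentences that contain the given lemma."""
    return lambda lemmas: name in lemmas


def any_of(names: Iterable[str]) -> Query:
    """Match sentences that contain at least one lemma from a list."""
    wanted = frozenset(names)
    return lambda lemmas: not wanted.isdisjoint(lemmas)


def all_of(*queries: Query) -> Query:
    """Logical AND over sub-queries."""
    return lambda lemmas: all(q(lemmas) for q in queries)


def negate(query: Query) -> Query:
    """Logical NOT of a sub-query."""
    return lambda lemmas: not query(lemmas)


def expand(name: str, thesaurus: Dict[str, List[str]]) -> Query:
    """Widen a lemma with its related entries from a thesaurus."""
    return any_of([name, *thesaurus.get(name, [])])


def search(corpus: List[FrozenSet[str]], query: Query) -> List[int]:
    """Return the indices of the sentences that satisfy the query."""
    return [i for i, lemmas in enumerate(corpus) if query(lemmas)]


# Hypothetical toy data: each sentence is reduced to its set of lemmas.
# English glosses stand in for Tatar lemmas.
TOY_CORPUS = [
    frozenset({"child", "book", "read"}),
    frozenset({"pupil", "school", "go"}),
    frozenset({"teacher", "book", "write"}),
    frozenset({"pupil", "book", "read"}),
]
TOY_THESAURUS = {"child": ["pupil"]}  # invented relation, for illustration

# Formula: (child OR its thesaurus entries) AND book AND NOT write
formula = all_of(expand("child", TOY_THESAURUS), lemma("book"), negate(lemma("write")))
hits_with_thesaurus = search(TOY_CORPUS, formula)

# The same formula without thesaurus expansion matches fewer sentences.
plain = all_of(lemma("child"), lemma("book"), negate(lemma("write")))
hits_plain = search(TOY_CORPUS, plain)
```

In this sketch a formula is composed from small predicates, so arbitrary nesting of AND, OR (via `any_of`), and NOT is possible without a separate query parser. A real corpus manager would evaluate such formulas against an index rather than scanning every sentence.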

Notes

Acknowledgment

The reported study was funded by the Russian Science Foundation (research project № 16-18-02074).

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Damir Mukhamedshin (1)
  • Olga Nevzorova (1)
  • Aidar Khusainov (1)
  1. Institute of Applied Semiotics, Tatarstan Academy of Sciences, Kazan, Russia
