Towards a Unified Exploitation of Electronic Dialectal Corpora: Problems and Perspectives

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Abstract

This paper addresses the problem of storing and retrieving dialectal data in a unified framework. In particular, we discuss issues in the design and implementation of a multimedia database that will contain written and oral data from three Greek dialects of Asia Minor. First, we describe the overall architecture of a system that lets the user store audio recordings, text transcripts, and other annotations. We then discuss the possibilities and limitations of a retrieval module that combines different linguistic levels for a unified exploitation of oral and written corpora.
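To make the idea of retrieval across linguistic levels concrete, the following is a minimal sketch and not the system described in the paper. It assumes that annotations are time-aligned spans grouped into named tiers, in the style of tools such as ELAN or Praat. All class names, field names, tier names, and sample values (`Recording`, `Annotation`, `find_cross_level`, "dialect-A", "word-1", "seg-1") are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Annotation:
    tier: str      # linguistic level, e.g. "orthographic" or "phonetic" (assumed names)
    start: float   # offset into the recording, in seconds
    end: float
    value: str


@dataclass
class Recording:
    dialect: str
    audio_path: str
    annotations: List[Annotation] = field(default_factory=list)


def overlaps(a: Annotation, b: Annotation) -> bool:
    """True when the two time spans share any interval."""
    return a.start < b.end and b.start < a.end


def find_cross_level(
    recordings: List[Recording],
    dialect: str,
    tier_a: str,
    value_a: str,
    tier_b: str,
    value_b: str,
) -> List[Tuple[Recording, Annotation, Annotation]]:
    """Return (recording, a, b) triples in which a matching annotation on
    tier_a overlaps in time with a matching annotation on tier_b."""
    hits = []
    for rec in recordings:
        if rec.dialect != dialect:
            continue
        on_a = [x for x in rec.annotations if x.tier == tier_a and value_a in x.value]
        on_b = [x for x in rec.annotations if x.tier == tier_b and value_b in x.value]
        hits.extend((rec, a, b) for a in on_a for b in on_b if overlaps(a, b))
    return hits


# Placeholder data only; these are not values from the actual corpus.
demo = Recording(
    dialect="dialect-A",
    audio_path="rec-001.wav",
    annotations=[
        Annotation("orthographic", 0.0, 0.8, "word-1"),
        Annotation("phonetic", 0.1, 0.4, "seg-1"),
        Annotation("phonetic", 0.9, 1.2, "seg-1"),
    ],
)

hits = find_cross_level([demo], "dialect-A", "orthographic", "word-1", "phonetic", "seg-1")
```

In this sketch a query on one level (a word form) is narrowed by a condition on another level (a phonetic segment) through temporal overlap, which is one plausible way to combine levels. A written-only corpus without audio alignment would need a different linking key, such as token positions.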

This research has been co-financed by the European Union (European Social Fund - ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: Thalis. Investing in knowledge society through the European Social Fund.




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Karanikolas, N.N., Galiotou, E., Ralli, A. (2014). Towards a Unified Exploitation of Electronic Dialectal Corpora: Problems and Perspectives. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_32

  • DOI: https://doi.org/10.1007/978-3-319-10816-2_32

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-10815-5

  • Online ISBN: 978-3-319-10816-2

  • eBook Packages: Computer Science, Computer Science (R0)
