A Parallel Greek-Bulgarian Corpus: A Digital Resource of the Shared Cultural Heritage

  • Conference paper
Language Technology for Cultural Heritage

Abstract

There has been a long tradition in the digitization and manual documentation of cultural heritage data, yet the need for indexing and retrieval that goes beyond mere bibliographic information has only recently been recognized. This chapter reports on completed work that aimed to highlight textual cultural resources that remain under-exploited, by creating the necessary infrastructure with the support and customization of Language Technologies (LT). The ultimate goal was to promote the study of the cultural heritage of the neighboring areas of Greece and Bulgaria and to raise awareness of their common cultural identity, with a focus on literature, folklore and language. To this end, a bilingual collection of literary and folklore texts in Greek and Bulgarian was developed, along with a number of accompanying resources. The authors present the methodology adopted for the automatic annotation of the textual data at various levels of linguistic analysis, elaborating on the Greek and Bulgarian text processing tools that are integrated in the cross-lingual search and retrieval mechanisms, and discuss issues and problems encountered in the course of the project life-cycle.

Acknowledgements

The work presented here was conducted in the framework of a project funded under the Community Initiative Programme INTERREG III A / PHARE CBC Greece – Bulgaria. The project was implemented by the Institute for Language and Speech Processing (ILSP, http://www.ilsp.gr) and a group of researchers from the Bulgarian Academy of Sciences (http://www.bultreebank.org/).

Author information

Corresponding author

Correspondence to Voula Giouli.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Giouli, V., Simov, K., Osenova, P. (2011). A Parallel Greek-Bulgarian Corpus: A Digital Resource of the Shared Cultural Heritage. In: Sporleder, C., van den Bosch, A., Zervanou, K. (eds) Language Technology for Cultural Heritage. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20227-8_6

  • DOI: https://doi.org/10.1007/978-3-642-20227-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20226-1

  • Online ISBN: 978-3-642-20227-8

  • eBook Packages: Computer Science, Computer Science (R0)
